Structural regular expressions

unofficial mirror of emacs-devel@gnu.org 

* Structural regular expressions
@ 2010-09-07 19:25 Tom
  2010-09-07 20:08 ` Lennart Borgman
  2010-09-08  0:00 ` Drew Adams
  0 siblings, 2 replies; 42+ messages in thread
From: Tom @ 2010-09-07 19:25 UTC (permalink / raw)
  To: emacs-devel

This structural regex thing is interesting. You can perform operations
(e.g. replace text) on all strings in the file, or everywhere except
in strings and comments, etc. Here's the description of the feature
on the E editor blog if someone wants to implement something like
this for emacs:

http://e-texteditor.com/blog/2010/beyond-vi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-07 19:25 Structural regular expressions Tom
@ 2010-09-07 20:08 ` Lennart Borgman
  2010-09-07 20:27   ` Tom
  2010-09-08  0:00 ` Drew Adams
  1 sibling, 1 reply; 42+ messages in thread
From: Lennart Borgman @ 2010-09-07 20:08 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

On Tue, Sep 7, 2010 at 9:25 PM, Tom <levelhalom@gmail.com> wrote:
> This structural regex thing is interesting. You can perform operations
> (e.g. replace text) on all strings in the file, or everywhere except
> in strings and comments, etc. Here's the description of the feature
> on the E editor blog if someone wants to implement something like
> this for emacs:
>
> http://e-texteditor.com/blog/2010/beyond-vi


Looks indeed like a useful idea. I suggest adding a new function
argument PREDICATE to query-replace-regexp etc. (Think of the argument
PREDICATE in completing-read.)
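
A minimal Emacs Lisp sketch of the PREDICATE idea, using only standard buffer primitives. `my-replace-regexp-if' and `my-in-string-p' are hypothetical names. This is not how `query-replace-regexp' works internally. It is just a non-interactive loop that asks a predicate about each match before replacing it.

```elisp
;; Hypothetical helpers, not part of Emacs.
(defun my-replace-regexp-if (regexp to-string predicate)
  "Replace matches of REGEXP after point with TO-STRING where PREDICATE allows.
PREDICATE is called with the start and end of each match and should
return non-nil if that match may be replaced.  REGEXP is assumed never
to match the empty string.  Return the number of replacements made."
  (let ((count 0))
    (while (re-search-forward regexp nil t)
      ;; Protect the match data, since PREDICATE may search or parse.
      (when (save-match-data
              (funcall predicate (match-beginning 0) (match-end 0)))
        (replace-match to-string t t)
        (setq count (1+ count))))
    count))

(defun my-in-string-p (beg _end)
  "Return non-nil if position BEG is inside a string, per `syntax-ppss'."
  ;; `syntax-ppss' leaves point at BEG, so restore point afterwards.
  (save-excursion (nth 3 (syntax-ppss beg))))
```

Passing (lambda (beg end) (not (my-in-string-p beg end))) as the predicate gives the "everywhere except in strings" variant.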




* Re: Structural regular expressions
  2010-09-07 20:08 ` Lennart Borgman
@ 2010-09-07 20:27   ` Tom
  2010-09-07 21:08     ` Lennart Borgman
  2010-09-08  1:13     ` Eric Schulte
  0 siblings, 2 replies; 42+ messages in thread
From: Tom @ 2010-09-07 20:27 UTC (permalink / raw)
  To: emacs-devel

Lennart Borgman <lennart.borgman <at> gmail.com> writes:
> 
> Looks indeed like a useful idea. I suggest adding a new function
> argument PREDICATE to query-replace-regexp etc. (Think of the argument
> PREDICATE in completing-read.)
> 

It can be a good start, but the feature in the E editor is more general
than search and replace. You can perform any operation on the selected 
text. It's sort of like working on the narrowed part of a buffer, only 
the narrowed part in this case consists of several separate ranges of 
the same buffer (like all comments, etc.).


* Re: Structural regular expressions
  2010-09-07 20:27   ` Tom
@ 2010-09-07 21:08     ` Lennart Borgman
  2010-09-08  1:13     ` Eric Schulte
  1 sibling, 0 replies; 42+ messages in thread
From: Lennart Borgman @ 2010-09-07 21:08 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

On Tue, Sep 7, 2010 at 10:27 PM, Tom <levelhalom@gmail.com> wrote:
> Lennart Borgman <lennart.borgman <at> gmail.com> writes:
>>
>> Looks indeed like a useful idea. I suggest adding a new function
>> argument PREDICATE to query-replace-regexp etc. (Think of the argument
>> PREDICATE in completing-read.)
>>
>
> It can be a good start, but the feature in the E editor is more general
> than search and replace. You can perform any operation on the selected
> text. It's sort of like working on the narrowed part of a buffer, only
> the narrowed part in this case consists of several separate ranges of
> the same buffer (like all comments, etc.).

That makes me think of my favorite idea for a better multi major mode
support in Emacs:

    Hide parts of the buffer from all low level functions.

Such an ability could be used here too.




* RE: Structural regular expressions
  2010-09-07 19:25 Structural regular expressions Tom
  2010-09-07 20:08 ` Lennart Borgman
@ 2010-09-08  0:00 ` Drew Adams
  1 sibling, 0 replies; 42+ messages in thread
From: Drew Adams @ 2010-09-08  0:00 UTC (permalink / raw)
  To: 'Tom', emacs-devel

> This structural regex thing is interesting. You can perform operations
> (e.g. replace text) on all strings in the file, or everywhere except
> in strings and comments, etc. Here's the description of the feature
> on the E editor blog if someone wants to implement something like
> this for emacs: http://e-texteditor.com/blog/2010/beyond-vi

FWIW -

Not to pretend that this is exactly the same thing, but you can use Icicles to
do that.  A similar approach could be adopted by vanilla Emacs.

Icicles can use a text property to identify the parts of the buffer to search.
Those parts then act as completion candidates that you can match using a regexp
or other pattern (which you can change on the fly, to dynamically filter the
search hits).

Font-locking already provides such labeling-using-a-property, for free. It was
designed with another purpose in mind, of course, so the buffer parts identified
by font-lock might not always be those most pertinent for the job at hand.
Depends on just what "structures" you need - those provided by font-lock are
pretty basic.

Anyway, as an example, using the identification provided by font-lock, you can
use `C-c "' (`icicle-search-text-property') to search (e.g. using regexps) among
only the strings or only the comments, etc. of a buffer (or of multiple buffers
or files) - based on their different font-lock faces. (You cannot, however,
search among the complementary parts - e.g. the non-comments, without defining a
new Icicles search command.)

Font-lock faces can be used this way to do what you describe, provided the
"structural" parts of the buffer you are interested in are font-locked using
different faces.

This feature does not depend on font-lock, however.  The text property that is
used to divide the buffer into searchable parts need not be `face' - any
property will do.  So if you have a function that parses buffer parts (code
structures) in a more meaningful way (in some sense) than font-locking does, it
can add a text property with different values to identify the parts, and Icicles
search can exploit that labeling immediately.

And the property could be an overlay property instead of a text property.  And
you can replace matches while you search, on-demand.  And you could easily
define a specialized search command that allows other actions besides
replacement (e.g. a popup menu of alternative actions).

http://www.emacswiki.org/emacs/Icicles_-_Other_Search_Commands#toc2

In addition, it looks like the "structure" described in the blog post you cited
is in fact defined just by a set of regexp matches (but I'm no expert on reading
vi-ese):

Y/^\n/ V%A.*Pike<enter> \ V|^%T

It looks as though a few simple patterns do the trick to select the target
lines, for the example given.  If true, then for that simple kind of structure
definition you can just use ordinary Icicles search - no need for any fancy
(non-regexp) parsing or the application of a text property.  Ordinary Icicles
search (like the text-property search) lets you combine the filtering of any
number of input patterns (e.g. regexps).

And if you have a hairy pattern or set of patterns that you want to reuse,
instead of typing it interactively each time (as would seem to be the case for
the bibtex/refer references, though the blog touts the "effortlessness" of
typing such incantations), then you can define a command that incorporates that
info for the initial Icicle-search parse.

`C-c =' (`icicle-imenu') does that, for instance: it just passes the hairy imenu
regexp to `icicle-search'.  Any additional, dynamic pattern you then type just
filters the imenu candidates (e.g. function definitions).


* Re: Structural regular expressions
  2010-09-07 20:27   ` Tom
  2010-09-07 21:08     ` Lennart Borgman
@ 2010-09-08  1:13     ` Eric Schulte
  2010-09-08  8:46       ` Stefan Monnier
  2010-09-09 15:51       ` Tom
  1 sibling, 2 replies; 42+ messages in thread
From: Eric Schulte @ 2010-09-08  1:13 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

Tom <levelhalom@gmail.com> writes:

> Lennart Borgman <lennart.borgman <at> gmail.com> writes:
>> 
>> Looks indeed like a useful idea. I suggest adding a new function
>> argument PREDICATE to query-replace-regexp etc. (Think of the argument
>> PREDICATE in completing-read.)
>> 
>
> It can be a good start, but the feature in the E editor is more general
> than search and replace. You can perform any operation on the selected 
> text. It's sort of like working on the narrowed part of a buffer, only 
> the narrowed part in this case consists of several separate ranges of 
> the same buffer (like all comments, etc.).

Would generalizing the narrowing behavior to arbitrarily many ranges in
a buffer instead of a single range have extensive ramifications?  Would
this be an easy or difficult thing to implement?

If it's not too difficult then providing behavior like that mentioned in
the article above should be trivial.

Cheers -- Eric

hmm, it seems that `narrow-to-region' works by changing the bounds (min
and max indices) of the current buffer, not something that naturally
generalizes to multiple regions.
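
A small illustration of that observation: after `narrow-to-region' the accessible part of a buffer is described by the single pair `point-min'/`point-max', so there is nowhere to record a second span. The function name below is hypothetical.

```elisp
;; Hypothetical demo function: narrowing keeps one accessible span only.
(defun my-narrowed-bounds-demo ()
  "Narrow a scratch buffer and return (POINT-MIN POINT-MAX TEXT)."
  (with-temp-buffer
    (insert "0123456789")
    ;; Positions start at 1, so 3..6 covers the characters "2", "3", "4".
    (narrow-to-region 3 6)
    (list (point-min) (point-max) (buffer-string))))

(my-narrowed-bounds-demo)  ; => (3 6 "234")
```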


* Re: Structural regular expressions
  2010-09-08  1:13     ` Eric Schulte
@ 2010-09-08  8:46       ` Stefan Monnier
  2010-09-08  9:20         ` Lawrence Mitchell
  2010-09-09 15:51       ` Tom
  1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2010-09-08  8:46 UTC (permalink / raw)
  To: Eric Schulte; +Cc: Tom, emacs-devel

> Would generalizing the narrowing behavior to arbitrarily many ranges in
> a buffer instead of a single range have extensive ramifications?  Would
> this be an easy or difficult thing to implement?

Since the non-narrowed part is not displayed at all, it wouldn't be
quite what we want anyway.
We'd need to add something new, tho it could be based on something
pre-existing (e.g. it could rely on text properties like `invisible'
and/or `intangible').
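
A minimal sketch of only the display side of that idea, assuming overlays rather than text properties; `my-hide-outside-ranges' and `my-unhide-ranges' are hypothetical names. It hides the gaps between the wanted ranges with the `invisible' property. As noted above, this is not the same as narrowing: searches and other low-level functions still see the hidden text.

```elisp
;; Hypothetical sketch: hide everything outside RANGES with overlays
;; carrying the `invisible' property.  Display only; `re-search-forward'
;; and friends still see the hidden text.
(defun my-hide-outside-ranges (ranges)
  "Make the text outside RANGES invisible and return the new overlays.
RANGES is a list of (BEG . END) pairs, sorted and non-overlapping."
  (let ((pos (point-min))
        (overlays nil))
    ;; An empty sentinel range at `point-max' closes off the final gap.
    (dolist (r (append ranges (list (cons (point-max) (point-max)))))
      (when (< pos (car r))
        (let ((ov (make-overlay pos (car r))))
          (overlay-put ov 'invisible t)
          (overlay-put ov 'my-multi-narrow t) ; tag used for cleanup
          (push ov overlays)))
      (setq pos (cdr r)))
    (nreverse overlays)))

(defun my-unhide-ranges ()
  "Delete the overlays created by `my-hide-outside-ranges'."
  (remove-overlays (point-min) (point-max) 'my-multi-narrow t))
```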

> If it's not too difficult then providing behavior like that mentioned in
> the article above should be trivial.

Nothing's trivial when you have to ensure some amount of backward
compatibility with code written many years ago ;-)

But of course, it would be OK to start with something that may break
pre-existing code, as long as it's only broken when you use the
new feature.

And I agree with Lennart, that such a new tool, if done right, could be
a good basis for better multi-mode support.

        Stefan


* Re: Structural regular expressions
  2010-09-08  8:46       ` Stefan Monnier
@ 2010-09-08  9:20         ` Lawrence Mitchell
  2010-09-08 10:30           ` Kan-Ru Chen
  2010-09-08 14:29           ` Stefan Monnier
  0 siblings, 2 replies; 42+ messages in thread
From: Lawrence Mitchell @ 2010-09-08  9:20 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier wrote:
>> Would generalizing the narrowing behavior to arbitrarily many ranges in
>> a buffer instead of a single range have extensive ramifications?  Would
>> this be an easy or difficult thing to implement?

> Since the non-narrowed part is not displayed at all, it wouldn't be
> quite what we want anyway.
> We'd need to add something new, tho it could be based on something
> pre-existing (e.g. it could rely on text properties like to `invisible'
> and/or `intangible').

>> If it's not too difficult then providing behavior like that mentioned in
>> the article above should be trivial.

> Nothing's trivial when you have to ensure some amount of backward
> compatibility with code written many years ago ;-)

> But of course, it would be OK to start with something that may break
> pre-existing code, as long as it's only broken when you use the
> new feature.

> And I agree with Lennart, that such a new tool, if done right, could be
> a good basis for better multi-mode support.

A halfway house, similar to that suggested by Drew, would be
something like
http://www2.ph.ed.ac.uk/~s0198183/multi-region.el.  ISTR some
discussion when it was posted in g.e.sources, *grovels through
mail*:
http://thread.gmane.org/gmane.emacs.sources/1390

Maybe this is a useful feature to now think about incorporating
:P.

Lawrence
-- 
Lawrence Mitchell <wence@gmx.li>





* Re: Structural regular expressions
  2010-09-08  9:20         ` Lawrence Mitchell
@ 2010-09-08 10:30           ` Kan-Ru Chen
  2010-09-09  6:34             ` Harald Hanche-Olsen
  2010-09-08 14:29           ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Kan-Ru Chen @ 2010-09-08 10:30 UTC (permalink / raw)
  To: emacs-devel

On Wed, Sep 8, 2010 at 5:20 PM, Lawrence Mitchell <wence@gmx.li> wrote:
> Stefan Monnier wrote:
>>> Would generalizing the narrowing behavior to arbitrarily many ranges in
>>> a buffer instead of a single range have extensive ramifications?  Would
>>> this be an easy or difficult thing to implement?
>
>> Since the non-narrowed part is not displayed at all, it wouldn't be
>> quite what we want anyway.
>> We'd need to add something new, tho it could be based on something
>> pre-existing (e.g. it could rely on text properties like to `invisible'
>> and/or `intangible').
>
>>> If it's not too difficult then providing behavior like that mentioned in
>>> the article above should be trivial.
>
>> Nothing's trivial when you have to ensure some amount of backward
>> compatibility with code written many years ago ;-)
>
>> But of course, it would be OK to start with something that may break
>> pre-existing code, as long as it's only broken when you use the
>> new feature.
>
>> And I agree with Lennart, that such a new tool, if done right, could be
>> a good basis for better multi-mode support.
>
> A halfway house, similar to that suggested by Drew, would be
> something like
> http://www2.ph.ed.ac.uk/~s0198183/multi-region.el.  ISTR some
> discussion when it was posted in g.e.sources, *grovels through
> mail*:
> http://thread.gmane.org/gmane.emacs.sources/1390
>
> Maybe this is a useful feature to now think about incorporating
> :P.

Could this be implemented like a `virtual-buffer'?

  (with-virtual-buffer LABEL &rest BODY)

From the virtual-buffer point of view, the multiple regions marked by
LABEL together form one connected buffer. Then legacy code could work on
this buffer without change.

- Kanru




* Re: Structural regular expressions
  2010-09-08  9:20         ` Lawrence Mitchell
  2010-09-08 10:30           ` Kan-Ru Chen
@ 2010-09-08 14:29           ` Stefan Monnier
  2010-09-08 15:52             ` Lawrence Mitchell
  2010-09-09 20:47             ` Davis Herring
  1 sibling, 2 replies; 42+ messages in thread
From: Stefan Monnier @ 2010-09-08 14:29 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-devel

> A halfway house, similar to that suggested by Drew, would be something
> like http://www2.ph.ed.ac.uk/~s0198183/multi-region.el.  ISTR some
> discussion when it was posted in g.e.sources, *grovels through mail*:
> http://thread.gmane.org/gmane.emacs.sources/1390

> Maybe this is a useful feature to now think about incorporating

Indeed, we could probably go a long way by simply extending our notion
of region so as to allow it to be non-contiguous.

Patches welcome,


        Stefan




* Re: Structural regular expressions
  2010-09-08 14:29           ` Stefan Monnier
@ 2010-09-08 15:52             ` Lawrence Mitchell
  2010-09-08 22:46               ` Stefan Monnier
  2010-09-09 20:47             ` Davis Herring
  1 sibling, 1 reply; 42+ messages in thread
From: Lawrence Mitchell @ 2010-09-08 15:52 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier wrote:
>> A halfway house, similar to that suggested by Drew, would be something
>> like http://www2.ph.ed.ac.uk/~s0198183/multi-region.el.  ISTR some
>> discussion when it was posted in g.e.sources, *grovels through mail*:
>> http://thread.gmane.org/gmane.emacs.sources/1390

>> Maybe this is a useful feature to now think about incorporating

> Indeed, we could probably go a long way by simply

This must be a new and exciting definition of simply that I am
not previously aware of.  Else I'm being particularly dense.

> extending our notion of region so as to allow it to be
> non-contiguous.

Glancing through the source, this seems like it would be a pretty
major change.  I guess BEGV and ZV would have to be changed from
buffer positions to lists of buffer positions.  Then everything
that looked at them would be updated to respect this change.  And
so forth.  Or do I have the wrong end of the stick?

Lawrence
-- 
Lawrence Mitchell <wence@gmx.li>


* Re: Structural regular expressions
  2010-09-08 15:52             ` Lawrence Mitchell
@ 2010-09-08 22:46               ` Stefan Monnier
  2010-09-09  7:07                 ` David Kastrup
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2010-09-08 22:46 UTC (permalink / raw)
  To: Lawrence Mitchell; +Cc: emacs-devel

>> extending our notion of region so as to allow it to be
>> non-contiguous.

> Glancing through the source, this seems like it would be a pretty
> major change.  I guess BEGV and ZV would have to be changed from
> buffer positions to lists of buffer positions.  Then everything
> that looked at them would be updated to respect this change.  And
> so forth.  Or do I have the wrong end of the stick?

Yes, you're confusing the region with the visible part of the buffer:
BEGV and ZV have to do with narrowing and extending them to discontinuous
areas would indeed be a major undertaking, whereas the region is just the
part of the buffer between point and mark.


        Stefan




* Re: Structural regular expressions
  2010-09-08 10:30           ` Kan-Ru Chen
@ 2010-09-09  6:34             ` Harald Hanche-Olsen
  0 siblings, 0 replies; 42+ messages in thread
From: Harald Hanche-Olsen @ 2010-09-09  6:34 UTC (permalink / raw)
  To: emacs-devel

+ Kan-Ru Chen <kanru@kanru.info>:

> Could this be implemented like a `virtual-buffer'.
> 
>   (with-virtual-buffer LABEL &rest BODY)
> 
> From the virtual-buffer point of view, the multiple regions marked by
> LABEL are as a whole, connected buffer. Then legacy code could work on
> this buffer without change.

And you might get very surprised when a search-and-replace replaced
some text spanning more than one of the regions.

- Harald




* Re: Structural regular expressions
  2010-09-08 22:46               ` Stefan Monnier
@ 2010-09-09  7:07                 ` David Kastrup
  2010-09-09 17:03                   ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: David Kastrup @ 2010-09-09  7:07 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> extending our notion of region so as to allow it to be
>>> non-contiguous.
>
>> Glancing through the source, this seems like it would be a pretty
>> major change.  I guess BEGV and ZV would have to be changed from
>> buffer positions to lists of buffer positions.  Then everything
>> that looked at them would be updated to respect this change.  And
>> so forth.  Or do I have the wrong end of the stick?
>
> Yes, you're confusing the region with the visible part of the buffer:
> BEGV and ZV have to do with narrowing and extending them to discontinuous
> areas would indeed be a major undertaking, whereas the region is just the
> part of the buffer between point and mark.

And what is narrow-to-region supposed to do then?

-- 
David Kastrup





* Re: Structural regular expressions
  2010-09-08  1:13     ` Eric Schulte
  2010-09-08  8:46       ` Stefan Monnier
@ 2010-09-09 15:51       ` Tom
  2010-09-09 16:01         ` Lennart Borgman
  1 sibling, 1 reply; 42+ messages in thread
From: Tom @ 2010-09-09 15:51 UTC (permalink / raw)
  To: emacs-devel

Eric Schulte <schulte.eric <at> gmail.com> writes:
> >
> > It can be a good start, but the feature in the E editor is more general
> > than search and replace. You can perform any operation on the selected 
> > text. It's sort of like working on the narrowed part of a buffer, only 
> > the narrowed part in this case consists of several separate ranges of 
> > the same buffer (like all comments, etc.).
> 
> Would generalizing the narrowing behavior to arbitrarily many ranges in
> a buffer instead of a single range have extensive ramifications?  

The mentioned E feature is sort of like narrowing, but is not the same if
I understand it correctly.

For example, if I want to replace the word "formatted" with "structured" in all
comments, then consider the following case (<> indicates comment range
boundaries):

<.... is the format>
<tedious work is done here>

the word "format" at the end of the first range and the word "tedious" at
the beginning of the next should not be handled as contiguous text, because
in that case the text "formattedious" would match the word to be replaced
("formatted"), which is clearly not correct behavior.

So if such "multiple narrowing" is implemented it must maintain the boundaries
between the different ranges and shouldn't simply handle it as contiguous text.


* Re: Structural regular expressions
  2010-09-09 15:51       ` Tom
@ 2010-09-09 16:01         ` Lennart Borgman
  2010-09-09 16:23           ` Tom
  0 siblings, 1 reply; 42+ messages in thread
From: Lennart Borgman @ 2010-09-09 16:01 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

On Thu, Sep 9, 2010 at 5:51 PM, Tom <levelhalom@gmail.com> wrote:
>
> So if such "multiple narrowing" is implemented it must maintain the boundaries
> between the different ranges and shouldn't simply handle it as contiguous text.


Or handle the text outside the multiple narrowing as whitespace.

I think that maybe would make it easier to implement. Then it can be
implemented in the low level routines that access the buffer contents.




* Re: Structural regular expressions
  2010-09-09 16:01         ` Lennart Borgman
@ 2010-09-09 16:23           ` Tom
  2010-09-09 16:44             ` Lennart Borgman
  2010-09-09 19:27             ` Daniel Colascione
  0 siblings, 2 replies; 42+ messages in thread
From: Tom @ 2010-09-09 16:23 UTC (permalink / raw)
  To: emacs-devel

Lennart Borgman <lennart.borgman <at> gmail.com> writes:

> 
> On Thu, Sep 9, 2010 at 5:51 PM, Tom <levelhalom <at> gmail.com> wrote:
> >
> > So if such "multiple narrowing" is implemented it must maintain the boundaries
> > between the different ranges and shouldn't simply handle it as contiguous text.
> 
> Or handle the text outside the multiple narrowing as whitespace.
> 

And what happens then if I want to regexp replace "foo\s-*bar"? It would
still be susceptible to the above mentioned boundary problem, so it's
not a robust workaround.






* Re: Structural regular expressions
  2010-09-09 16:23           ` Tom
@ 2010-09-09 16:44             ` Lennart Borgman
  2010-09-09 16:53               ` Tom
  2010-09-09 19:27             ` Daniel Colascione
  1 sibling, 1 reply; 42+ messages in thread
From: Lennart Borgman @ 2010-09-09 16:44 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

On Thu, Sep 9, 2010 at 6:23 PM, Tom <levelhalom@gmail.com> wrote:
> Lennart Borgman <lennart.borgman <at> gmail.com> writes:
>
>>
>> On Thu, Sep 9, 2010 at 5:51 PM, Tom <levelhalom <at> gmail.com> wrote:
>> >
>> > So if such "multiple narrowing" is implemented it must maintain the boundaries
>> > between the different ranges and shouldn't simply handle it as contiguous text.
>>
>> Or handle the text outside the multiple narrowing as whitespace.
>>
>
> And what happens then if I want to regexp replace "foo\s-*bar"? It would
> still be susceptible to the above mentioned boundary problem, so it's
> not a robust workaround.

It does not look to me like it would be susceptible to that problem.
Maybe I am misunderstanding you. Can you explain more in detail why
you think it would be a problem with the solution I suggested? (Please
note that I said the parts outside of the multiple narrowing should be
treated as "whitespace", not "invisible" or "non-existent".)




* Re: Structural regular expressions
  2010-09-09 16:44             ` Lennart Borgman
@ 2010-09-09 16:53               ` Tom
  2010-09-09 17:02                 ` Lennart Borgman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom @ 2010-09-09 16:53 UTC (permalink / raw)
  To: emacs-devel

Lennart Borgman <lennart.borgman <at> gmail.com> writes:

> > And what happens then if I want to regexp replace "foo\s-*bar"? It would
> > still be susceptible to the above mentioned boundary problem, so it's
> > not a robust workaround.
> 
> It does not look to me like it would be susceptible to that problem.
> Maybe I am misunderstanding you. Can you explain more in detail why
> you think it would be a problem with the solution I suggested? (Please
> note that I said the parts outside of the multiple narrowing should be
> treated as "whitespace", not "invisible" or "non-existent".)

Maybe I am misunderstanding you.

As I understood your suggestion:

<.....foo> ... whitespace ... <bar ... >

Since \s- as a regexp matches whitespace the regexp "foo\s-*bar" would match 
the end of the first range and the beginning of the second range separated
by whitespace.
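
Tom's objection can be reproduced with a small sketch of the "treat outside text as whitespace" idea, with ranges given as (BEG . END) pairs; `my-pos-in-ranges-p' and `my-mask-outside-ranges' are hypothetical names. Every character outside the ranges is replaced by a space, and the regexp then matches straight across the gap.

```elisp
;; Hypothetical sketch: copy the buffer text, replacing every character
;; outside RANGES with FILLER (a space by default).
(defun my-pos-in-ranges-p (pos ranges)
  "Return non-nil if POS lies in one of RANGES, a list of (BEG . END)."
  (let ((found nil))
    (dolist (r ranges found)
      (when (and (>= pos (car r)) (< pos (cdr r)))
        (setq found t)))))

(defun my-mask-outside-ranges (ranges &optional filler)
  "Return the buffer text with each character outside RANGES replaced.
FILLER is the replacement character; it defaults to a space."
  (let ((chars nil))
    (dotimes (i (- (point-max) (point-min)))
      (let ((pos (+ (point-min) i)))
        (push (if (my-pos-in-ranges-p pos ranges)
                  (char-after pos)
                (or filler ?\s))
              chars)))
    (concat (nreverse chars))))
```

With the buffer "xx foo\nyy\nbar zz" and ranges (1 . 7) and (11 . 17), the masked text is "xx foo    bar zz", and "foo\\s-*bar" matches across the former gap.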





* Re: Structural regular expressions
  2010-09-09 16:53               ` Tom
@ 2010-09-09 17:02                 ` Lennart Borgman
  0 siblings, 0 replies; 42+ messages in thread
From: Lennart Borgman @ 2010-09-09 17:02 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

On Thu, Sep 9, 2010 at 6:53 PM, Tom <levelhalom@gmail.com> wrote:
> Lennart Borgman <lennart.borgman <at> gmail.com> writes:
>
>> > And what happens then if I want to regexp replace "foo\s-*bar"? It would
>> > still be susceptible to the above mentioned boundary problem, so it's
>> > not a robust workaround.
>>
>> It does not look to me like it would be susceptible to that problem.
>> Maybe I am misunderstanding you. Can you explain more in detail why
>> you think it would be a problem with the solution I suggested? (Please
>> note that I said the parts outside of the multiple narrowing should be
>> treated as "whitespace", not "invisible" or "non-existent".)
>
> Maybe I am misunderstanding you.
>
> As I understood your suggestion:
>
> <.....foo> ... whitespace ... <bar ... >
>
> Since \s- as a regexp matches whitespace the regexp "foo\s-*bar" would match
> the end of the first range and the beginning of the second range separated
> by whitespace.

Ah, I see. Yes, it could be a problem in an example like that.

So if something like my suggestion were implemented, we would perhaps
have to distinguish this "whitespace" from other whitespace.

However I think that it would still be useful to let it behave as
whitespace in many situations. I am thinking about parsers in multi
major mode buffers for example.




* Re: Structural regular expressions
  2010-09-09  7:07                 ` David Kastrup
@ 2010-09-09 17:03                   ` Stefan Monnier
  2010-09-10 12:23                     ` David Kastrup
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2010-09-09 17:03 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

> And what is narrow-to-region supposed to do then?

Signal an error when the region is not contiguous?


        Stefan




* Re: Structural regular expressions
  2010-09-09 16:23           ` Tom
  2010-09-09 16:44             ` Lennart Borgman
@ 2010-09-09 19:27             ` Daniel Colascione
  1 sibling, 0 replies; 42+ messages in thread
From: Daniel Colascione @ 2010-09-09 19:27 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

On Thu, Sep 9, 2010 at 9:23 AM, Tom <levelhalom@gmail.com> wrote:
> And what happens then if I want to regexp replace "foo\s-*bar"? It would
> still be susceptible to the above mentioned boundary problem, so it's
> not a robust workaround.

What about a brand new syntax class?




* Re: Structural regular expressions
  2010-09-08 14:29           ` Stefan Monnier
  2010-09-08 15:52             ` Lawrence Mitchell
@ 2010-09-09 20:47             ` Davis Herring
  2010-09-09 22:52               ` Lennart Borgman
  1 sibling, 1 reply; 42+ messages in thread
From: Davis Herring @ 2010-09-09 20:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Lawrence Mitchell, emacs-devel

> Indeed, we could probably go a long way by simply extending our notion
> of region so as to allow it to be non-contiguous.
>
> Patches welcome,

This is no patch, but I had an idea for the interface for this:

Definition: simple region
The interval (possibly empty) between point and mark, exactly as it is now.

Variable: region-list
A set of non-empty, disjoint intervals, always local to each buffer.  Each
is a cons of two markers.  Typically each is highlighted in a subtle
fashion, even outside Transient Mark Mode.

Function: multi-region
Returns the union of the region list and the simple region (using
`point-marker' and/or `mark-marker' as needed).  (If the simple region is
empty and the region list is not, the simple region is ignored and the
return value equals `region-list'.)  This is the user-visible
possibly-disconnected upgrade to the region concept.
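
A rough sketch of `multi-region' as defined above, assuming integer positions instead of the markers the proposal calls for; `my-region-list', `my-interval-union' and `my-multi-region' are hypothetical names. One simplification: when both the simple region and the list are empty, this returns nil rather than an empty interval.

```elisp
;; Hypothetical names throughout.  Positions are plain integers here,
;; whereas the proposal keeps markers in `region-list'.
(defvar-local my-region-list nil
  "Stand-in for the proposed `region-list': a list of (BEG . END) pairs.")

(defun my-interval-union (intervals)
  "Merge INTERVALS, a list of (BEG . END), into sorted disjoint intervals.
Overlapping or touching intervals, e.g. (1 . 3) and (3 . 5), are coalesced."
  (let ((sorted (sort (copy-sequence intervals)
                      (lambda (a b) (< (car a) (car b)))))
        (result nil))
    (dolist (iv sorted (nreverse result))
      (if (and result (<= (car iv) (cdar result)))
          ;; IV starts inside, or right at the end of, the last interval.
          (setcdr (car result) (max (cdar result) (cdr iv)))
        (push (cons (car iv) (cdr iv)) result)))))

(defun my-multi-region ()
  "Return the union of `my-region-list' and the simple region.
An empty simple region, or a buffer with no mark, contributes nothing."
  (let ((m (mark t)))
    (my-interval-union
     (if (and m (/= m (point)))
         (cons (cons (min m (point)) (max m (point))) my-region-list)
       my-region-list))))
```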

User option: multi-region-separator (default: "\n")
String to insert between separate intervals of the multi-region when
concatenated.

(defun multi-region-string (&optional sep)
  "Return the contents of the multi-region.
Separate intervals with SEP (or `multi-region-separator' if omitted)."
  (mapconcat (lambda (c) (buffer-substring (car c) (cdr c)))
             (multi-region) (or sep multi-region-separator)))

Rule: (interactive "r") maps over the multi-region.
Perhaps with some way to disable it (prefix command, or just a quick way
to suppress/restore the region list while leaving the simple region
alone), `call-interactively' would handle an interactive spec once
(including any prompting), then repeatedly call the function with the
start and end set to the start and end of each interval in the
multi-region in turn, in buffer order.
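
A minimal sketch of the "call once per interval, in buffer order" rule, written as a plain function rather than as a change to `call-interactively'; `my-apply-to-intervals' is a hypothetical name. Converting the positions to markers first is one way to keep later intervals correct when FN inserts or deletes text.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-apply-to-intervals (fn intervals)
  "Call FN as (FN BEG END) once per interval in INTERVALS, in buffer order.
INTERVALS is a list of (BEG . END) positions.  They are turned into
markers first, so text that FN inserts or deletes in one interval does
not leave the later intervals pointing at the wrong text."
  (let ((markers
         (mapcar (lambda (iv)
                   (cons (copy-marker (car iv))
                         ;; End marker advances when text is inserted at it.
                         (copy-marker (cdr iv) t)))
                 (sort (copy-sequence intervals)
                       (lambda (a b) (< (car a) (car b)))))))
    (dolist (m markers)
      (funcall fn (marker-position (car m)) (marker-position (cdr m)))
      ;; Detach the markers so they stop slowing down later edits.
      (set-marker (car m) nil)
      (set-marker (cdr m) nil))))
```

For example, (my-apply-to-intervals #'upcase-region '((1 . 3) (7 . 9))) turns "ab cd ef" into "AB cd EF".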

Rationale: This is a very intrusive change!  But it's often the right
thing (delete-region, upcase-region, ispell-region, translate-region,
underline-region, indent-region, count-lines-region,
expand-region-abbrevs, and probably eval-region) and is one of very few
ways of letting existing code apply in any sense to multi-regions.  (If
doing it by default is too much, a prefix "multify" command could be
provided instead, and all of this could be optional.)

Another spec ("R"?) could be added for commands like `narrow-to-region'
that should operate only on the simple region (or perhaps fail if the
region isn't simple).  Yet another spec might pass all of the
multi-region at once so that commands like `kill-region' and
`write-region' could use `multi-region-string' or otherwise act on them
coherently.

Command: keep-region
Unions the current simple region into the region list (may coalesce
existing intervals).  Immediately afterwards, the simple region is
entirely redundant and has no effect (until point or mark moves).

Command: drop-region
Removes the current simple region from the region list (may split existing
intervals).  Immediately afterwards, the multi-region is no different!

Command: drop-this-region
Remove the interval that contains point from the region list.

Command: drop-multi-region
Clears the region list (causing the multi-region to equal the simple region).
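
Of these, `drop-region' involves the only real interval arithmetic. Removing the simple region from the list may leave a left piece, a right piece, both (when the simple region falls strictly inside an interval), or nothing. Below is a sketch of that subtraction. It uses a hypothetical helper name and integer positions for readability:

```elisp
(defun region-list-subtract (intervals beg end)
  "Return INTERVALS with the span BEG..END removed.
An interval that straddles BEG..END is split in two; pieces that
would be empty are dropped, since region-list intervals are non-empty."
  (let (result)
    (dolist (iv intervals)
      (let ((s (car iv))
            (e (cdr iv)))
        ;; The part of IV lying before BEG, if any.
        (when (< s beg) (push (cons s (min e beg)) result))
        ;; The part of IV lying after END, if any.
        (when (> e end) (push (cons (max s end) e) result))))
    (nreverse result)))
```

`keep-region' is the mirror image: add the simple region to the list and coalesce any intervals that now overlap.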

These low-level commands would be too tedious to be the principal user
mechanism for manipulating the multi-region.  So we add:

Command: mark-regexp
Add to the region list all matches for a regexp (following point, for
consistency with `how-many' and `keep-lines').  Framing the regexp with
^.*REGEXP.*$ allows this command to mark lines (or a separate command could
do that for you).  Even when lines are marked in that fashion, the
newlines between them are not, so each line is a separate interval.

Command: unmark-regexp
Delete from the region list all regions within which a match for a regexp
exists.
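
Both regexp commands reduce to building and filtering interval lists. The sketch below uses hypothetical helper names and returns lists rather than modifying a real `region-list'. `mark-regexp-intervals' skips empty matches, because region-list intervals must be non-empty. It also steps past each empty match so that the loop cannot stall:

```elisp
(require 'cl-lib)

(defun mark-regexp-intervals (regexp)
  "Return (START . END) marker pairs for non-empty matches of REGEXP after point."
  (let (result)
    (save-excursion
      (while (and (not (eobp)) (re-search-forward regexp nil t))
        (if (= (match-beginning 0) (match-end 0))
            ;; Step past an empty match so the search always progresses.
            (unless (eobp) (forward-char 1))
          (push (cons (copy-marker (match-beginning 0))
                      (copy-marker (match-end 0)))
                result))))
    (nreverse result)))

(defun unmark-regexp-intervals (regexp intervals)
  "Return INTERVALS without those that contain a whole match for REGEXP."
  (cl-remove-if (lambda (iv)
                  (save-excursion
                    (goto-char (car iv))
                    ;; The bound keeps the match inside this interval.
                    (re-search-forward regexp (cdr iv) t)))
                intervals))
```

With the line framing, `(mark-regexp-intervals "^.*foo.*$")' marks each line containing "foo" as its own interval, without the newline after it.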

These are analogous to the "highlight all" feature in Firefox, for
instance.  Then we can navigate among them:

Command: next-region
Move point to the closest following beginning of a region list interval. 
This could be used in macros.

Command: count-regions
Display in the echo area how many intervals are in the region list and the
multi-region (which may be one more or many fewer).

Since region lists are complicated things, the user might want to save
them and reuse them later, so letting registers hold them would be good. 
(Should they store the region list or the multi-region?)

WDOT?

Davis

-- 
This product is sold by volume, not by mass.  If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-09 20:47             ` Davis Herring
@ 2010-09-09 22:52               ` Lennart Borgman
  2010-09-10 10:48                 ` Stefan Monnier
  2010-09-10 15:43                 ` Richard Stallman
  0 siblings, 2 replies; 42+ messages in thread
From: Lennart Borgman @ 2010-09-09 22:52 UTC (permalink / raw)
  To: herring; +Cc: Lawrence Mitchell, Stefan Monnier, emacs-devel

On Thu, Sep 9, 2010 at 10:47 PM, Davis Herring <herring@lanl.gov> wrote:
>> Indeed, we could probably go a long way by simply extending our notion
>> of region so as to allow it to be non-contiguous.
>>
>> Patches welcome,
>
> This is no patch, but I had an idea for the interface for this:
>
> Definition: simple region
> The interval (possibly empty) between point and mark, exactly as it is now.
>
> Variable: region-list
> A set of non-empty, disjoint intervals, always local to each buffer.  Each
> is a cons of two markers.  Typically each is highlighted in a subtle
> fashion, even outside Transient Mark Mode.
>
> Function: multi-region
> Returns the union of the region list and the simple region (using
> `point-marker' and/or `mark-marker' as needed).  (If the simple region is
> empty and the region list is not, the simple region is ignored and the
> return value equals `region-list'.)  This is the user-visible
> possibly-disconnected upgrade to the region concept.
>
> User option: multi-region-separator (default: "\n")
> String to insert between separate intervals of the multi-region when
> concatenated.
>
> (defun multi-region-string (&optional sep)
>  "Return the contents of the multi-region.
> Separate intervals with SEP (or `multi-region-separator' if omitted)."
...
>
> WDOT?


I think that kind of interface could be built upon a low level
interface, but the important thing to discuss at this point is rather
the low level interface. Otherwise I think we might soon have multiple
ways of doing this.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-09 22:52               ` Lennart Borgman
@ 2010-09-10 10:48                 ` Stefan Monnier
  2010-09-10 15:43                 ` Richard Stallman
  1 sibling, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2010-09-10 10:48 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Lawrence Mitchell, emacs-devel

> I think that kind of interface could be built upon a low level
> interface, but the important thing to discuss at this point is rather
> the low level interface. Otherwise I think we might soon have multiple
> ways of doing this.

The proposal is to completely avoid any low-level changes, and only work
at the level of regions.  Actually, there might be some changes at
a lowish level to handle highlighting, but that's about it.

That would be of no help for multi-mode buffers, of course.


        Stefan



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-09 17:03                   ` Stefan Monnier
@ 2010-09-10 12:23                     ` David Kastrup
  2010-09-10 13:31                       ` Stefan Monnier
  0 siblings, 1 reply; 42+ messages in thread
From: David Kastrup @ 2010-09-10 12:23 UTC (permalink / raw)
  To: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> And what is narrow-to-region supposed to do then?
>
> Signal an error when the region is not contiguous?

Opens another can of worms because quite a number of commands accepting
a region argument implement it internally using narrow-to-region.

-- 
David Kastrup




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-10 12:23                     ` David Kastrup
@ 2010-09-10 13:31                       ` Stefan Monnier
  0 siblings, 0 replies; 42+ messages in thread
From: Stefan Monnier @ 2010-09-10 13:31 UTC (permalink / raw)
  To: David Kastrup; +Cc: emacs-devel

>>> And what is narrow-to-region supposed to do then?
>> Signal an error when the region is not contiguous?
> Opens another can of worms because quite a number of commands accepting
> a region argument implement it internally using narrow-to-region.

What can of worms?  Old uses will still work just as well as before.

And as explained in earlier threads, some of those commands could be
magically made to work by letting an "r" in the interactive spec mean
"apply once per contiguous region segment".  I haven't experimented with
such a change, so it may or may not be an acceptable heuristic, but in
any case I don't see it as a problem that commands need to be adjusted
in order to work in the case that the region is split into more than
1 chunk.

        Stefan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-09 22:52               ` Lennart Borgman
  2010-09-10 10:48                 ` Stefan Monnier
@ 2010-09-10 15:43                 ` Richard Stallman
  2010-09-10 17:03                   ` David House
       [not found]                   ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com>
  1 sibling, 2 replies; 42+ messages in thread
From: Richard Stallman @ 2010-09-10 15:43 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: wence, monnier, emacs-devel

Could someone please explain what a "structural regular expression"
means?  The message that started the thread did not say.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-10 15:43                 ` Richard Stallman
@ 2010-09-10 17:03                   ` David House
       [not found]                   ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com>
  1 sibling, 0 replies; 42+ messages in thread
From: David House @ 2010-09-10 17:03 UTC (permalink / raw)
  To: rms; +Cc: wence, Lennart Borgman, monnier, emacs-devel

On 10 September 2010 16:43, Richard Stallman <rms@gnu.org> wrote:
> Could someone please explain what a "structural regular expression"
> means?  The message that started the thread did not say.

It is not a property of the regexps themselves, but pertains to
functions that use regexps: namely that they only apply to a subset of
your buffer. For example, you might do a query-replace-regexp on only
the comments of a C file, or an isearch-regexp on only the strings. So
note that the subsets they apply to are non-contiguous in general.

It has been proposed to support this by generalizing the concept of
the region to actually be a list of (contiguous) regions. Another idea
further up was to use special text properties.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
       [not found]                   ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com>
@ 2010-09-10 20:29                     ` Tom
  2010-09-10 23:50                       ` Drew Adams
  2010-09-11 15:49                       ` Richard Stallman
  0 siblings, 2 replies; 42+ messages in thread
From: Tom @ 2010-09-10 20:29 UTC (permalink / raw)
  To: emacs-devel

David House <dmhouse <at> gmail.com> writes:

> 
> On 10 September 2010 16:43, Richard Stallman <rms <at> gnu.org> wrote:
> > Could someone please explain what a "structural regular expression"
> > means?  The message that started the thread did not say.
> 
> It is not a property of the regexps themselves, but pertains to
> functions that use regexps: namely that they only apply to a subset of
> your buffer. For example, you might do a query-replace-regexp on only
> the comments of a C file, or an isearch-regexp on only the strings. So
> note that the subsets they apply to are non-contiguous in general.
> 

It is a property of the regexps, because the main point of the
feature is that there are enhanced regexps which are aware of the
syntax of the buffer contents, so you can select comments,
strings, scopes, etc.

Examples from the mentioned blog post:

V/pattern  select all matches
V|pattern  select all lines with match
V{scope    select all matching scopes
Vatype     select all objects (inclusive)
Vttype     select all objects (exclusive)
Y/pattern  select everything but matches
Y|pattern  select all lines without match
Y{scope    select everything but scope
Yatype     select everything but objects (inclusive)
Yttype     select everything but objects (exclusive)

And you can perform further selections after the first selection
recursively, so you can select comments in scopes, etc.

The document that inspired the above feature of the E editor:

"The current UNIX® text processing tools are weakened by the
built-in concept of a line. There is a simple notation that can
describe the `shape' of files when the typical array-of-lines
picture is inadequate. That notation is regular
expressions. Using regular expressions to describe the structure
in addition to the contents of files has interesting
applications, and yields elegant methods for dealing with some
problems the current tools handle clumsily. When operations using
these expressions are composed, the result is reminiscent of
shell pipelines."

http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

> It has been proposed to support this by generalizing the concept of
> the region to actually be a list of (contiguous) regions. Another idea
> further up was to use special text properties.

I wonder if there is a simpler solution.

For example, during the selection process a separate buffer could
display interactively the current selection made by the user and
this buffer could be set up with text properties and such, so
that it is known where the individual ranges start and end.

After the user has done his work in this temporary buffer, the
resulting ranges could be copied back to the appropriate sections
of the original buffer thereby committing the changes.

This way nothing has to be changed in Emacs core.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Structural regular expressions
  2010-09-10 20:29                     ` Tom
@ 2010-09-10 23:50                       ` Drew Adams
  2010-09-11  2:23                         ` Miles Bader
  2010-09-11 15:49                       ` Richard Stallman
  1 sibling, 1 reply; 42+ messages in thread
From: Drew Adams @ 2010-09-10 23:50 UTC (permalink / raw)
  To: 'Tom', emacs-devel

> the main point of the feature is that there are enhanced regexps
> which are aware of the syntax of the buffer contents, so you
> can select comments, strings, scopes, etc.
> 
> Examples from the mentioned blog post:
> 
> V/pattern  select all matches
> V|pattern  select all lines with match
> V{scope    select all matching scopes
> Vatype     select all objects (inclusive)
> Vttype     select all objects (exclusive)
> Y/pattern  select everything but matches
> Y|pattern  select all lines without match
> Y{scope    select everything but scope
> Yatype     select everything but objects (inclusive)
> Yttype     select everything but objects (exclusive)
> 
> And you can perform further selections after the first selection
> recursively, so you can select comments in scopes, etc.
> 
> The document that inspired the above feature of the E editor:
> 
> "The current UNIX® text processing tools are weakened by the
> built-in concept of a line. There is a simple notation that can
> describe the `shape' of files when the typical array-of-lines
> picture is inadequate. That notation is regular
> expressions. Using regular expressions to describe the structure
> in addition to the contents of files has interesting
> applications, and yields elegant methods for dealing with some
> problems the current tools handle clumsily. When operations using
> these expressions are composed, the result is reminiscent of
> shell pipelines."
> 
> http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

Some more from the paper you cite:

"In these programs, regular expressions are being used to do
 more than just select the input, the way they are used in all
 the traditional UNIX tools.  Instead, the expressions are doing
 a simple parsing (or at least a breaking into lexical tokens)
 of the input. Such expressions are called structural regular
 expressions or just structural expressions."

And: [these programs] "benefit from an additional regular expression to define
the structure of [their] input."

That's the real point, I believe: the paper touts the use of regexps to divide
text into chunks that match - chunks that are not necessarily lines, in order to
then act on those chunks in some way.

This is just what Icicles search does.  You can provide an initial regexp that
parses the buffer to define a set of search contexts.  The regexp .* just parses
it into all of its lines.  Regexp \([^\f]*[\f]\|[^\f]+$\) parses it into pages;
\(.+\n\)+ into paragraphs; [A-Z][^.?!]+[.?!] into sentences; and so on.

You can provide such a regexp interactively, or define different commands that
encapsulate different context-defining regexps (e.g. search-lines (occur/grep),
search-paragraphs, search-sentences).

In general, a regexp used this way does not necessarily _partition_ the buffer -
there can be areas (gaps) that do not match at all.  Hence the mention by others
of possibly non-contiguous areas ("regions" or multi-part region).  The regexp
`(concat comint-prompt-regexp "\\S-.*")' selects comint prompt lines, for
instance; and using an imenu generic regexp selects just function etc.
definitions for the current mode (just their first lines, typically).

But while a regexp is one handy way to parse a buffer, there is no reason to
limit the idea to using a regexp.  In spite of the fancy name "structural
regexp", _any_ way of dividing the buffer into a set of areas of interest can be
useful in the same way (e.g. as search contexts).  The real argument is that
lines are not the only way to go - grep/occur is not the only search tool (which
is not really news).

And it is misleading to say that regexps "describe the `shape' of files when the
typical array of lines picture is inadequate."  It is not about some file
"shape" or an inherent "structure" of the file content (e.g. code structure).

It is about being able to shape the parts of interest as you want and not always
be limited to lines as parts.  Use any regexp or any other pattern or algorithm
to define the _parts you want_ (e.g. as search contexts).  _You_ define the
shapes of interest.

Can you use regexps to mimic/follow the "shape" of code?  Sure.  But you can
also use them to shape text (including code parts) in other ways.  Generalize
the shaping by regexps, and generalize the tools of shaping beyond just regexps.

And there is not even any need to limit this to areas of a buffer or file.

What this is really about (IMO) is these features:

1. Some way to come up with a set of strings as defined by pairs of buffer
positions.  The strings need not be associated with buffer positions, but that
is the typical case discussed.

2. Some way to filter those strings as a set.

3. Some way to act on the (filtered) strings, individually and perhaps also as a
set.  Search is one such action.
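
The three steps can be sketched in a few lines of Emacs Lisp. The helper names are hypothetical. Step 1 uses a regexp only because that is the simplest chunker; any function that returns a list of chunks would do. Step 2 supports taking the complement, and step 3 is whatever you map over the result:

```elisp
(require 'cl-lib)

(defun chunks-by-regexp (regexp)
  "Step 1: return the text of each non-empty match of REGEXP, in buffer order."
  (let ((case-fold-search nil)          ; so that [A-Z] really means capitals
        chunks)
    (save-excursion
      (goto-char (point-min))
      (while (and (not (eobp)) (re-search-forward regexp nil t))
        (if (= (match-beginning 0) (match-end 0))
            ;; Step past an empty match so the loop always progresses.
            (unless (eobp) (forward-char 1))
          (push (match-string-no-properties 0) chunks))))
    (nreverse chunks)))

(defun chunks-filter (chunks pattern &optional complement)
  "Step 2: keep CHUNKS matching PATTERN, or those not matching if COMPLEMENT."
  (cl-remove-if-not (lambda (chunk)
                      (if complement
                          (not (string-match-p pattern chunk))
                        (string-match-p pattern chunk)))
                    chunks))
```

With the sentence regexp quoted earlier, `(chunks-filter (chunks-by-regexp "[A-Z][^.?!]+[.?!]") "slow")' keeps only the sentences mentioning "slow". Step 3 can then be as simple as `mapc'. If step 1 returned buffer positions instead of strings, step 3 could also be a search confined to those chunks.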

For the "structural regexp" fan, #1 is a regexp.  But a regexp is only one tool
you might use to parse a buffer into such a set.

For Icicles, #1 is often a regexp, but it need not be.  Font-lock provides
another #1.  Font-lock typically uses an ordered combination of regexps, but in
the general case it allows any parsing functions.  There are any number of other
#1's that could prove interesting.  A sophisticated parser can be just as useful
for #1 as is a simple regexp.

As another #1, Icicles can treat bookmarked regions as a search set.  (This
assumes an ability, as in Bookmark+, to bookmark regions: 2 positions, not 1.)
IOW, the strings ("regions") to be searched need not even be in the same buffer
or file.  A tags file could be used similarly, to "parse" a set of source files
into strings that represent function etc. definitions.

All that's needed is some way to define a set of strings and their locations.

For #2, Icicles lets you type an input pattern that filters the set dynamically
(incrementally).  Pattern matching here can use regexps, fuzzy matching,
whatever.  You can "pipe-filter": progressively apply multiple patterns to
narrow the set.  And you can complement the set of matches (complement the
current set wrt the previous filtering).

For #3, search has been mentioned as an obvious action for individual matches.
Likewise search-and-replace.  (Those are what Icicles search provides by
default.)  But in general any action might be applicable.  

A final comment.  There is nothing earth-shaking about using a regexp in this
way, to define a set of strings/areas to act on.  It hardly merits special
trumpeting.  And in spite of the usefulness of not being _limited_ to a
hard-coded parsing into lines, it is also true that (partly because much in the
way of programming does involve lines) acting on the lines of a file or buffer
or command-line input stream or error log _is_ often useful.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-10 23:50                       ` Drew Adams
@ 2010-09-11  2:23                         ` Miles Bader
  2010-09-11  7:44                           ` Tom
                                             ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Miles Bader @ 2010-09-11  2:23 UTC (permalink / raw)
  To: emacs-devel

"Drew Adams" <drew.adams@Oracle.Com> writes:
> That's the real point, I believe: the paper touts the use of regexps
> to divide text into chunks that match - chunks that are not
> necessarily lines, in order to then act on those chunks in some way.

Not a good base, I think -- regexps are not really powerful enough to do
the job well.

-Miles

-- 
Happiness, n. An agreeable sensation arising from contemplating the misery of
another.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-11  2:23                         ` Miles Bader
@ 2010-09-11  7:44                           ` Tom
  2010-09-11  7:58                           ` Wojciech Meyer
  2010-09-11 15:04                           ` Drew Adams
  2 siblings, 0 replies; 42+ messages in thread
From: Tom @ 2010-09-11  7:44 UTC (permalink / raw)
  To: emacs-devel

Miles Bader <miles <at> gnu.org> writes:

> 
> "Drew Adams" <drew.adams <at> Oracle.Com> writes:
> > That's the real point, I believe: the paper touts the use of regexps
> > to divide text into chunks that match - chunks that are not
> > necessarily lines, in order to then act on those chunks in some way.
> 
> Not a good base, I think -- regexps are not really powerful enough to do
> the job well.
> 

Well, it doesn't have to be implemented with regexps, but the concept itself
seems useful: you can address syntactic blocks in the buffer intelligently.

So it's a selection mechanism which can utilize both standard,
line oriented regexps and syntax-aware, multi line oriented patterns 
in a recursive fashion to efficiently select parts of the buffer to
perform an operation on.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-11  2:23                         ` Miles Bader
  2010-09-11  7:44                           ` Tom
@ 2010-09-11  7:58                           ` Wojciech Meyer
  2010-09-11  8:33                             ` tomas
  2010-09-11 15:04                           ` Drew Adams
  2 siblings, 1 reply; 42+ messages in thread
From: Wojciech Meyer @ 2010-09-11  7:58 UTC (permalink / raw)
  To: Miles Bader; +Cc: emacs-devel

Miles Bader <miles@gnu.org> writes:

> "Drew Adams" <drew.adams@Oracle.Com> writes:
>> That's the real point, I believe: the paper touts the use of regexps
>> to divide text into chunks that match - chunks that are not
>> necessarily lines, in order to then act on those chunks in some way.
>
> Not a good base, I think -- regexps are not really powerful enough to do
> the job well.

Yes, regexps are quite limited.
Maybe a simple PEG parser based on packrat parsing, with syntactic sugar for
defining a one-line set of rules?

>
> -Miles

Wojciech



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-11  7:58                           ` Wojciech Meyer
@ 2010-09-11  8:33                             ` tomas
  0 siblings, 0 replies; 42+ messages in thread
From: tomas @ 2010-09-11  8:33 UTC (permalink / raw)
  To: Wojciech Meyer; +Cc: emacs-devel, Miles Bader

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Sep 11, 2010 at 08:58:04AM +0100, Wojciech Meyer wrote:
> Miles Bader <miles@gnu.org> writes:
> 
> > "Drew Adams" <drew.adams@Oracle.Com> writes:
> >> That's the real point, I believe: the paper touts the use of regexps
> >> to divide text into chunks that match - chunks that are not
> >> necessarily lines, in order to then act on those chunks in some way.
> >
> > Not a good base, I think -- regexps are not really powerful enough to do
> > the job well.
> 
> Yes, regexps are quite limited.
> Maybe a simple PEG parser based on packrat parsing, with syntactic sugar for
> defining a one-line set of rules?

While PEG is interesting in itself (and I think Emacs should have
something like that, just to test its strengths/weaknesses wrt regex), I
think Drew is right: A way, *any* way to define a "buffer subset", maybe
partitioned into "chunks" is useful here. So at this level, I'd think
concentrating on interface design (user & programmer) makes most sense,
abstracting from possible implementations (regex, peg, font-lock,
hand-built parser).

The (possible) implementations should (I think) just guide the design of
the interfaces (as examples). In the ideal case, it should be possible
to use whatever implementation is most helpful (or combine them: union,
intersection, symmetric difference).
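
Once every implementation returns the same shape of data, combining selections is plain interval arithmetic. Assume each selection is a sorted list of disjoint (START . END) pairs. Intersection, which is also what selecting "comments within scopes" amounts to, then takes one linear pass. The function name below is hypothetical:

```elisp
(defun intervals-intersect (a b)
  "Return the intersection of A and B as a sorted interval list.
A and B are sorted lists of disjoint (START . END) intervals."
  (let (result)
    (while (and a b)
      (let ((s (max (caar a) (caar b)))
            (e (min (cdar a) (cdar b))))
        ;; Keep the overlap of the two current intervals, if non-empty.
        (when (< s e) (push (cons s e) result))
        ;; Drop whichever current interval ends first; it cannot
        ;; overlap anything further along in the other list.
        (if (< (cdar a) (cdar b))
            (setq a (cdr a))
          (setq b (cdr b)))))
    (nreverse result)))
```

Union and symmetric difference follow the same merge-style walk over both sorted lists.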

Just dreaming?

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFMiz7OBcgs9XrR2kYRAjZvAJ9Hzc4Dk2Z4t3wohMQJX/8544MvIQCffrxr
WKNM0E3e/fJ3UF61J4Ez7c4=
=tDCG
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Structural regular expressions
  2010-09-11  2:23                         ` Miles Bader
  2010-09-11  7:44                           ` Tom
  2010-09-11  7:58                           ` Wojciech Meyer
@ 2010-09-11 15:04                           ` Drew Adams
  2 siblings, 0 replies; 42+ messages in thread
From: Drew Adams @ 2010-09-11 15:04 UTC (permalink / raw)
  To: 'Miles Bader', emacs-devel

> > That's the real point, I believe: the paper touts the use of regexps
> > to divide text into chunks that match - chunks that are not
> > necessarily lines, in order to then act on those chunks in some way.
> 
> Not a good base, I think -- regexps are not really powerful 
> enough to do the job well.

That's too vague.  Good base (= ?) for what?  Do what job?  How well is well?

We use regexps to select chunks of text all the time in Emacs.  Regexps are not
sufficiently powerful to select an _arbitrary_ chunk, but so what?  You can use
them to select lots of kinds of chunks (reg langs + Emacs "regexp" extensions) -
certainly more than just lines.

Nothing limits us to regexps (= one of my points), but regexps can be useful in
selecting chunks of text.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-10 20:29                     ` Tom
  2010-09-10 23:50                       ` Drew Adams
@ 2010-09-11 15:49                       ` Richard Stallman
  2010-09-12 13:48                         ` Stefan Monnier
  1 sibling, 1 reply; 42+ messages in thread
From: Richard Stallman @ 2010-09-11 15:49 UTC (permalink / raw)
  To: Tom; +Cc: emacs-devel

Thanks for the explanation.  I think the term "structural regular expressions"
is misleading because its grammatical construction implies a different
kind of regexp, rather than a different way of applying them.

    V/pattern  select all matches
    V|pattern  select all lines with match
    V{scope    select all matching scopes
    Vatype     select all objects (inclusive)
    Vttype     select all objects (exclusive)
    Y/pattern  select everything but matches
    Y|pattern  select all lines without match
    Y{scope    select everything but scope
    Yatype     select everything but objects (inclusive)
    Yttype     select everything but objects (exclusive)

Are `V/' etc. literal, or do they stand for some other text?

Where would this syntax be used?



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-11 15:49                       ` Richard Stallman
@ 2010-09-12 13:48                         ` Stefan Monnier
  2010-09-12 14:09                           ` Lennart Borgman
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2010-09-12 13:48 UTC (permalink / raw)
  To: rms; +Cc: Tom, emacs-devel

> Thanks for the explanation.  I think the term "structural regular
> expressions" is misleading because its grammatical construction
> implies a different kind of regexp, rather than a different way of
> applying them.

>     V/pattern  select all matches
>     V|pattern  select all lines with match
>     V{scope    select all matching scopes
>     Vatype     select all objects (inclusive)
>     Vttype     select all objects (exclusive)
>     Y/pattern  select everything but matches
>     Y|pattern  select all lines without match
>     Y{scope    select everything but scope
>     Yatype     select everything but objects (inclusive)
>     Yttype     select everything but objects (exclusive)

> Are `V/' etc. literal, or do they stand for some other text?

> Where would this syntax be used?

The term "structural regular expression" is indeed misleading, I think.
They use it to refer to the combination of 2 things:
1- the ability to select particular kinds of elements in the text
   (which we could do in Emacs with non-contiguous regions).  The main
   example being commands that select "all the strings" or "all the
   comments" or that inverts the selection (select everything that
   wasn't selected before).
2- the ability to apply regexp-operations to only those selected parts
   of the text (to the extent that we already have commands that apply
   only to the active region, we already have that, although it would
   probably require several tweaks to make it work right in the face of
   non-contiguous regions).
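
The second ingredient can already be prototyped for the "outside strings and comments" case without any new region machinery. `syntax-ppss' reports whether a position is inside a string or comment: element 8 of its result is non-nil there. Below is a non-interactive sketch. The function name is hypothetical, and it assumes REGEXP cannot match the empty string:

```elisp
(defun replace-regexp-outside-strings-and-comments (regexp to)
  "Replace matches for REGEXP after point with TO, except in strings and comments.
Return the number of replacements.  REGEXP must not match the empty string."
  (let ((count 0))
    (save-excursion
      (while (re-search-forward regexp nil t)
        ;; `syntax-ppss' leaves point at its argument and might alter
        ;; the match data, so protect both before deciding.
        (unless (save-excursion
                  (save-match-data
                    (nth 8 (syntax-ppss (match-beginning 0)))))
          (replace-match to t t)
          (setq count (1+ count)))))
    count))
```

In an Emacs Lisp buffer containing `(foo "foo") ; foo', this replaces only the first "foo". The second is inside a string and the third is in a comment.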

Together this allows you to do things like apply query-replace to all
non-string non-comment parts of the buffer, which is why they call it
"structural" regexps.

        Stefan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-12 13:48                         ` Stefan Monnier
@ 2010-09-12 14:09                           ` Lennart Borgman
  2010-09-12 16:43                             ` Drew Adams
  0 siblings, 1 reply; 42+ messages in thread
From: Lennart Borgman @ 2010-09-12 14:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Tom, rms, emacs-devel

On Sun, Sep 12, 2010 at 3:48 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>
> The term "structural regular expression" is indeed misleading, I think.
> They use it to refer to the combination of 2 things:
> 1- the ability to select particular kinds of elements in the text
>   (which we could do in Emacs with non-contiguous regions).  The main
>   example being commands that select "all the strings" or "all the
>   comments" or that inverts the selection (select everything that
>   wasn't selected before).
> 2- the ability to apply regexp-operations to only those selected parts
>   of the text (to the extent that we already have commands that apply
>   only to the active region, we already have that, although it would
>   probably require several tweaks to make it work right in the face of
>   non-contiguous regions).
>
> Together this allows you to do things like apply query-replace to all
> non-string non-comment parts of the buffer, which is why they call it
> "structural" regexps.


There is a related need for searching that could be built on such
capability: AND.

Quite often I find myself searching for a node/a tree in a big .org
file containing both word a and word b.
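
Such an AND search can be built from the two ingredients quoted above. First chunk the file into nodes, then keep the chunks that contain every word, in any order. Below is a rough sketch. It does not use Org's own API. As a simplification, it treats a "node" as the text from one heading line (stars followed by a space) up to the next heading of any level. It also matches the words as plain substrings. The function name is hypothetical:

```elisp
(require 'cl-lib)

(defun heading-sections-containing-all (words)
  "Return (START . END) of each heading section containing all of WORDS.
A section runs from a line starting with \"* \", \"** \", etc. to the
next such line or the end of the buffer.  Word order does not matter;
case sensitivity follows `case-fold-search'."
  (let (sections)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward "^\\*+ " nil t)
        (let ((start (line-beginning-position))
              (end (save-excursion
                     (if (re-search-forward "^\\*+ " nil t)
                         (line-beginning-position)
                       (point-max)))))
          (when (cl-every (lambda (word)
                            (save-excursion
                              (goto-char start)
                              (search-forward word end t)))
                          words)
            (push (cons start end) sections)))))
    (nreverse sections)))
```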



^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Structural regular expressions
  2010-09-12 14:09                           ` Lennart Borgman
@ 2010-09-12 16:43                             ` Drew Adams
  2010-09-12 17:03                               ` Lennart Borgman
  0 siblings, 1 reply; 42+ messages in thread
From: Drew Adams @ 2010-09-12 16:43 UTC (permalink / raw)
  To: 'Lennart Borgman', 'Stefan Monnier'
  Cc: 'Tom', rms, emacs-devel

> There is a related need for searching that could be built on such
> capability: AND.
> 
> Quite often I find myself searching for a node/a tree in a big .org
> file containing both word a and word b.

And in an unspecified order, no doubt.  In vanilla Emacs the closest we have for
this is `apropos' with keyword input.  The behavior is special-built for this
particular command; it is not a general feature.

In Icicles you can hit `S-SPC' to get such `AND' filtering during completion (of
any input).  You can add patterns on the fly, preceding each by `S-SPC', thus
narrowing down the choices progressively, as you see fit.  I call this
"progressive" completion.
http://www.emacswiki.org/emacs/Icicles_-_Nutshell_View#toc10

And you can hit `C-~' to get the complement (`AND NOT') after seeing what
`S-SPC' (`AND') shows.

Example: `M-x for' shows command names that have substring `for' (similarly `C-h
f for', `C-h v for',...).  `M-x for S-SPC ediff' shows the subset of those `for'
commands that also have substring `ediff'.  `M-x for S-SPC ediff C-~' shows the
names that have substring `for' but do not have substring `ediff'.
http://www.emacswiki.org/emacs/Icicles_-_Nutshell_View#toc11
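
The AND and AND NOT semantics described here can be illustrated on a plain list of strings. This is not Icicles code. It is only a sketch of the filtering logic, with hypothetical function names. Each pattern narrows the previous set, and the complement is taken with respect to the set as it stood before the last pattern:

```elisp
(require 'cl-lib)

(defun progressive-filter (candidates patterns)
  "Keep CANDIDATES that match every regexp in PATTERNS (AND)."
  (cl-reduce (lambda (acc pat)
               (cl-remove-if-not (lambda (c) (string-match-p pat c)) acc))
             patterns :initial-value candidates))

(defun progressive-filter-complement (candidates patterns)
  "Keep CANDIDATES matching all but the last of PATTERNS, but not the last.
This is the AND NOT case: the complement relative to the previous set."
  (let ((previous (progressive-filter candidates (butlast patterns)))
        (last-pat (car (last patterns))))
    (cl-remove-if (lambda (c) (string-match-p last-pat c)) previous)))
```

Applied to command names, `(progressive-filter names '("for" "ediff"))' corresponds roughly to `M-x for S-SPC ediff'. `progressive-filter-complement' with the same arguments corresponds to adding `C-~'.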

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Structural regular expressions
  2010-09-12 16:43                             ` Drew Adams
@ 2010-09-12 17:03                               ` Lennart Borgman
  2010-09-12 21:31                                 ` Drew Adams
  0 siblings, 1 reply; 42+ messages in thread
From: Lennart Borgman @ 2010-09-12 17:03 UTC (permalink / raw)
  To: Drew Adams; +Cc: Tom, emacs-devel, Stefan Monnier, rms

On Sun, Sep 12, 2010 at 6:43 PM, Drew Adams <drew.adams@oracle.com> wrote:
>> There is a related need for searching that could be built on such
>> capability: AND.
>>
>> Quite often I find myself searching for a node/a tree in a big .org
>> file containing both word a and word b.
>
> And in an unspecified order, no doubt.  In vanilla Emacs the closest we have for
> this is `apropos' with keyword input.  The behavior is special-built for this
> particular command; it is not a general feature.
>
> In Icicles you can hit `S-SPC' to get such `AND' filtering during completion (of
> any input).  You can add patterns on the fly, preceding each by `S-SPC', thus
> narrowing down the choices progressively, as you see fit.  I call this
> "progressive" completion.
> http://www.emacswiki.org/emacs/Icicles_-_Nutshell_View#toc10

It does not look to me like I can search for org nodes containing
both word a and word b with the progressive completion in Icicles. Can
I?




* RE: Structural regular expressions
  2010-09-12 17:03                               ` Lennart Borgman
@ 2010-09-12 21:31                                 ` Drew Adams
  0 siblings, 0 replies; 42+ messages in thread
From: Drew Adams @ 2010-09-12 21:31 UTC (permalink / raw)
  To: 'Lennart Borgman'
  Cc: 'Tom', emacs-devel, 'Stefan Monnier', rms

> It does not look to me like I can search for org nodes containing
> both word a and word b with the progressive completion in Icicles. Can I?

I have no idea whether you can.  If you think something does not work as
documented, you can file an Icicles bug using `M-x icicle-send-bug-report',
being sure to provide a concrete recipe, preferably starting from `emacs -Q'.

But whenever input completion is available in Org mode you should be able to use
Icicles progressive completion.  (I do not use Org mode, myself.  I am convinced
that it is a Very Good Thing (TM), but I do not have any particular use for it.)

And as for search, yes, you can use progressive completion during Icicles
search.  You could, for example, (1) use a context regexp that defines the
search space to be the Org nodes (strings), and then (2) type your first word
`a', `S-SPC', and your second word `b', to narrow the search space to those Org
nodes that contain both words `a' and `b' (in either order).

It is up to you to come up with the regexp needed to do #1.  If for some reason
Org nodes cannot be selected using just a regexp (dunno), then you will need to
define a function that parses the buffer(s) and creates an alist of Org nodes
(buffer substrings and their positions).

How so?  You can use function `icicle-search' to define your own search command.
If you need something other than a regexp to parse your text into the set of
search contexts (e.g. Org nodes), then pass a parsing function as the third arg
to `icicle-search', `SCAN-FN-OR-REGEXP'.  The function needs to fill the variable
`icicle-candidates-alist': each alist entry has a search-context string as car
and the buffer position of the string's end as cdr.

For an example of a function that serves as arg `SCAN-FN-OR-REGEXP', see
`icicle-search-char-property-scan'.  It parses a buffer into the strings that
are determined by a text or overlay property (e.g. `face') with a given value
(e.g. `font-lock-string-face').
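The idea behind such a scan function can be sketched without Icicles: walk the buffer with `next-single-property-change' and collect each run of text whose property has the wanted value. The helper below is hypothetical. It is not the code of `icicle-search-char-property-scan', it handles only text properties (not overlays), and it simply returns an alist of the shape described above, with the matched string as car and its end position as cdr.

```elisp
;; Hypothetical sketch of a property-based scan; not Icicles code.
(defun my-char-property-zones (buffer beg end prop value)
  "Return an alist of (STRING . END-POSITION) for runs of text in BUFFER.
Only the text between BEG and END (nil means the buffer limits) is
scanned.  A run is kept when its text property PROP is VALUE, or is a
list (such as a list of faces) that contains VALUE.  Overlays are
ignored."
  (with-current-buffer buffer
    (let ((pos (or beg (point-min)))
          (end (or end (point-max)))
          zones)
      (while (< pos end)
        (let ((val (get-text-property pos prop))
              ;; Next position where PROP changes, or END if none.
              (next (next-single-property-change pos prop nil end)))
          (when (or (eq val value)
                    (and (consp val) (memq value val)))
            (push (cons (buffer-substring-no-properties pos next) next)
                  zones))
          (setq pos next)))
      (nreverse zones))))
```

For example, calling it with PROP `face' and VALUE `font-lock-string-face' in a fontified buffer would collect the string literals, which is the "operate only on strings" case from the start of the thread.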

These are the `icicle-search' args (from the doc string):

---
(icicle-search BEG END SCAN-FN-OR-REGEXP REQUIRE-MATCH
               &optional WHERE &rest ARGS)

BEG is the beginning of the region to search; END is the end.
SCAN-FN-OR-REGEXP: Regexp or function that determines the set of
  initial candidates (match zones).  If a function, it is passed, as
  arguments, the buffer to search, the beginning and end of the search
  region in that buffer, and ARGS.
REQUIRE-MATCH is passed to `completing-read'.
Optional arg WHERE is a list of bookmarks, buffers, or files to be
  searched.  If nil, then search only the current buffer or region.
  (To search bookmarks you must also use library `bookmark+.el').
ARGS are arguments that are passed to function SCAN-FN-OR-REGEXP.

Note that if SCAN-FN-OR-REGEXP is a regexp string, then function
`icicle-search-regexp-scan' is used to determine the set of match
zones.  You can limit hits to regexp matches that also satisfy a
predicate, by using `(PREDICATE)' as ARGS: PREDICATE is then passed to
`icicle-search-regexp-scan' as its PREDICATE argument.
---
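Limiting regexp hits with a predicate (the same idea as the PREDICATE argument for `query-replace-regexp' suggested earlier in the thread) can likewise be sketched in plain Emacs Lisp. In this hypothetical helper the predicate is called with the matched string. That calling convention is an assumption for illustration only, since the doc string above does not say what Icicles passes to PREDICATE.

```elisp
;; Hypothetical sketch: regexp scan filtered by a predicate.  Not Icicles code.
(defun my-regexp-zones (buffer beg end regexp &optional predicate)
  "Return an alist of (MATCH . END-POSITION) for REGEXP matches in BUFFER.
Only the text between BEG and END (nil means the buffer limits) is
searched.  When PREDICATE is non-nil, keep only the matches for which
\(funcall PREDICATE MATCH) returns non-nil."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (or beg (point-min)))
      (let ((end (or end (point-max)))
            zones)
        (while (and (< (point) end)
                    (re-search-forward regexp end t))
          ;; Save the match data before PREDICATE can clobber it.
          (let ((hit (match-string-no-properties 0))
                (hit-beg (match-beginning 0))
                (hit-end (match-end 0)))
            (when (or (null predicate) (funcall predicate hit))
              (push (cons hit hit-end) zones))
            ;; Step past an empty match so the loop always advances.
            (when (and (= hit-beg hit-end) (< (point) end))
              (forward-char 1))))
        (nreverse zones)))))
```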

So if you have a simple regexp that selects the Org nodes, then just use command
`icicle-search' interactively (`C-c `'): type that regexp followed by `RET',
then `a S-SPC b'.  If the regexp is complex and you don't want to type it
interactively, then define a search command `foo' like this:

(defconst org-regexp "HAIRY-ORG-NODE-IDENTIFYING-REGEXP")

(defun foo ()
  (interactive)
  (icicle-search nil nil org-regexp t))

If you do not have such a regexp (e.g. Org-node parsing is too complex for a
regexp), then define a search command `foo' like this:

(defun foo ()
  (interactive)
  (icicle-search nil nil 'org-parser t))

(defun org-parser (buffer beg end)
  "Fill `icicle-candidates-alist' with Org nodes and their positions."
  ... ; Parsing magic
  (setq icicle-candidates-alist ...))
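One way the "Parsing magic" placeholder above might be filled in, as a minimal sketch. It assumes an Org node is a heading line (one or more `*' followed by a space) plus the text up to the next heading of any level, so a subtree is split into separate entries. `my-org-node-alist' is a hypothetical name. Using an integer end position as the cdr follows the description above; whether Icicles also expects a marker there is an assumption not checked here. Appending (rather than replacing) assumes Icicles may call the scan function once per buffer when WHERE names several.

```elisp
;; Hypothetical sketch; not Org or Icicles code.
(defvar icicle-candidates-alist)  ; Defined by Icicles; declared for the compiler.

(defun my-org-node-alist (buffer beg end)
  "Return an alist of (NODE-TEXT . END-POSITION) for Org nodes in BUFFER.
A node runs from a heading line to just before the next heading, or to
END.  BEG and END default to the buffer limits when nil."
  (with-current-buffer buffer
    (save-excursion
      (let ((beg (or beg (point-min)))
            (end (or end (point-max)))
            (heading-re "^\\*+ ")
            alist)
        (goto-char beg)
        ;; Visit each heading in turn.
        (while (re-search-forward heading-re end t)
          (let* ((node-beg (match-beginning 0))
                 ;; The node ends where the next heading starts, or at END.
                 (node-end (save-excursion
                             (if (re-search-forward heading-re end t)
                                 (match-beginning 0)
                               end))))
            (push (cons (buffer-substring-no-properties node-beg node-end)
                        node-end)
                  alist)
            (goto-char node-end)))
        (nreverse alist)))))

(defun org-parser (buffer beg end)
  "Fill `icicle-candidates-alist' with Org nodes and their positions."
  (setq icicle-candidates-alist
        (append icicle-candidates-alist
                (my-org-node-alist buffer beg end))))
```

With the nodes as search contexts, typing `alpha S-SPC beta' would then keep only the nodes containing both words, in either order.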


end of thread

Thread overview: 42+ messages
2010-09-07 19:25 Structural regular expressions Tom
2010-09-07 20:08 ` Lennart Borgman
2010-09-07 20:27   ` Tom
2010-09-07 21:08     ` Lennart Borgman
2010-09-08  1:13     ` Eric Schulte
2010-09-08  8:46       ` Stefan Monnier
2010-09-08  9:20         ` Lawrence Mitchell
2010-09-08 10:30           ` Kan-Ru Chen
2010-09-09  6:34             ` Harald Hanche-Olsen
2010-09-08 14:29           ` Stefan Monnier
2010-09-08 15:52             ` Lawrence Mitchell
2010-09-08 22:46               ` Stefan Monnier
2010-09-09  7:07                 ` David Kastrup
2010-09-09 17:03                   ` Stefan Monnier
2010-09-10 12:23                     ` David Kastrup
2010-09-10 13:31                       ` Stefan Monnier
2010-09-09 20:47             ` Davis Herring
2010-09-09 22:52               ` Lennart Borgman
2010-09-10 10:48                 ` Stefan Monnier
2010-09-10 15:43                 ` Richard Stallman
2010-09-10 17:03                   ` David House
     [not found]                   ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com>
2010-09-10 20:29                     ` Tom
2010-09-10 23:50                       ` Drew Adams
2010-09-11  2:23                         ` Miles Bader
2010-09-11  7:44                           ` Tom
2010-09-11  7:58                           ` Wojciech Meyer
2010-09-11  8:33                             ` tomas
2010-09-11 15:04                           ` Drew Adams
2010-09-11 15:49                       ` Richard Stallman
2010-09-12 13:48                         ` Stefan Monnier
2010-09-12 14:09                           ` Lennart Borgman
2010-09-12 16:43                             ` Drew Adams
2010-09-12 17:03                               ` Lennart Borgman
2010-09-12 21:31                                 ` Drew Adams
2010-09-09 15:51       ` Tom
2010-09-09 16:01         ` Lennart Borgman
2010-09-09 16:23           ` Tom
2010-09-09 16:44             ` Lennart Borgman
2010-09-09 16:53               ` Tom
2010-09-09 17:02                 ` Lennart Borgman
2010-09-09 19:27             ` Daniel Colascione
2010-09-08  0:00 ` Drew Adams

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git