* Structural regular expressions @ 2010-09-07 19:25 Tom 2010-09-07 20:08 ` Lennart Borgman 2010-09-08 0:00 ` Drew Adams 0 siblings, 2 replies; 42+ messages in thread From: Tom @ 2010-09-07 19:25 UTC (permalink / raw) To: emacs-devel This structural regex thing is interesting. You can perform operations (e.g. replace text) on all strings in the file, or everywhere except in strings and comments, etc. Here's the description of the feature on the E editor blog if someone wants to implement something like this for emacs: http://e-texteditor.com/blog/2010/beyond-vi ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-07 19:25 Structural regular expressions Tom @ 2010-09-07 20:08 ` Lennart Borgman 2010-09-07 20:27 ` Tom 2010-09-08 0:00 ` Drew Adams 1 sibling, 1 reply; 42+ messages in thread From: Lennart Borgman @ 2010-09-07 20:08 UTC (permalink / raw) To: Tom; +Cc: emacs-devel On Tue, Sep 7, 2010 at 9:25 PM, Tom <levelhalom@gmail.com> wrote: > This structural regex thing is interesting. You can perform operations > (e.g. replace text) on all strings in the file, or everywhere except > in strings and comments, etc. Here's the description of the feature > on the E editor blog if someone wants to implement something like > this for emacs: > > http://e-texteditor.com/blog/2010/beyond-vi Looks indeed like a useful idea. I suggest adding a new function argument PREDICATE to query-replace-regexp etc. (Think of the argument PREDICATE in completing-read.) ^ permalink raw reply [flat|nested] 42+ messages in thread
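A minimal sketch of the PREDICATE idea suggested above, not a patch to `query-replace-regexp'. Both function names are hypothetical, and the sketch assumes REGEXP never matches the empty string. The predicate is called at each match and decides whether that match gets replaced; the example predicate uses `syntax-ppss' to accept only matches that start inside a string or comment.

```elisp
;; Hypothetical helpers for illustration; neither exists in Emacs.

(defun my-replace-regexp-if (regexp to-string predicate)
  "Replace matches of REGEXP with TO-STRING where PREDICATE returns non-nil.
PREDICATE is called with no arguments, with the match data set for the
current match.  Assumes REGEXP never matches the empty string.
Return the number of replacements made."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        ;; Protect point and match data from whatever PREDICATE does.
        (when (save-match-data (save-excursion (funcall predicate)))
          (replace-match to-string t t)
          (setq count (1+ count)))))
    count))

(defun my-in-string-or-comment-p ()
  "Return non-nil if the current match starts inside a string or comment."
  (let ((state (save-excursion (syntax-ppss (match-beginning 0)))))
    ;; Element 3 is non-nil inside a string, element 4 inside a comment.
    (or (nth 3 state) (nth 4 state))))
```

Called as (my-replace-regexp-if "foo" "bar" #'my-in-string-or-comment-p) in a buffer with a suitable syntax table, this should leave occurrences in ordinary code alone. Negating the predicate gives the "everywhere except in strings and comments" case.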
* Re: Structural regular expressions 2010-09-07 20:08 ` Lennart Borgman @ 2010-09-07 20:27 ` Tom 2010-09-07 21:08 ` Lennart Borgman 2010-09-08 1:13 ` Eric Schulte 0 siblings, 2 replies; 42+ messages in thread From: Tom @ 2010-09-07 20:27 UTC (permalink / raw) To: emacs-devel Lennart Borgman <lennart.borgman <at> gmail.com> writes: > > Looks indeed like a useful idea. I suggest adding a new function > argument PREDICATE to query-replace-regexp etc. (Think of the argument > PREDICATE in completing-read.) > It can be a good start, but the feature in the E editor is more general than search and replace. You can perform any operation on the selected text. It's sort of like working on the narrowed part of a buffer, only the narrowed part in this case consists of several separate ranges of the same buffer (like all comments, etc.). ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-07 20:27 ` Tom @ 2010-09-07 21:08 ` Lennart Borgman 2010-09-08 1:13 ` Eric Schulte 1 sibling, 0 replies; 42+ messages in thread From: Lennart Borgman @ 2010-09-07 21:08 UTC (permalink / raw) To: Tom; +Cc: emacs-devel On Tue, Sep 7, 2010 at 10:27 PM, Tom <levelhalom@gmail.com> wrote: > Lennart Borgman <lennart.borgman <at> gmail.com> writes: >> >> Looks indeed like a useful idea. I suggest adding a new function >> argument PREDICATE to query-replace-regexp etc. (Think of the argument >> PREDICATE in completing-read.) >> > > It can be a good start, but the feature in the E editor is more general > than search and replace. You can perform any operation on the selected > text. It's sort of like working on the narrowed part of a buffer, only > the narrowed part in this case consists of several separate ranges of > the same buffer (like all comments, etc.). That makes me think of my favorite idea for a better multi major mode support in Emacs: Hide parts of the buffer from all low level functions. Such an ability could be used here too. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-07 20:27 ` Tom 2010-09-07 21:08 ` Lennart Borgman @ 2010-09-08 1:13 ` Eric Schulte 2010-09-08 8:46 ` Stefan Monnier 2010-09-09 15:51 ` Tom 1 sibling, 2 replies; 42+ messages in thread From: Eric Schulte @ 2010-09-08 1:13 UTC (permalink / raw) To: Tom; +Cc: emacs-devel Tom <levelhalom@gmail.com> writes: > Lennart Borgman <lennart.borgman <at> gmail.com> writes: >> >> Looks indeed like a useful idea. I suggest adding a new function >> argument PREDICATE to query-replace-regexp etc. (Think of the argument >> PREDICATE in completing-read.) >> > > It can be a good start, but the feature in the E editor is more general > than search and replace. You can perform any operation on the selected > text. It's sort of like working on the narrowed part of a buffer, only > the narrowed part in this case consists of several separate ranges of > the same buffer (like all comments, etc.). Would generalizing the narrowing behavior to arbitrarily many ranges in a buffer instead of a single range have extensive ramifications? Would this be an easy or difficult thing to implement? If it's not too difficult then providing behavior like that mentioned in the article above should be trivial. Cheers -- Eric hmm, it seems that `narrow-to-region' works by changing the bounds (min and max indices) of the current buffer, not something that naturally generalizes to multiple regions. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 1:13 ` Eric Schulte @ 2010-09-08 8:46 ` Stefan Monnier 2010-09-08 9:20 ` Lawrence Mitchell 2010-09-09 15:51 ` Tom 1 sibling, 1 reply; 42+ messages in thread From: Stefan Monnier @ 2010-09-08 8:46 UTC (permalink / raw) To: Eric Schulte; +Cc: Tom, emacs-devel

> Would generalizing the narrowing behavior to arbitrarily many ranges in
> a buffer instead of a single range have extensive ramifications? Would
> this be an easy or difficult thing to implement?

Since the non-narrowed part is not displayed at all, it wouldn't be quite what we want anyway. We'd need to add something new, tho it could be based on something pre-existing (e.g. it could rely on text properties like `invisible' and/or `intangible').

> If it's not too difficult then providing behavior like that mentioned in
> the article above should be trivial.

Nothing's trivial when you have to ensure some amount of backward compatibility with code written many years ago ;-)

But of course, it would be OK to start with something that may break pre-existing code, as long as it's only broken when you use the new feature.

And I agree with Lennart, that such a new tool, if done right, could be a good basis for better multi-mode support.

Stefan

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 8:46 ` Stefan Monnier @ 2010-09-08 9:20 ` Lawrence Mitchell 2010-09-08 10:30 ` Kan-Ru Chen 2010-09-08 14:29 ` Stefan Monnier 0 siblings, 2 replies; 42+ messages in thread From: Lawrence Mitchell @ 2010-09-08 9:20 UTC (permalink / raw) To: emacs-devel Stefan Monnier wrote: >> Would generalizing the narrowing behavior to arbitrarily many ranges in >> a buffer instead of a single range have extensive ramifications? Would >> this be an easy or difficult thing to implement? > Since the non-narrowed part is not displayed at all, it wouldn't be > quite what we want anyway. > We'd need to add something new, tho it could be based on something > pre-existing (e.g. it could rely on text properties like to `invisible' > and/or `intangible'). >> If it's not too difficult then providing behavior like that mentioned in >> the article above should be trivial. > Nothing's trivial when you have to ensure some amount of backward > compatibility with code written many years ago ;-) > But of course, it would be OK to start with something that may break > pre-existing code, as long as it's only broken when you use the > new feature. > And I agree with Lennart, that such a new tool, if done right, could be > a good basis for better multi-mode support. A halfway house, similar to that suggested by Drew, would be something like http://www2.ph.ed.ac.uk/~s0198183/multi-region.el. ISTR some discussion when it was posted in g.e.sources, *grovels through mail*: http://thread.gmane.org/gmane.emacs.sources/1390 Maybe this is a useful feature to now think about incorporating :P. Lawrence -- Lawrence Mitchell <wence@gmx.li> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 9:20 ` Lawrence Mitchell @ 2010-09-08 10:30 ` Kan-Ru Chen 2010-09-09 6:34 ` Harald Hanche-Olsen 2010-09-08 14:29 ` Stefan Monnier 1 sibling, 1 reply; 42+ messages in thread From: Kan-Ru Chen @ 2010-09-08 10:30 UTC (permalink / raw) To: emacs-devel On Wed, Sep 8, 2010 at 5:20 PM, Lawrence Mitchell <wence@gmx.li> wrote: > Stefan Monnier wrote: >>> Would generalizing the narrowing behavior to arbitrarily many ranges in >>> a buffer instead of a single range have extensive ramifications? Would >>> this be an easy or difficult thing to implement? > >> Since the non-narrowed part is not displayed at all, it wouldn't be >> quite what we want anyway. >> We'd need to add something new, tho it could be based on something >> pre-existing (e.g. it could rely on text properties like to `invisible' >> and/or `intangible'). > >>> If it's not too difficult then providing behavior like that mentioned in >>> the article above should be trivial. > >> Nothing's trivial when you have to ensure some amount of backward >> compatibility with code written many years ago ;-) > >> But of course, it would be OK to start with something that may break >> pre-existing code, as long as it's only broken when you use the >> new feature. > >> And I agree with Lennart, that such a new tool, if done right, could be >> a good basis for better multi-mode support. > > A halfway house, similar to that suggested by Drew, would be > something like > http://www2.ph.ed.ac.uk/~s0198183/multi-region.el. ISTR some > discussion when it was posted in g.e.sources, *grovels through > mail*: > http://thread.gmane.org/gmane.emacs.sources/1390 > > Maybe this is a useful feature to now think about incorporating > :P. Could this be implemented like a `virtual-buffer'. (with-virtual-buffer LABEL &rest BODY) From the virtual-buffer point of view, the multiple regions marked by LABEL are as a whole, connected buffer. Then legacy code could work on this buffer without change. 
- Kanru ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 10:30 ` Kan-Ru Chen @ 2010-09-09 6:34 ` Harald Hanche-Olsen 0 siblings, 0 replies; 42+ messages in thread From: Harald Hanche-Olsen @ 2010-09-09 6:34 UTC (permalink / raw) To: emacs-devel + Kan-Ru Chen <kanru@kanru.info>: > Could this be implemented like a `virtual-buffer'. > > (with-virtual-buffer LABEL &rest BODY) > > From the virtual-buffer point of view, the multiple regions marked by > LABEL are as a whole, connected buffer. Then legacy code could work on > this buffer without change. And you might get very surprised when a search-and-replace replaced some text spanning more than one of the regions. - Harald ^ permalink raw reply [flat|nested] 42+ messages in thread
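The surprise described above can be reproduced with a small sketch. It assumes a "virtual buffer" is simply the text of the marked ranges joined with nothing in between; `my-virtual-text' is a hypothetical helper, not part of any proposal in this thread.

```elisp
;; Hypothetical helper: the concatenated view of several disjoint ranges.
(defun my-virtual-text (ranges)
  "Return the text of RANGES joined with no separator.
RANGES is a list of (START . END) conses of buffer positions."
  (mapconcat (lambda (r)
               (buffer-substring-no-properties (car r) (cdr r)))
             ranges
             ""))

;; Two comment bodies, \"foo\" and \"bar\", taken from one buffer.
(defun my-virtual-text-demo ()
  "Return the position in the virtual text where \"ob\" matches, or nil."
  (with-temp-buffer
    (insert "/* foo */ code /* bar */")
    ;; Positions 4-7 hold \"foo\", positions 19-22 hold \"bar\".
    (let ((virtual (my-virtual-text '((4 . 7) (19 . 22)))))
      ;; \"ob\" occurs in neither range on its own, only across the seam.
      (string-match-p "ob" virtual))))
```

Here the pattern "ob" matches in the joined text "foobar" even though it appears in neither range, so a replacement would have to be split across two unrelated places in the real buffer. Inserting a separator between ranges, as the `multi-region-separator' idea later in the thread does, is one way to avoid such matches.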
* Re: Structural regular expressions 2010-09-08 9:20 ` Lawrence Mitchell 2010-09-08 10:30 ` Kan-Ru Chen @ 2010-09-08 14:29 ` Stefan Monnier 2010-09-08 15:52 ` Lawrence Mitchell 2010-09-09 20:47 ` Davis Herring 1 sibling, 2 replies; 42+ messages in thread From: Stefan Monnier @ 2010-09-08 14:29 UTC (permalink / raw) To: Lawrence Mitchell; +Cc: emacs-devel > A halfway house, similar to that suggested by Drew, would be something > like http://www2.ph.ed.ac.uk/~s0198183/multi-region.el. ISTR some > discussion when it was posted in g.e.sources, *grovels through mail*: > http://thread.gmane.org/gmane.emacs.sources/1390 > Maybe this is a useful feature to now think about incorporating Indeed, we could probably go a long way by simply extending our notion of region so as to allow it to be non-contiguous. Patches welcome, Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 14:29 ` Stefan Monnier @ 2010-09-08 15:52 ` Lawrence Mitchell 2010-09-08 22:46 ` Stefan Monnier 2010-09-09 20:47 ` Davis Herring 1 sibling, 1 reply; 42+ messages in thread From: Lawrence Mitchell @ 2010-09-08 15:52 UTC (permalink / raw) To: emacs-devel Stefan Monnier wrote: >> A halfway house, similar to that suggested by Drew, would be something >> like http://www2.ph.ed.ac.uk/~s0198183/multi-region.el. ISTR some >> discussion when it was posted in g.e.sources, *grovels through mail*: >> http://thread.gmane.org/gmane.emacs.sources/1390 >> Maybe this is a useful feature to now think about incorporating > Indeed, we could probably go a long way by simply This must be a new and exciting definition of simply that I am not previously aware of. Else I'm being particularly dense. > extending our notion of region so as to allow it to be > non-contiguous. Glancing through the source, this seems like it would be a pretty major change. I guess BEGV and ZV would have to be changed from buffer positions to lists of buffer positions. Then everything that looked at them would be updated to respect this change. And so forth. Or do I have the wrong end of the stick? Lawrence -- Lawrence Mitchell <wence@gmx.li> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 15:52 ` Lawrence Mitchell @ 2010-09-08 22:46 ` Stefan Monnier 2010-09-09 7:07 ` David Kastrup 0 siblings, 1 reply; 42+ messages in thread From: Stefan Monnier @ 2010-09-08 22:46 UTC (permalink / raw) To: Lawrence Mitchell; +Cc: emacs-devel >> extending our notion of region so as to allow it to be >> non-contiguous. > Glancing through the source, this seems like it would be a pretty > major change. I guess BEGV and ZV would have to be changed from > buffer positions to lists of buffer positions. Then everything > that looked at them would be updated to respect this change. And > so forth. Or do I have the wrong end of the stick? Yes, you're confusing the region with the visible part of the buffer: BEGV and ZV have to do with narrowing and extending them to discontinuous areas would indeed be a major undertaking, whereas the region is just the part of the buffer between point and mark. Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-08 22:46 ` Stefan Monnier @ 2010-09-09 7:07 ` David Kastrup 2010-09-09 17:03 ` Stefan Monnier 0 siblings, 1 reply; 42+ messages in thread From: David Kastrup @ 2010-09-09 7:07 UTC (permalink / raw) To: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >>> extending our notion of region so as to allow it to be >>> non-contiguous. > >> Glancing through the source, this seems like it would be a pretty >> major change. I guess BEGV and ZV would have to be changed from >> buffer positions to lists of buffer positions. Then everything >> that looked at them would be updated to respect this change. And >> so forth. Or do I have the wrong end of the stick? > > Yes, you're confusing the region with the visible part of the buffer: > BEGV and ZV have to do with narrowing and extending them to discontinuous > areas would indeed be a major undertaking, whereas the region is just the > part of the buffer between point and mark. And what is narrow-to-region supposed to do then? -- David Kastrup ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 7:07 ` David Kastrup @ 2010-09-09 17:03 ` Stefan Monnier 2010-09-10 12:23 ` David Kastrup 0 siblings, 1 reply; 42+ messages in thread From: Stefan Monnier @ 2010-09-09 17:03 UTC (permalink / raw) To: David Kastrup; +Cc: emacs-devel > And what is narrow-to-region supposed to do then? Signal an error when the region is not contiguous? Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 17:03 ` Stefan Monnier @ 2010-09-10 12:23 ` David Kastrup 2010-09-10 13:31 ` Stefan Monnier 0 siblings, 1 reply; 42+ messages in thread From: David Kastrup @ 2010-09-10 12:23 UTC (permalink / raw) To: emacs-devel Stefan Monnier <monnier@iro.umontreal.ca> writes: >> And what is narrow-to-region supposed to do then? > > Signal an error when the region is not contiguous? Opens another can of worms because quite a number of commands accepting a region argument implement it internally using narrow-to-region. -- David Kastrup ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-10 12:23 ` David Kastrup @ 2010-09-10 13:31 ` Stefan Monnier 0 siblings, 0 replies; 42+ messages in thread From: Stefan Monnier @ 2010-09-10 13:31 UTC (permalink / raw) To: David Kastrup; +Cc: emacs-devel >>> And what is narrow-to-region supposed to do then? >> Signal an error when the region is not contiguous? > Opens another can of worms because quite a number of commands accepting > a region argument implement it internally using narrow-to-region. What can of worms? Old uses will still work just as well as before. And as explained in earlier threads, some of those commands could be magically made to work by letting an "r" in the interactive spec mean "apply once per contiguous region segment". I haven't experimented with such a change, so it may or may not be an acceptable heuristic, but in any case I don't see it as a problem that commands need to be adjusted in order to work in the case that the region is split into more than 1 chunk. Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
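The "apply once per contiguous region segment" heuristic mentioned above could look roughly like the following sketch. It is not a change to `call-interactively'; `my-map-over-segments' is a hypothetical name, and segments are assumed to be disjoint (START . END) conses of integer positions.

```elisp
;; Hypothetical helper; a sketch of mapping a region command over segments.
(defun my-map-over-segments (fn segments)
  "Call FN with START and END for each (START . END) cons in SEGMENTS.
Segments are visited from the end of the buffer backwards, so an edit
in one segment cannot shift the integer positions of the segments
that have not been visited yet."
  (dolist (seg (sort (copy-sequence segments)
                     (lambda (a b) (> (car a) (car b)))))
    (funcall fn (car seg) (cdr seg))))
```

For example, (my-map-over-segments #'upcase-region '((1 . 4) (9 . 12))) upcases two separate stretches of the buffer with an unmodified region command. Visiting segments last-to-first is a design choice of this sketch; using markers instead of integers would allow buffer order.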
* Re: Structural regular expressions 2010-09-08 14:29 ` Stefan Monnier 2010-09-08 15:52 ` Lawrence Mitchell @ 2010-09-09 20:47 ` Davis Herring 2010-09-09 22:52 ` Lennart Borgman 1 sibling, 1 reply; 42+ messages in thread From: Davis Herring @ 2010-09-09 20:47 UTC (permalink / raw) To: Stefan Monnier; +Cc: Lawrence Mitchell, emacs-devel

> Indeed, we could probably go a long way by simply extending our notion
> of region so as to allow it to be non-contiguous.
>
> Patches welcome,

This is no patch, but I had an idea for the interface for this:

Definition: simple region
The interval (possibly empty) between point and mark, exactly as it is now.

Variable: region-list
A set of non-empty, disjoint intervals, always local to each buffer. Each is a cons of two markers. Typically each is highlighted in a subtle fashion, even outside Transient Mark Mode.

Function: multi-region
Returns the union of the region list and the simple region (using `point-marker' and/or `mark-marker' as needed). (If the simple region is empty and the region list is not, the simple region is ignored and the return value equals `region-list'.) This is the user-visible possibly-disconnected upgrade to the region concept.

User option: multi-region-separator (default: "\n")
String to insert between separate intervals of the multi-region when concatenated.

(defun multi-region-string (&optional sep)
  "Return the contents of the multi-region.
Separate intervals with SEP (or `multi-region-separator' if omitted)."
  (mapconcat (lambda (c) (buffer-substring (car c) (cdr c)))
             (multi-region) (or sep multi-region-separator)))

Rule: (interactive "r") maps over the multi-region.
Perhaps with some way to disable it (prefix command, or just a quick way to suppress/restore the region list while leaving the simple region alone), `call-interactively' would handle an interactive spec once (including any prompting), then repeatedly call the function with the start and end set to the start and end of each interval in the multi-region in turn, in buffer order.

Rationale: This is a very intrusive change! But it's often the right thing (delete-region, upcase-region, ispell-region, translate-region, underline-region, indent-region, count-lines-region, expand-region-abbrevs, and probably eval-region) and is one of very few ways of letting existing code apply in any sense to multi-regions. (If doing it by default is too much, a prefix "multify" command could be provided instead, and all of this could be optional.)

Another spec ("R"?) could be added for commands like `narrow-to-region' that should operate only on the simple region (or fail if the region isn't simple?). Yet another spec might pass all of the multi-region at once so that commands like `kill-region' and `write-region' could use `multi-region-string' or otherwise act on them coherently.

Command: keep-region
Unions the current simple region into the region list (may coalesce existing intervals). Immediately afterwards, the simple region is entirely redundant and has no effect (until point or mark moves).

Command: drop-region
Removes the current simple region from the region list (may split existing intervals). Immediately afterwards, the multi-region is no different!

Command: drop-this-region
Remove the interval that contains point from the region list.

Command: drop-multi-region
Clears the region list (causing the multi-region to equal the simple region).

These low-level commands would be too tedious to be the principal user mechanism for manipulating the multi-region. So we add:

Command: mark-regexp
Add to the region list all matches for a regexp (following point, for consistency with `how-many' and `keep-lines'). Framing the regexp with ^.*....*$ allows this command to mark lines (or a separate command could do that for you). Even when lines are marked in that fashion, the newlines between them are not, so each line is a separate interval.

Command: unmark-regexp
Delete from the region list all regions within which a match for a regexp exists.

These are analogous to the "highlight all" feature in Firefox, for instance. Then we can navigate among them:

Command: next-region
Move point to the closest following beginning of a region list interval. This could be used in macros.

Command: count-regions
Display in the echo area how many intervals are in the region list and the multi-region (which may be one more or many fewer).

Since region lists are complicated things, the user might want to save them and reuse them later, so letting registers hold them would be good. (Should they store the region list or the multi-region?)

WDOT?

Davis

--
This product is sold by volume, not by mass. If it appears too dense or too sparse, it is because mass-energy conversion has occurred during shipping.

^ permalink raw reply [flat|nested] 42+ messages in thread
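A rough sketch of how the proposed `mark-regexp' might populate a region list. Everything prefixed with `my-' is a hypothetical stand-in for the names in the proposal. The sketch ignores the simple region, skips empty matches, and does not coalesce new intervals with ones already in the list, so it covers only part of what the proposal describes.

```elisp
;; Hypothetical stand-in for the proposed `region-list'.
(defvar-local my-region-list nil
  "List of (START . END) conses of markers, one per marked interval.")

(defun my-mark-regexp (regexp)
  "Add every non-empty match for REGEXP after point to `my-region-list'.
Return the number of intervals added.  Existing intervals are not
merged with the new ones in this sketch."
  (let ((added 0))
    (save-excursion
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (if (= (match-beginning 0) (match-end 0))
            ;; Step over an empty match so the loop always advances.
            (unless (eobp) (forward-char 1))
          (push (cons (copy-marker (match-beginning 0))
                      (copy-marker (match-end 0)))
                my-region-list)
          (setq added (1+ added)))))
    added))

(defun my-region-list-string (&optional sep)
  "Return the text of `my-region-list' in buffer order, joined by SEP.
SEP defaults to a newline, mirroring `multi-region-separator'."
  (mapconcat (lambda (c)
               (buffer-substring-no-properties (car c) (cdr c)))
             (sort (copy-sequence my-region-list)
                   (lambda (a b) (< (car a) (car b))))
             (or sep "\n")))
```

Storing markers rather than integers, as the proposal specifies, means the intervals keep tracking their text when the buffer is edited elsewhere.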
* Re: Structural regular expressions 2010-09-09 20:47 ` Davis Herring @ 2010-09-09 22:52 ` Lennart Borgman 2010-09-10 10:48 ` Stefan Monnier 2010-09-10 15:43 ` Richard Stallman 0 siblings, 2 replies; 42+ messages in thread From: Lennart Borgman @ 2010-09-09 22:52 UTC (permalink / raw) To: herring; +Cc: Lawrence Mitchell, Stefan Monnier, emacs-devel

On Thu, Sep 9, 2010 at 10:47 PM, Davis Herring <herring@lanl.gov> wrote:
>> Indeed, we could probably go a long way by simply extending our notion
>> of region so as to allow it to be non-contiguous.
>>
>> Patches welcome,
>
> This is no patch, but I had an idea for the interface for this:
>
> Definition: simple region
> The interval (possibly empty) between point and mark, exactly as it is now.
>
> Variable: region-list
> A set of non-empty, disjoint intervals, always local to each buffer. Each
> is a cons of two markers. Typically each is highlighted in a subtle
> fashion, even outside Transient Mark Mode.
>
> Function: multi-region
> Returns the union of the region list and the simple region (using
> `point-marker' and/or `mark-marker' as needed). (If the simple region is
> empty and the region list is not, the simple region is ignored and the
> return value equals `region-list'.) This is the user-visible
> possibly-disconnected upgrade to the region concept.
>
> User option: multi-region-separator (default: "\n")
> String to insert between separate intervals of the multi-region when
> concatenated.
>
> (defun multi-region-string (&optional sep)
> "Return the contents of the multi-region.
> Separate intervals with SEP (or `multi-region-separator' if omitted)."
...
>
> WDOT?

I think that kind of interface could be built upon a low level interface, but the important thing to discuss at this point is rather the low level interface. Otherwise I think we might soon have multiple ways of doing this.

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 22:52 ` Lennart Borgman @ 2010-09-10 10:48 ` Stefan Monnier 2010-09-10 15:43 ` Richard Stallman 1 sibling, 0 replies; 42+ messages in thread From: Stefan Monnier @ 2010-09-10 10:48 UTC (permalink / raw) To: Lennart Borgman; +Cc: Lawrence Mitchell, emacs-devel > I think that kind of interface could be built upon a low level > interface, but the important thing to discus at this point is rather > the low level interface. Otherwise I think we might soon has multiple > ways of doing this. The proposal is to completely avoid any low-level changes, and only work at the level of regions. Actually, there might be some changes at a lowish level to handle highlighting, but that's about it. That would be of no help for multi-mode buffers, of course. Stefan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 22:52 ` Lennart Borgman 2010-09-10 10:48 ` Stefan Monnier @ 2010-09-10 15:43 ` Richard Stallman 2010-09-10 17:03 ` David House [not found] ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com> 1 sibling, 2 replies; 42+ messages in thread From: Richard Stallman @ 2010-09-10 15:43 UTC (permalink / raw) To: Lennart Borgman; +Cc: wence, monnier, emacs-devel Could someone please explain what a "structural regular expression" means? The message that started the thread did not say. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-10 15:43 ` Richard Stallman @ 2010-09-10 17:03 ` David House [not found] ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com> 1 sibling, 0 replies; 42+ messages in thread From: David House @ 2010-09-10 17:03 UTC (permalink / raw) To: rms; +Cc: wence, Lennart Borgman, monnier, emacs-devel On 10 September 2010 16:43, Richard Stallman <rms@gnu.org> wrote: > Could someone please explain what a "structural regular expression" > means? The message that started the thread did not say. It is not a property of the regexps themselves, but pertains to functions that use regexps: namely that they only apply to a subset of your buffer. For example, you might do a query-replace-regexp on only the comments of a C file, or an isearch-regexp on only the strings. So note that the subsets they apply to are non-contiguous in general. It has been proposed to support this by generalizing the concept of the region to actually be a list of (contiguous) regions. Another idea further up was to use special text properties. ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <AANLkTi=dv8n40x-rTtz@mail.gmail.com>]
* Re: Structural regular expressions [not found] ` <AANLkTi=dv8n40x-rTtz@mail.gmail.com> @ 2010-09-10 20:29 ` Tom 2010-09-10 23:50 ` Drew Adams 2010-09-11 15:49 ` Richard Stallman 0 siblings, 2 replies; 42+ messages in thread From: Tom @ 2010-09-10 20:29 UTC (permalink / raw) To: emacs-devel

David House <dmhouse <at> gmail.com> writes:
>
> On 10 September 2010 16:43, Richard Stallman <rms <at> gnu.org> wrote:
> > Could someone please explain what a "structural regular expression"
> > means? The message that started the thread did not say.
>
> It is not a property of the regexps themselves, but pertains to
> functions that use regexps: namely that they only apply to a subset of
> your buffer. For example, you might do a query-replace-regexp on only
> the comments of a C file, or an isearch-regexp on only the strings. So
> note that the subsets they apply to are non-contiguous in general.
>

It is a property of the regexps, because the main point of the feature is that there are enhanced regexps which are aware of the syntax of the buffer contents, so you can select comments, strings, scopes, etc.

Examples from the mentioned blog post:

  V/pattern   select all matches
  V|pattern   select all lines with match
  V{scope     select all matching scopes
  Vatype      select all objects (inclusive)
  Vttype      select all objects (exclusive)
  Y/pattern   select everything but matches
  Y|pattern   select all lines without match
  Y{scope     select everything but scope
  Yatype      select everything but objects (inclusive)
  Yttype      select everything but objects (exclusive)

And you can perform further selections after the first selection recursively, so you can select comments in scopes, etc.

The document that inspired the above feature of the E editor:

"The current UNIX® text processing tools are weakened by the built-in concept of a line. There is a simple notation that can describe the `shape' of files when the typical array-of-lines picture is inadequate. That notation is regular expressions. Using regular expressions to describe the structure in addition to the contents of files has interesting applications, and yields elegant methods for dealing with some problems the current tools handle clumsily. When operations using these expressions are composed, the result is reminiscent of shell pipelines."

http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

> It has been proposed to support this by generalizing the concept of
> the region to actually be a list of (contiguous) regions. Another idea
> further up was to use special text properties.

I wonder if there is a simpler solution. For example, during the selection process a separate buffer could interactively display the current selection made by the user. This buffer could be set up with text properties and such, so that it is known where the individual ranges start and end. After the user has done his work in this temporary buffer, the resulting ranges could be copied back to the appropriate sections of the original buffer, thereby committing the changes. This way nothing has to be changed in Emacs core.

^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: Structural regular expressions 2010-09-10 20:29 ` Tom @ 2010-09-10 23:50 ` Drew Adams 2010-09-11 2:23 ` Miles Bader 2010-09-11 15:49 ` Richard Stallman 1 sibling, 1 reply; 42+ messages in thread From: Drew Adams @ 2010-09-10 23:50 UTC (permalink / raw) To: 'Tom', emacs-devel > the main point of the feature is there are enhanced regexps > which are aware of the syntax of the buffer contents, so you > can select comments, strings, scopes, etc. > > Examples for the mentioned blog post: > > V/pattern select all matches > V|pattern select all lines with match > V{scope select all matching scopes > Vatype select all objects (inclusive) > Vttype select all objects (exclusive) > Y/pattern select everything but matches > Y|pattern select all lines without match > Y{scope select everything but scope > Yatype select everything but objects (inclusive) > Yttype select everything but objects (exclusive) > > And you can perform further selections after the first selection > recursively, so you can select comments in scopes, etc. > > The document that inspired the above feature of the E editor: > > "The current UNIXR text processing tools are weakened by the > built-in concept of a line. There is a simple notation that can > describe the `shape' of files when the typical array-of-lines > picture is inadequate. That notation is regular > expressions. Using regular expressions to describe the structure > in addition to the contents of files has interesting > applications, and yields elegant methods for dealing with some > problems the current tools handle clumsily. When operations using > these expressions are composed, the result is reminiscent of > shell pipelines." > > http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf Some more from the paper you cite: "In these programs, regular expressions are being used to do more than just select the input, the way they are used in all the traditional UNIX tools. 
Instead, the expressions are doing a simple parsing (or at least a breaking into lexical tokens) of the input. Such expressions are called structural regular expressions or just structural expressions." And: [these programs] "benefit from an additional regular expression to define the structure of [their] input." That's the real point, I believe: the paper touts the use of regexps to divide text into chunks that match - chunks that are not necessarily lines, in order to then act on those chunks in some way. This is just what Icicles search does. You can provide an initial regexp that parses the buffer to define a set of search contexts. The regexp .* just parses it into all of its lines. Regexp \([^\f]*[\f]\|[^\f]+$\) parses it into pages; \(.+\n\)+ into paragraphs; [A-Z][^.?!]+[.?!] into sentences; and so on. You can provide such a regexp interactively, or define different commands that encapsulate different context-defining regexps (e.g. search-lines (occur/grep), search-paragraphs, search-sentences). In general, a regexp used this way does not necessarily _partition_ the buffer - there can be areas (gaps) that do not match at all. Hence the mention by others of possibly non-contiguous areas ("regions" or multi-part region). The regexp `(concat comint-prompt-regexp "\S-.*")' selects comint prompt lines, for instance; and using an imenu generic regexp selects just function etc. definitions for the current mode (just their first lines, typically). But while a regexp is one handy way to parse a buffer, there is no reason to limit the idea to using a regexp. In spite of the fancy name "structural regexp", _any_ way of dividing the buffer into a set of areas of interest can be useful in the same way (e.g. as search contexts). The real argument is that lines are not the only way to go - grep/occur is not the only search tool (which is not really news). 
And it is misleading to say that regexps "describe the `shape' of files when the typical array of lines picture is inadequate." It is not about some file "shape" or an inherent "structure" of the file content (e.g. code structure). It is about being able to shape the parts of interest as you want and not always be limited to lines as parts. Use any regexp or any other pattern or algorithm to define the _parts you want_ (e.g. as search contexts). _You_ define the shapes of interest.

Can you use regexps to mimic/follow the "shape" of code? Sure. But you can also use them to shape text (including code parts) in other ways. Generalize the shaping by regexps, and generalize the tools of shaping beyond just regexps.

And there is not even any need to limit this to areas of a buffer or file. What this is really about (IMO) is these features:

1. Some way to come up with a set of strings as defined by pairs of buffer positions. The strings need not be associated with buffer positions, but that is the typical case discussed.

2. Some way to filter those strings as a set.

3. Some way to act on the (filtered) strings, individually and perhaps also as a set. Search is one such action.

For the "structural regexp" fan, #1 is a regexp. But a regexp is only one tool you might use to parse a buffer into such a set.

For Icicles, #1 is often a regexp, but it need not be.

Font-lock provides another #1. Font-lock typically uses an ordered combination of regexps, but in the general case it allows any parsing functions. There are any number of other #1's that could prove interesting. A sophisticated parser can be just as useful for #1 as is a simple regexp.

As another #1, Icicles can treat bookmarked regions as a search set. (This assumes an ability, as in Bookmark+, to bookmark regions: 2 positions, not 1.) IOW, the strings ("regions") to be searched need not even be in the same buffer or file.
A tags file could be used similarly, to "parse" a set of source files into strings that represent function etc. definitions. All that's needed is some way to define a set of strings and their locations. For #2, Icicles lets you type an input pattern that filters the set dynamically (incrementally). Pattern matching here can use regexps, fuzzy matching, whatever. You can "pipe-filter": progressively apply multiple patterns to narrow the set. And you can complement the set of matches (complement the current set wrt the previous filtering). For #3, search has been mentioned as an obvious action for individual matches. Likewise search-and-replace. (Those are what Icicles search provides by default.) But in general any action might be applicable. A final comment. There is nothing earth-shaking about using a regexp in this way, to define a set of strings/areas to act on. It hardly merits special trumpeting. And in spite of the usefulness of not being _limited_ to a hard-coded parsing into lines, it is also true that (partly because much in the way of programming does involve lines) acting on the lines of a file or buffer or command-line input stream or error log _is_ often useful. ^ permalink raw reply [flat|nested] 42+ messages in thread
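The three features listed in the message above (define a set of strings, filter it, act on it) can be sketched in a few lines of Emacs Lisp. This is only an illustration under stated assumptions: the `my-' functions are hypothetical helpers, not part of Emacs or Icicles, and the context-defining step here is a plain regexp scan.

```elisp
(require 'cl-lib)

(defun my-scan-contexts (regexp)
  "Step 1: return a list of (STRING BEG . END), one per match of REGEXP.
Scans the whole current buffer; empty matches are skipped."
  (let ((case-fold-search nil)
        contexts)
    (save-excursion
      (goto-char (point-min))
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (let ((beg (match-beginning 0))
              (end (match-end 0)))
          (if (= beg end)
              ;; Empty match: step forward so the loop terminates.
              (unless (eobp) (forward-char 1))
            (push (cons (buffer-substring-no-properties beg end)
                        (cons beg end))
                  contexts)))))
    (nreverse contexts)))

(defun my-filter-contexts (contexts pattern)
  "Step 2: keep only CONTEXTS whose string matches regexp PATTERN."
  (cl-remove-if-not (lambda (c) (string-match-p pattern (car c)))
                    contexts))

(defun my-act-on-contexts (fn contexts)
  "Step 3: call FN with STRING, BEG and END for each of CONTEXTS."
  (mapcar (lambda (c) (funcall fn (car c) (cadr c) (cddr c)))
          contexts))

;; Usage: sentences as contexts, narrowed to those mentioning "foo".
(with-temp-buffer
  (insert "First one. Second has foo! Third?")
  (my-act-on-contexts
   (lambda (string _beg _end) (upcase string))
   (my-filter-contexts (my-scan-contexts "[A-Z][^.?!]+[.?!]") "foo")))
;; => ("SECOND HAS FOO!")
```

Any other producer of (STRING BEG . END) entries (a parser, a text-property scan, a tags file) could replace step 1 without touching steps 2 and 3, which is the generalization the message argues for.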
* Re: Structural regular expressions 2010-09-10 23:50 ` Drew Adams @ 2010-09-11 2:23 ` Miles Bader 2010-09-11 7:44 ` Tom ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: Miles Bader @ 2010-09-11 2:23 UTC (permalink / raw) To: emacs-devel "Drew Adams" <drew.adams@Oracle.Com> writes: > That's the real point, I believe: the paper touts the use of regexps > to divide text into chunks that match - chunks that are not > necessarily lines, in order to then act on those chunks in some way. Not a good base, I think -- regexps are not really powerful enough to do the job well. -Miles -- Happiness, n. An agreeable sensation arising from contemplating the misery of another. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-11 2:23 ` Miles Bader @ 2010-09-11 7:44 ` Tom 2010-09-11 7:58 ` Wojciech Meyer 2010-09-11 15:04 ` Drew Adams 2 siblings, 0 replies; 42+ messages in thread From: Tom @ 2010-09-11 7:44 UTC (permalink / raw) To: emacs-devel Miles Bader <miles <at> gnu.org> writes: > > "Drew Adams" <drew.adams <at> Oracle.Com> writes: > > That's the real point, I believe: the paper touts the use of regexps > > to divide text into chunks that match - chunks that are not > > necessarily lines, in order to then act on those chunks in some way. > > Not a good base, I think -- regexps are not really powerful enough to do > the job well. > Well, it doesn't have to be implemented with regexps, but the concept itself seems useful that you can address syntactical blocks intelligently in the buffer. So it's a selection mechanism which can utilize both standard, line oriented regexps and syntax-aware, multi line oriented patterns in a recursive fashion to efficiently select parts of the buffer to perform an operation on. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-11 2:23 ` Miles Bader 2010-09-11 7:44 ` Tom @ 2010-09-11 7:58 ` Wojciech Meyer 2010-09-11 8:33 ` tomas 2010-09-11 15:04 ` Drew Adams 2 siblings, 1 reply; 42+ messages in thread From: Wojciech Meyer @ 2010-09-11 7:58 UTC (permalink / raw) To: Miles Bader; +Cc: emacs-devel

Miles Bader <miles@gnu.org> writes:

> "Drew Adams" <drew.adams@Oracle.Com> writes:
>> That's the real point, I believe: the paper touts the use of regexps
>> to divide text into chunks that match - chunks that are not
>> necessarily lines, in order to then act on those chunks in some way.
>
> Not a good base, I think -- regexps are not really powerful enough to do
> the job well.

Yes, regexps are quite limited. Maybe a simple PEG parser based on packrat, with syntactic sugar for defining a one-line set of rules?

>
> -Miles

Wojciech

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-11 7:58 ` Wojciech Meyer @ 2010-09-11 8:33 ` tomas 0 siblings, 0 replies; 42+ messages in thread From: tomas @ 2010-09-11 8:33 UTC (permalink / raw) To: Wojciech Meyer; +Cc: emacs-devel, Miles Bader -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, Sep 11, 2010 at 08:58:04AM +0100, Wojciech Meyer wrote: > Miles Bader <miles@gnu.org> writes: > > > "Drew Adams" <drew.adams@Oracle.Com> writes: > >> That's the real point, I believe: the paper touts the use of regexps > >> to divide text into chunks that match - chunks that are not > >> necessarily lines, in order to then act on those chunks in some way. > > > > Not a good base, I think -- regexps are not really powerful enough to do > > the job well. > > Yes regexp are quite limited. > Maybe a simple PEG parser based on packrat, with a syntax sugar for > defining one line set of rules? While PEG is interesting in itself (and I think Emacs should have something like that, just to test its strengths/weaknesses wrt regex), I think Drew is right: A way, *any* way to define a "buffer subset", maybe partitioned into "chunks" is useful here. So at this level, I'd think concentrating on interface design (user & programmer) makes most sense, abstracting from possible implementations (regex, peg, font-lock, hand-built parser). The (possible) implementations should (I think) just guide the design of the interfaces (as examples). In the ideal case, it should be possible to use whatever implementation is most helpful (or combine them: union, intersection, symmetric difference). Just dreaming? Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFMiz7OBcgs9XrR2kYRAjZvAJ9Hzc4Dk2Z4t3wohMQJX/8544MvIQCffrxr WKNM0E3e/fJ3UF61J4Ez7c4= =tDCG -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 42+ messages in thread
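The idea of combining "buffer subsets" from different sources (union, intersection, and so on) can be illustrated independently of how each subset is produced. A minimal sketch, assuming a subset is represented as a sorted list of non-overlapping (BEG . END) pairs; that representation and the `my-' names are assumptions made for the example, not an existing Emacs interface.

```elisp
(defun my-ranges-complement (ranges beg end)
  "Return the gaps left by RANGES within BEG..END.
RANGES is a sorted list of non-overlapping (BEG . END) pairs."
  (let ((pos beg)
        gaps)
    (dolist (r ranges)
      (when (< pos (car r))
        (push (cons pos (car r)) gaps))
      (setq pos (max pos (cdr r))))
    (when (< pos end)
      (push (cons pos end) gaps))
    (nreverse gaps)))

(defun my-ranges-intersection (a b)
  "Return the overlap of range lists A and B.
Both are sorted lists of non-overlapping (BEG . END) pairs."
  (let (out)
    (while (and a b)
      (let ((lo (max (caar a) (caar b)))
            (hi (min (cdar a) (cdar b))))
        (when (< lo hi)
          (push (cons lo hi) out))
        ;; Drop whichever range ends first; the other may still
        ;; overlap the next range of the opposite list.
        (if (< (cdar a) (cdar b))
            (setq a (cdr a))
          (setq b (cdr b)))))
    (nreverse out)))

(my-ranges-complement '((3 . 5) (8 . 10)) 1 12)
;; => ((1 . 3) (5 . 8) (10 . 12))

(my-ranges-intersection '((1 . 6) (8 . 12)) '((4 . 9)))
;; => ((4 . 6) (8 . 9))
```

With these two operations a "select everything but X" command is just a complement, and "comments inside a scope" is an intersection, whatever produced the two range lists.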
* RE: Structural regular expressions 2010-09-11 2:23 ` Miles Bader 2010-09-11 7:44 ` Tom 2010-09-11 7:58 ` Wojciech Meyer @ 2010-09-11 15:04 ` Drew Adams 2 siblings, 0 replies; 42+ messages in thread From: Drew Adams @ 2010-09-11 15:04 UTC (permalink / raw) To: 'Miles Bader', emacs-devel > > That's the real point, I believe: the paper touts the use of regexps > > to divide text into chunks that match - chunks that are not > > necessarily lines, in order to then act on those chunks in some way. > > Not a good base, I think -- regexps are not really powerful > enough to do the job well. That's too vague. Good base (= ?) for what? Do what job? How well is well? We use regexps to select chunks of text all the time in Emacs. Regexps are not sufficiently powerful to select an _arbitrary_ chunk, but so what? You can use them to select lots of kinds of chunks (reg langs + Emacs "regexp" extensions) - certainly more than just lines. Nothing limits us to regexps (= one of my points), but regexps can be useful in selecting chunks of text. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-10 20:29 ` Tom 2010-09-10 23:50 ` Drew Adams @ 2010-09-11 15:49 ` Richard Stallman 2010-09-12 13:48 ` Stefan Monnier 1 sibling, 1 reply; 42+ messages in thread From: Richard Stallman @ 2010-09-11 15:49 UTC (permalink / raw) To: Tom; +Cc: emacs-devel

Thanks for the explanation. I think the term "structural regular expressions" is misleading because its grammatical construction implies a different kind of regexp, rather than a different way of applying them.

    V/pattern   select all matches
    V|pattern   select all lines with match
    V{scope     select all matching scopes
    Vatype      select all objects (inclusive)
    Vttype      select all objects (exclusive)
    Y/pattern   select everything but matches
    Y|pattern   select all lines without match
    Y{scope     select everything but scope
    Yatype      select everything but objects (inclusive)
    Yttype      select everything but objects (exclusive)

Are `V/' etc. literal, or do they stand for some other text? Where would this syntax be used?

^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-11 15:49 ` Richard Stallman @ 2010-09-12 13:48 ` Stefan Monnier 2010-09-12 14:09 ` Lennart Borgman 0 siblings, 1 reply; 42+ messages in thread From: Stefan Monnier @ 2010-09-12 13:48 UTC (permalink / raw) To: rms; +Cc: Tom, emacs-devel

> Thanks for the explanation. I think the term "structural regular
> expressions" is misleading because its grammatical construction
> implies a different kind of regexp, rather than a different way of
> applying them.

> V/pattern   select all matches
> V|pattern   select all lines with match
> V{scope     select all matching scopes
> Vatype      select all objects (inclusive)
> Vttype      select all objects (exclusive)
> Y/pattern   select everything but matches
> Y|pattern   select all lines without match
> Y{scope     select everything but scope
> Yatype      select everything but objects (inclusive)
> Yttype      select everything but objects (exclusive)

> Are `V/' etc. literal, or do they stand for some other text?
> Where would this syntax be used?

The term "structural regular expression" is indeed misleading, I think. They use it to refer to the combination of 2 things:

1- the ability to select particular kinds of elements in the text (which we could do in Emacs with non-contiguous regions). The main example being commands that select "all the strings" or "all the comments" or that invert the selection (select everything that wasn't selected before).

2- the ability to apply regexp-operations to only those selected parts of the text (to the extent that we already have commands that apply only to the active region, we already have that, although it would probably require several tweaks to make it work right in the face of non-contiguous regions).

Together this allows you to do things like apply query-replace to all non-string non-comment parts of the buffer, which is why they call it "structural" regexps.

Stefan

^ permalink raw reply [flat|nested] 42+ messages in thread
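The last example in the message above (replacing only in the non-string, non-comment parts of a buffer) can be approximated today without non-contiguous regions, by checking the syntactic state at each match. A minimal sketch, assuming the buffer's major mode sets up a usable syntax table; the two `my-' functions are hypothetical, and the replacement is unconditional rather than a query-replace.

```elisp
(defun my-in-string-or-comment-p (pos)
  "Return non-nil if POS is inside a string or a comment.
Relies on `syntax-ppss', so the answer depends on the syntax table
of the current buffer."
  (save-excursion
    (save-match-data
      ;; Element 8 of the parser state is the start position of the
      ;; enclosing string or comment, or nil outside of both.
      (nth 8 (syntax-ppss pos)))))

(defun my-replace-in-code (regexp replacement)
  "Replace REGEXP with REPLACEMENT, skipping strings and comments.
Assumes REGEXP never matches the empty string.  Return the number
of replacements made."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (unless (my-in-string-or-comment-p (match-beginning 0))
          (replace-match replacement t)
          (setq count (1+ count)))))
    count))

;; Usage: only the variable name is renamed.
(with-temp-buffer
  (emacs-lisp-mode)
  (insert "(setq foo \"foo\") ; foo\n")
  (my-replace-in-code "foo" "bar")
  (buffer-string))
;; => "(setq bar \"foo\") ; foo\n"
```

This filters matches after the fact instead of restricting the search space first, so it is closer to the PREDICATE idea raised earlier in the thread than to a true multi-part region.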
* Re: Structural regular expressions 2010-09-12 13:48 ` Stefan Monnier @ 2010-09-12 14:09 ` Lennart Borgman 2010-09-12 16:43 ` Drew Adams 0 siblings, 1 reply; 42+ messages in thread From: Lennart Borgman @ 2010-09-12 14:09 UTC (permalink / raw) To: Stefan Monnier; +Cc: Tom, rms, emacs-devel On Sun, Sep 12, 2010 at 3:48 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > > The term "structural regular expression" is indeed misleading, I think. > They use it to refer to the combination of 2 things: > 1- the ability to select particular kinds of elements in the text > (which we could do in Emacs with non-contiguous regions). The main > example being commands that select "all the strings" or "all the > comments" or that inverts the selection (select everything that > wasn't selected before). > 2- the ability to apply regexp-operations to only those selected parts > of the text (to the extent that we already have commands that apply > only to the active region, we already have that, although it would > probably require several tweaks to make it work right in the face of > non-contiguous regions). > > Together this allows you to do things like apply query-replace to all > non-string non-comment parts of the buffer, which is why they call it > "structural" regexps. There is a related need for searching that could be built on such capability: AND. Quite often I find myself searching for a node/a tree in a big .org file containing both word a and word b. ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: Structural regular expressions 2010-09-12 14:09 ` Lennart Borgman @ 2010-09-12 16:43 ` Drew Adams 2010-09-12 17:03 ` Lennart Borgman 0 siblings, 1 reply; 42+ messages in thread From: Drew Adams @ 2010-09-12 16:43 UTC (permalink / raw) To: 'Lennart Borgman', 'Stefan Monnier' Cc: 'Tom', rms, emacs-devel > There is a related need for searching that could be built on such > capability: AND. > > Quite often I find myself searching for a node/a tree in a big .org > file containing both word a and word b. And in an unspecified order, no doubt. In vanilla Emacs the closest we have for this is `apropos' with keyword input. The behavior is special-built for this particular command; it is not a general feature. In Icicles you can hit `S-SPC' to get such `AND' filtering during completion (of any input). You can add patterns on the fly, preceding each by `S-SPC', thus narrowing down the choices progressively, as you see fit. I call this "progressive" completion. http://www.emacswiki.org/emacs/Icicles_-_Nutshell_View#toc10 And you can hit `C-~' to get the complement (`AND NOT') after seeing what `S-SPC' (`AND') shows. Example: `M-x for' shows command names that have substring `for' (similarly `C-h f for', `C-h v for',...). `M-x for S-SPC ediff' shows the subset of those `for' commands that also have substring `ediff'. `M-x for S-SPC ediff C-~' shows the names that have substring `for' but do not have substring `ediff'. http://www.emacswiki.org/emacs/Icicles_-_Nutshell_View#toc11 ^ permalink raw reply [flat|nested] 42+ messages in thread
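The `AND' / `AND NOT' narrowing described above amounts to keeping the candidates that pass every pattern in a list. A minimal sketch of that idea only; it is not how Icicles implements progressive completion, and `my-progressive-filter' and its (not REGEXP) convention are hypothetical.

```elisp
(require 'cl-lib)

(defun my-progressive-filter (candidates patterns)
  "Return the CANDIDATES (strings) that satisfy every entry of PATTERNS.
Each entry is either a regexp string, which must match, or a list
\(not REGEXP), which must not match."
  (cl-remove-if-not
   (lambda (candidate)
     (cl-every (lambda (pattern)
                 (if (eq (car-safe pattern) 'not)
                     (not (string-match-p (cadr pattern) candidate))
                   (string-match-p pattern candidate)))
               patterns))
   candidates))

(defvar my-names '("forward-char" "ediff-forward-word" "format-mode-line"))

(my-progressive-filter my-names '("for" "ediff"))
;; => ("ediff-forward-word")

(my-progressive-filter my-names '("for" (not "ediff")))
;; => ("forward-char" "format-mode-line")
```

Because each pattern is tested separately, the order in which the words appear in a candidate does not matter, which is the "unspecified order" behavior asked about.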
* Re: Structural regular expressions 2010-09-12 16:43 ` Drew Adams @ 2010-09-12 17:03 ` Lennart Borgman 2010-09-12 21:31 ` Drew Adams 0 siblings, 1 reply; 42+ messages in thread From: Lennart Borgman @ 2010-09-12 17:03 UTC (permalink / raw) To: Drew Adams; +Cc: Tom, emacs-devel, Stefan Monnier, rms

On Sun, Sep 12, 2010 at 6:43 PM, Drew Adams <drew.adams@oracle.com> wrote:
>> There is a related need for searching that could be built on such
>> capability: AND.
>>
>> Quite often I find myself searching for a node/a tree in a big .org
>> file containing both word a and word b.
>
> And in an unspecified order, no doubt. In vanilla Emacs the closest we have for
> this is `apropos' with keyword input. The behavior is special-built for this
> particular command; it is not a general feature.
>
> In Icicles you can hit `S-SPC' to get such `AND' filtering during completion (of
> any input). You can add patterns on the fly, preceding each by `S-SPC', thus
> narrowing down the choices progressively, as you see fit. I call this
> "progressive" completion.
> http://www.emacswiki.org/emacs/Icicles_-_Nutshell_View#toc10

It does not look to me like I can search for org nodes containing both word a and word b with the progressive completion in Icicles. Can I?

^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: Structural regular expressions 2010-09-12 17:03 ` Lennart Borgman @ 2010-09-12 21:31 ` Drew Adams 0 siblings, 0 replies; 42+ messages in thread From: Drew Adams @ 2010-09-12 21:31 UTC (permalink / raw) To: 'Lennart Borgman' Cc: 'Tom', emacs-devel, 'Stefan Monnier', rms > It does not look to me like it I can search for org nodes containing > both word a and word b with the progressive completion in Icicles. Can I? I have no idea whether you can. If you think something does not work as documented, you can file an Icicles bug using `M-x icicle-send-bug-report', being sure to provide a concrete recipe, preferably starting from `emacs -Q'. But whenever input completion is available in Org mode you should be able to use Icicles progressive completion. (I do not use Org mode, myself. I am convinced that it is a Very Good Thing (TM), but I do not have any particular use for it.) And as for search, yes, you can use progressive completion during Icicles search. You could, for example, (1) use a context regexp that defines the search space to be the Org nodes (strings), and then (2) type your first word `a', `S-SPC', and your second word `b', to narrow the search space to those Org nodes that contain both words `a' and `b' (in either order). It is up to you to come up with the regexp needed to do #1. If for some reason Org nodes cannot be selected using just a regexp (dunno), then you will need to define a function that parses the buffer(s) and creates an alist of Org nodes (buffer substrings and their positions). How so? You can use function `icicle-search' to define your own search command. If you need something other than a regexp to parse your text into the set of search contexts (e.g. Org nodes), then pass a parsing function as the second arg to `icicle-search', `SCAN-FN-OR-REGEXP'. The function needs to fill variable `icicle-candidates-alist': Each alist entry has a search-context string as car and the string end's buffer position as cdr. 
For an example of a function that serves as arg `SCAN-FN-OR-REGEXP' see `icicle-search-char-property-scan'. It parses a buffer into the strings that are determined by a text or overlay property (e.g. `face') with a given value (e.g. `font-lock-string-face').

These are the `icicle-search' args (from the doc string):

---
(icicle-search BEG END SCAN-FN-OR-REGEXP REQUIRE-MATCH &optional WHERE &rest ARGS)

BEG is the beginning of the region to search; END is the end.

SCAN-FN-OR-REGEXP: Regexp or function that determines the set of initial candidates (match zones). If a function, it is passed, as arguments, the buffer to search, the beginning and end of the search region in that buffer, and ARGS.

REQUIRE-MATCH is passed to `completing-read'.

Optional arg WHERE is a list of bookmarks, buffers, or files to be searched. If nil, then search only the current buffer or region. (To search bookmarks you must also use library `bookmark+.el').

ARGS are arguments that are passed to function SCAN-FN-OR-REGEXP. Note that if SCAN-FN-OR-REGEXP is a regexp string, then function `icicle-search-regexp-scan' is used to determine the set of match zones. You can limit hits to regexp matches that also satisfy a predicate, by using `(PREDICATE)' as ARGS: PREDICATE is then passed to `icicle-search-regexp-scan' as its PREDICATE argument.
---

So if you have a simple regexp that selects the Org nodes, then just use command `icicle-search' interactively (`C-c `'): type that regexp followed by `RET', then `a S-SPC b'.

If the regexp is complex and you don't want to type it interactively, then define a search command `foo' like this:

  (defun foo ()
    (interactive)
    (icicle-search nil nil org-regexp t))

  (defconst org-regexp "HAIRY-ORG-NODE-IDENTIFYING-REGEXP")

If you do not have such a regexp - e.g. Org-node parsing is too complex for a regexp - then define a search command `foo' like this:

  (defun foo ()
    (interactive)
    (icicle-search nil nil 'org-parser t))

  (defun org-parser (buffer beg end)
    "Fill `icicle-candidates-alist' with Org nodes and their positions."
    ...                                 ; Parsing magic
    (setq icicle-candidates-alist ...))

^ permalink raw reply [flat|nested] 42+ messages in thread
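For illustration, a scan function of the shape described above might look like the following. It is a rough sketch that follows only the description in this message (the function receives BUFFER, BEG and END, and fills `icicle-candidates-alist' with entries whose car is the context string and whose cdr is its end position); it has not been checked against Icicles itself. The name `my-outline-node-scan' is hypothetical, and a "node" is simplified here to a heading line matching "^\*+ " plus everything up to the next heading of any level.

```elisp
;; Defined by Icicles; declared here only so the sketch loads without it.
(defvar icicle-candidates-alist nil)

(defun my-outline-node-scan (buffer beg end)
  "Fill `icicle-candidates-alist' with outline nodes found in BUFFER.
BEG and END bound the scan; nil means the accessible buffer limits.
Each entry is (NODE-TEXT . NODE-END-POSITION)."
  (with-current-buffer buffer
    (save-excursion
      (let ((beg (or beg (point-min)))
            (end (or end (point-max)))
            (heading "^\\*+ ")
            nodes)
        (goto-char beg)
        (when (re-search-forward heading end t)
          (let ((start (match-beginning 0)))
            (while start
              ;; A node runs from its heading to the next heading, or
              ;; to END when there is no further heading.
              (let* ((next (and (re-search-forward heading end t)
                                (match-beginning 0)))
                     (node-end (or next end)))
                (push (cons (buffer-substring-no-properties start node-end)
                            node-end)
                      nodes)
                (setq start next)))))
        (setq icicle-candidates-alist (nreverse nodes))))))
```

A real Org-aware parser would presumably want to respect subtree nesting, which is exactly the kind of structure a single regexp handles poorly.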
* Re: Structural regular expressions 2010-09-08 1:13 ` Eric Schulte 2010-09-08 8:46 ` Stefan Monnier @ 2010-09-09 15:51 ` Tom 2010-09-09 16:01 ` Lennart Borgman 1 sibling, 1 reply; 42+ messages in thread From: Tom @ 2010-09-09 15:51 UTC (permalink / raw) To: emacs-devel Eric Schulte <schulte.eric <at> gmail.com> writes: > > > > It can be a good start, but the feature in the E editor is more general > > than search and replace. You can perform any operation on the selected > > text. It's sort of like working on the narrowed part of a buffer, only > > the narrowed part in this case consists of several separate ranges of > > the same buffer (like all comments, etc.). > > Would generalizing the narrowing behavior to arbitrarily many ranges in > a buffer instead of a single range have extensive ramifications? The mentioned E feature is sort of like narrowing, but is not the same if I understand it correctly. For example, if I want to replace the word "formatted" to "structured" in all comments then considering the following case (<> indicates comment range boundaries): <.... is the format> <tedious work is done here> the word "format" at the end of the first range and the word "tedious" at the beginning of the next should not be handled as a contiguous text, because in that case the text "formattedious" would match the word to be replaced ("formatted") and it's clearly not correct behavior. So if such "multiple narrowing" is implemented it must maintain the boundaries between the different ranges and shouldn't simply handle it as contiguous text. ^ permalink raw reply [flat|nested] 42+ messages in thread
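The boundary problem in the message above is easy to reproduce with plain strings. A small sketch, with the two comment ranges represented as a list of strings; the `my-' helpers are hypothetical and exist only to contrast the two behaviors.

```elisp
(require 'cl-lib)

(defun my-match-joined-p (regexp ranges)
  "Naive approach: search RANGES (a list of strings) as one contiguous text."
  (string-match-p regexp (apply #'concat ranges)))

(defun my-match-per-range-p (regexp ranges)
  "Boundary-preserving approach: search each of RANGES separately."
  (cl-some (lambda (range) (string-match-p regexp range)) ranges))

(let ((comments '("... is the format" "tedious work is done here")))
  (list (my-match-joined-p "formatted" comments)      ; false hit across ranges
        (my-match-per-range-p "formatted" comments))) ; no hit, as wanted
;; => (11 nil)
```

The joined search finds "formatted" inside "format" + "tedious", which is the incorrect behavior a multi-range narrowing would have to rule out.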
* Re: Structural regular expressions 2010-09-09 15:51 ` Tom @ 2010-09-09 16:01 ` Lennart Borgman 2010-09-09 16:23 ` Tom 0 siblings, 1 reply; 42+ messages in thread From: Lennart Borgman @ 2010-09-09 16:01 UTC (permalink / raw) To: Tom; +Cc: emacs-devel On Thu, Sep 9, 2010 at 5:51 PM, Tom <levelhalom@gmail.com> wrote: > > So if such "multiple narrowing" is implemented it must maintain the boundaries > between the different ranges and shouldn't simply handle it as contiguous text. Or handle the text outside the multiple narrowing as whitespace. I think that maybe would make it easier to implement. Then it can be implemented in the low level routines that access the buffer contents. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 16:01 ` Lennart Borgman @ 2010-09-09 16:23 ` Tom 2010-09-09 16:44 ` Lennart Borgman 2010-09-09 19:27 ` Daniel Colascione 0 siblings, 2 replies; 42+ messages in thread From: Tom @ 2010-09-09 16:23 UTC (permalink / raw) To: emacs-devel Lennart Borgman <lennart.borgman <at> gmail.com> writes: > > On Thu, Sep 9, 2010 at 5:51 PM, Tom <levelhalom <at> gmail.com> wrote: > > > > So if such "multiple narrowing" is implemented it must maintain the boundaries > > between the different ranges and shouldn't simply handle it as contiguous text. > > Or handle the text outside the multiple narrowing as whitespace. > And what happens then if I want to regexp replace "foo\s-*bar"? It would still be susceptible to the above mentioned boundary problem, so it's not a robust workaround. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 16:23 ` Tom @ 2010-09-09 16:44 ` Lennart Borgman 2010-09-09 16:53 ` Tom 2010-09-09 19:27 ` Daniel Colascione 1 sibling, 1 reply; 42+ messages in thread From: Lennart Borgman @ 2010-09-09 16:44 UTC (permalink / raw) To: Tom; +Cc: emacs-devel On Thu, Sep 9, 2010 at 6:23 PM, Tom <levelhalom@gmail.com> wrote: > Lennart Borgman <lennart.borgman <at> gmail.com> writes: > >> >> On Thu, Sep 9, 2010 at 5:51 PM, Tom <levelhalom <at> gmail.com> wrote: >> > >> > So if such "multiple narrowing" is implemented it must maintain the > boundaries >> > between the different ranges and shouldn't simply handle it as > contiguous text. >> >> Or handle the text outside the multiple narrowing as whitespace. >> > > And what happens then if I want to regexp replace "foo\s-*bar"? It would > still be susceptible to the above mentioned boundary problem, so it's > not a robust workaround. It does not look to me like it would be susceptible to that problem. Maybe I am misunderstanding you. Can you explain more in detail why you think it would be a problem with the solution I suggested? (Please note that I said the parts outside of the multiple narrowing should be treated as "whitespace", not "invisible" or "non-existent".) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 16:44 ` Lennart Borgman @ 2010-09-09 16:53 ` Tom 2010-09-09 17:02 ` Lennart Borgman 0 siblings, 1 reply; 42+ messages in thread From: Tom @ 2010-09-09 16:53 UTC (permalink / raw) To: emacs-devel Lennart Borgman <lennart.borgman <at> gmail.com> writes: > > And what happens then if I want to regexp replace "foo\s-*bar"? It would > > still be susceptible to the above mentioned boundary problem, so it's > > not a robust workaround. > > It does not look to me like it would be susceptible to that problem. > Maybe I am misunderstanding you. Can you explain more in detail why > you think it would be a problem with the solution I suggested? (Please > note that I said the parts outside of the multiple narrowing should be > treated as "whitespace", not "invisible" or "non-existent".) Maybe I am misunderstanding you. As I understood your suggestion: <.....foo> ... whitespace ... <bar ... > Since \s- as a regexp matches whitespace the regexp "foo\s-*bar" would match the end of the first range and the beginning of the second range separated by whitespace. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Structural regular expressions 2010-09-09 16:53 ` Tom @ 2010-09-09 17:02 ` Lennart Borgman 0 siblings, 0 replies; 42+ messages in thread From: Lennart Borgman @ 2010-09-09 17:02 UTC (permalink / raw) To: Tom; +Cc: emacs-devel On Thu, Sep 9, 2010 at 6:53 PM, Tom <levelhalom@gmail.com> wrote: > Lennart Borgman <lennart.borgman <at> gmail.com> writes: > >> > And what happens then if I want to regexp replace "foo\s-*bar"? It would >> > still be susceptible to the above mentioned boundary problem, so it's >> > not a robust workaround. >> >> It does not look to me like it would be susceptible to that problem. >> Maybe I am misunderstanding you. Can you explain more in detail why >> you think it would be a problem with the solution I suggested? (Please >> note that I said the parts outside of the multiple narrowing should be >> treated as "whitespace", not "invisible" or "non-existent".) > > Maybe I am misunderstanding you. > > As I understood your suggestion: > > <.....foo> ... whitespace ... <bar ... > > > Since \s- as a regexp matches whitespace the regexp "foo\s-*bar" would match > the end of the first range and the beginning of the second range separated > by whitespace. Ah, I see. Yes, it could be a problem in an example like that. So if something like my suggestion was implemented then perhaps we have to distinguish this "whitespace" from other whitespace. However I think that it would still be useful to let it behave as whitespace in many situations. I am thinking about parsers in multi major mode buffers for example. ^ permalink raw reply [flat|nested] 42+ messages in thread
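The exchange above can be checked the same way with strings: padding the gap with ordinary whitespace still lets "foo\s-*bar" match across two ranges, while a separator that is not whitespace does not. In this sketch the NUL character stands in for "whitespace that is distinguished from other whitespace"; that choice, and the helper `my-join-ranges', are assumptions made for illustration only.

```elisp
(defun my-join-ranges (ranges separator)
  "Concatenate RANGES (a list of strings) with SEPARATOR between them."
  (mapconcat #'identity ranges separator))

;; `\s-' consults the current syntax table, so pin it to the standard one.
(with-syntax-table (standard-syntax-table)
  (let ((ranges '("... foo" "bar ...")))
    (list
     ;; Gap treated as ordinary whitespace: the regexp spans both ranges.
     (string-match-p "foo\\s-*bar" (my-join-ranges ranges "   "))
     ;; Gap marked by a non-whitespace separator: no match across it.
     (string-match-p "foo\\s-*bar" (my-join-ranges ranges "\0")))))
;; => (4 nil)
```

A dedicated separator still leaves open what patterns such as "." or a negated character class should do when they reach the gap.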
* Re: Structural regular expressions 2010-09-09 16:23 ` Tom 2010-09-09 16:44 ` Lennart Borgman @ 2010-09-09 19:27 ` Daniel Colascione 1 sibling, 0 replies; 42+ messages in thread From: Daniel Colascione @ 2010-09-09 19:27 UTC (permalink / raw) To: Tom; +Cc: emacs-devel On Thu, Sep 9, 2010 at 9:23 AM, Tom <levelhalom@gmail.com> wrote: > And what happens then if I want to regexp replace "foo\s-*bar"? It would > still be susceptible to the above mentioned boundary problem, so it's > not a robust workaround. What about a brand new syntax class? ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: Structural regular expressions 2010-09-07 19:25 Structural regular expressions Tom 2010-09-07 20:08 ` Lennart Borgman @ 2010-09-08 0:00 ` Drew Adams 1 sibling, 0 replies; 42+ messages in thread From: Drew Adams @ 2010-09-08 0:00 UTC (permalink / raw) To: 'Tom', emacs-devel > This structural regex thing is interesting. You can perform operations > (e.g. replace text) on all strings in the file, or everywhere except > in strings and comments, etc. Here's the description of the feature > on the E editor blog if someone wants to implement something like > this for emacs: http://e-texteditor.com/blog/2010/beyond-vi FWIW - Not to pretend that this is exactly the same thing, but you can use Icicles to do that. A similar approach could be adopted by vanilla Emacs. Icicles can use a text property to identify the parts of the buffer to search. Those parts then act as completion candidates that you can match using a regexp or other pattern (which you can change on the fly, to dynamically filter the search hits). Font-locking already provides such labeling-using-a-property, for free. It was designed with another purpose in mind, of course, so the buffer parts identified by font-lock might not always be those most pertinent for the job at hand. Depends on just what "structures" you need - those provided by font-lock are pretty basic. Anyway, as an example, using the identification provided by font-lock, you can use `C-c "' (`icicle-search-text-property') to search (e.g. using regexps) among only the strings or only the comments, etc. of a buffer (or of multiple buffers or files) - based on their different font-lock faces. (You cannot, however, search among the complementary parts - e.g. the non-comments, without defining a new Icicles search command.) Font-lock faces can be used this way to do what you describe, provided the "structural" parts of the buffer you are interested in are font-locked using different faces. This feature does not depend on font-lock, however. 
The text property that is used to divide the buffer into searchable parts need not be `face' - any property will do. So if you have a function that parses buffer parts (code structures) in a more meaningful way (in some sense) than font-locking does, it can add a text property with different values to identify the parts, and Icicles search can exploit that labeling immediately. And the property could be an overlay property instead of a text property.

And you can replace matches while you search, on-demand. And you could easily define a specialized search command that allows other actions besides replacement (e.g. a popup menu of alternative actions).

http://www.emacswiki.org/emacs/Icicles_-_Other_Search_Commands#toc2

In addition, it looks like the "structure" described in the blog post you cited is in fact defined just by a set of regexp matches (but I'm no expert on reading vi-ese):

  Y/^\n/ V%A.*Pike<enter> \ V|^%T

It looks as though a few simple patterns do the trick to select the target lines, for the example given. If true, then for that simple kind of structure definition you can just use ordinary Icicles search - no need for any fancy (non-regexp) parsing or the application of a text property.

Ordinary Icicles search (like the text-property search) lets you combine the filtering of any number of input patterns (e.g. regexps).

And if you have a hairy pattern or set of patterns that you want to reuse, instead of typing it interactively each time (as would seem to be the case for the bibtex/refer references, though the blog touts the "effortlessness" of typing such incantations), then you can define a command that incorporates that info for the initial Icicle-search parse. `C-c =' (`icicle-imenu') does that, for instance: it just passes the hairy imenu regexp to `icicle-search'. Any additional, dynamic pattern you then type just filters the imenu candidates (e.g. function definitions).

^ permalink raw reply [flat|nested] 42+ messages in thread
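Using a text property to label the parts of interest, as described above, needs very little code on the collecting side. A minimal sketch that walks the buffer with `next-single-property-change' and returns the ranges carrying a given property value; `my-ranges-with-property' is a hypothetical helper, not the Icicles scan function, and it ignores overlay properties.

```elisp
(defun my-ranges-with-property (prop value)
  "Return a list of (BEG . END) ranges where text property PROP is VALUE.
A range also counts if the property value is a list containing VALUE,
as can happen with `face'."
  (let ((pos (point-min))
        ranges)
    (while (< pos (point-max))
      (let ((next (next-single-property-change pos prop nil (point-max)))
            (val (get-text-property pos prop)))
        (when (or (eq val value)
                  (and (consp val) (memq value val)))
          (push (cons pos next) ranges))
        (setq pos next)))
    (nreverse ranges)))

;; Usage: the propertized text stands in for a font-locked string.
(with-temp-buffer
  (insert "(message "
          (propertize "\"hi\"" 'face 'font-lock-string-face)
          ")")
  (my-ranges-with-property 'face 'font-lock-string-face))
;; => ((10 . 14))
```

In a live buffer, fontification is often applied lazily, so something like `font-lock-ensure' would probably be needed first when the property in question is `face'.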