unofficial mirror of help-gnu-emacs@gnu.org
* Newbie regexp question
@ 2002-10-30 15:07 Paul Cohen
  2002-10-30 15:33 ` Friedrich Dominicus
  2002-10-31 14:45 ` kgold
  0 siblings, 2 replies; 17+ messages in thread
From: Paul Cohen @ 2002-10-30 15:07 UTC (permalink / raw)


Hi

I want to do an Emacs regexp search and replace on an HTML file containing
patterns like this:

<!--Test-->
...
<!--End of Test-->

Where "..." denotes a variable number of lines of HTML text.

I want to search for all occurrences of the above pattern and then
remove them from the HTML file!

I've tried a number of variants without any success. For example the
following regexp doesn't work:

<!--Test-->\(.*\n\)*<!--End of Test-->

/Paul


* Re: Newbie regexp question
  2002-10-30 15:07 Newbie regexp question Paul Cohen
@ 2002-10-30 15:33 ` Friedrich Dominicus
  2002-10-30 16:46   ` Paul Cohen
                     ` (2 more replies)
  2002-10-31 14:45 ` kgold
  1 sibling, 3 replies; 17+ messages in thread
From: Friedrich Dominicus @ 2002-10-30 15:33 UTC (permalink / raw)


Paul Cohen <paco@enea.se> writes:

> Hi
> 
> I want to do a Emacs regexp search and replace on a HTML file containing
> patterns like this:
> 
> <!--Test-->
> ...
> <!--End of Test-->
> 
> Where "..." denotes a variable number of lines of HTML text.
> 
> I want to search for all occurrences of the above pattern and then
> remove them from the HTML file!
> 
> I've tried a number of variants without any success. For example the
> following regexp doesn't work:
> 
> <!--Test-->\(.*\n\)*<!--End of Test-->
I would restate the problem. It does not make much sense to me to
match over a bunch of lines you do not want to handle. 

So how about
M-C-% ^[ \t]*<!--.*Test.*--> with: RET

Or, even better, if you know exactly what you are looking for,
how about using replace-string?

Regards
Friedrich


* Re: Newbie regexp question
  2002-10-30 15:33 ` Friedrich Dominicus
@ 2002-10-30 16:46   ` Paul Cohen
  2002-10-30 17:19     ` Friedrich Dominicus
  2002-10-30 17:24     ` Michael Slass
  2002-10-30 16:49   ` Barry Margolin
  2002-10-30 18:48   ` Stefan Monnier <foo@acm.com>
  2 siblings, 2 replies; 17+ messages in thread
From: Paul Cohen @ 2002-10-30 16:46 UTC (permalink / raw)


Hi Friedrich,

Friedrich Dominicus wrote:

> Paul Cohen <paco@enea.se> writes:
> > I want to do a Emacs regexp search and replace on a HTML file containing
> > patterns like this:
> >
> > <!--Test-->
> > ...
> > <!--End of Test-->
> >
> > Where "..." denotes a variable number of lines of HTML text.
> >
> > I want to search for all occurrences of the above pattern and then
> > remove them from the HTML file!
> >
> > I've tried a number of variants without any success. For example the
> > following regexp doesn't work:
> >
> > <!--Test-->\(.*\n\)*<!--End of Test-->
> I would restate the problem. It does not make much sense to me to
> match over a bunch of lines you do not want to handle.
>
> So how about
> M-C-% ^[ \t]*<!--.*Test.*--> with: RET

No, that's not what I want. Let me rephrase in more general terms. I want to
remove a number of character sequences for which the following holds:

1) They run over multiple lines.
2) They begin and end with well-defined sequences of characters (in my case,
"<!--Test-->" and "<!--End of Test-->"). Let's call the delimiting
character sequences the start and end tokens.
3) They may contain any number of unknown (printable) characters between the
start and end token.
4) There may exist multiple instances of these character sequences in the
file.

/Paul


* Re: Newbie regexp question
  2002-10-30 15:33 ` Friedrich Dominicus
  2002-10-30 16:46   ` Paul Cohen
@ 2002-10-30 16:49   ` Barry Margolin
  2002-10-30 18:48   ` Stefan Monnier <foo@acm.com>
  2 siblings, 0 replies; 17+ messages in thread
From: Barry Margolin @ 2002-10-30 16:49 UTC (permalink / raw)


In article <8765vkkkko.fsf@fbigm.here>,
Friedrich Dominicus  <frido@q-software-solutions.com> wrote:
>Paul Cohen <paco@enea.se> writes:
>
>> Hi
>> 
>> I want to do a Emacs regexp search and replace on a HTML file containing
>> patterns like this:
>> 
>> <!--Test-->
>> ...
>> <!--End of Test-->
>> 
>> Where "..." denotes a variable number of lines of HTML text.
>> 
>> I want to search for all occurrences of the above pattern and then
>> remove them from the HTML file!
>> 
>> I've tried a number of variants without any success. For example the
>> following regexp doesn't work:
>> 
>> <!--Test-->\(.*\n\)*<!--End of Test-->
>I would restate the problem. It does not make much sense to me to
>match over a bunch of lines you do not want to handle. 
>
>So how about
>M-C-% ^[ \t]*<!--.*Test.*--> with: RET

This removes the <!--Test--> and <!--End of Test--> lines, but it doesn't
remove all the lines in between, which I think is his real goal.

The problem with the OP's attempted solution is that * is greedy.  So it
will match everything from the first <!--Test--> to the last <!--End of
Test-->, including all the non-test stuff in between.

I would do this using a keyboard macro that searches for <!--Test-->, sets
a mark, searches for <!--End of Test-->, and then kills the region.
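
A minimal sketch of a regexp-only alternative, assuming an Emacs recent
enough to support the non-greedy operator *? and shy groups \(?: ... \)
(Emacs 21 or later, as far as I know). The function name is made up for
illustration, and very long sections may still hit the regexp stack limit,
so treat this as a sketch rather than a tested recipe:

```elisp
(defun my-delete-test-sections ()
  "Delete each <!--Test--> ... <!--End of Test--> section, tags included.
Hypothetical helper; the tag strings are hard-coded for this example."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; `*?' matches as little as possible, so each match stops at the
    ;; first end tag after the start tag instead of the last one.
    (while (re-search-forward
            "<!--Test-->\\(?:.\\|\n\\)*?<!--End of Test-->" nil t)
      (replace-match "" t t))))
```

Because the quantifier is non-greedy, text that sits between two separate
test sections is left alone.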

-- 
Barry Margolin, barmar@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.


* RE: Newbie regexp question
@ 2002-10-30 16:57 Bingham, Jay
  0 siblings, 0 replies; 17+ messages in thread
From: Bingham, Jay @ 2002-10-30 16:57 UTC (permalink / raw)


Friedrich,

It is too bad that you do not understand what Paul wants to do.

Paul,

Here is what I understand you want to do: 
Find the next occurrence of <!--Test-->, find the next occurrence of <!--End of Test-->, delete everything between the start of <!--Test--> and the end of <!--End of Test-->, and repeat this through the buffer.

There are some problems with your approach to this.
First, you did not end the pattern <!--Test--> with an end of line (you should probably also end the pattern <!--End of Test--> with an end of line, unless you want a blank line where these were found).
Second, a regexp typed interactively does not recognize "\n" as a newline; it just puts an "n" in the pattern.  In order to match newlines you have to put literal newline characters in the pattern.  This is done with the C-q C-j sequence.
Third, even if you make the above corrections and produce the following pattern:
^[ \t]*<!--Test-->
\(.*
\)*[ \t]*<!--End of Test-->

it still will not do what you want it to do.  The reason that it won't is that the asterisk at the end of the sub-pattern \(.*
\)* tells emacs to match as many as possible of the preceding sub-pattern.  So this will match from the start of the first <!--Test--> to the end of the last <!--End of Test-->.  Not exactly what I think you had in mind.

The only way that I know to do what you want to do is to write a function to do it.  This function would prompt for the first pattern, then prompt for the second pattern, then prompt for a replacement string.  It would then search for the first pattern, save the start location of the first pattern, search for the second pattern and replace the range between the start of the first pattern and the end of the second pattern with the replacement string.  (I say replacement string rather than pattern because the \DIGIT meta which is the only pattern meta that is of use in a replacement pattern would not work without doing some special coding in the function to simulate its operation).
I used to have a function that would delete everything between two patterns, but when I left my last employer I failed to keep a copy of it.  I kick myself quite often for not saving the functions that I developed there.

-_
J_)
C_)ingham
.    HP - NonStop Austin Software & Services - Software Quality Assurance
.    Austin, TX
. Language is the apparel in which your thoughts parade in public.
. Never clothe them in vulgar and shoddy attire.          -Dr. George W. Crane-



* Re: Newbie regexp question
  2002-10-30 16:46   ` Paul Cohen
@ 2002-10-30 17:19     ` Friedrich Dominicus
  2002-10-30 17:24     ` Michael Slass
  1 sibling, 0 replies; 17+ messages in thread
From: Friedrich Dominicus @ 2002-10-30 17:19 UTC (permalink / raw)


Paul Cohen <paco@enea.se> writes:

> 
> No that's not what I want. Let me rephrase in more general terms. I want to
> remove a number of character sequences for which the following
> holds:
Well, it was not clear to me that you want to cut the lines between the
tags. Doing that is still not too difficult:

Search for the beginning tag
save the position
Search for the closing tag
remove the region between the saved position and point

In Emacs Lisp:
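(defun remove-text-between (start-tag end-tag)
  "Delete the lines between the next START-TAG and the END-TAG after it.
Both tags are assumed to be on lines of their own and are kept."
  (interactive "sStart Tag: \nsEnd Tag: ")
  (search-forward start-tag)
  (forward-line 1)                  ; first line after the start tag
  (let ((start (point)))
    (search-forward end-tag)
    (beginning-of-line)             ; stop just before the end-tag line
    (delete-region start (point))))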

I assume that each tag is on a line of its own:
<!-- Test -->

If you are not fully sure about the names, you might try regular
expressions. Under the given circumstances regexps are IMHO
overkill. If you know your data you should pull out the information
for your needs.

It's easy to extend if you want to let it run on the whole buffer,
but I leave this to you.
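
A minimal sketch of that whole-buffer extension, under the same assumption
that every tag sits on a line of its own. The function name is made up for
illustration, and the tag lines themselves are kept:

```elisp
(defun my-remove-all-text-between (start-tag end-tag)
  "Delete the lines between every START-TAG / END-TAG pair in the buffer.
Hypothetical helper; assumes each tag is on a line of its own."
  (interactive "sStart Tag: \nsEnd Tag: ")
  (save-excursion
    (goto-char (point-min))
    (while (search-forward start-tag nil t)
      (forward-line 1)                ; first line after the start tag
      (let ((start (point)))
        (unless (search-forward end-tag nil t)
          (error "No \"%s\" found after position %d" end-tag start))
        (beginning-of-line)           ; stop just before the end-tag line
        (delete-region start (point))
        (forward-line 1)))))          ; continue after the end-tag line
```

Starting from (point-min) inside save-excursion means the command covers
the whole buffer and leaves the cursor where it was.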

Regards
Friedrich


* Re: Newbie regexp question
  2002-10-30 16:46   ` Paul Cohen
  2002-10-30 17:19     ` Friedrich Dominicus
@ 2002-10-30 17:24     ` Michael Slass
  2002-10-30 17:42       ` Friedrich Dominicus
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Slass @ 2002-10-30 17:24 UTC (permalink / raw)


Paul Cohen <paco@enea.se> writes:

>
>No that's not what I want. Let me rephrase in more general terms. I want to
>remove a number of character sequences for which the following holds:
>
>1) They run over multiple lines.
>2) They begin and end with well defined sequences of characters. (In my case
>with "<!--Test-->"  and "<!--End of Test-->"). Let's call the delimiting
>character sequences for the start and and end token.
>3) They may contain any number of unknown (printable) characters between the
>start and end token.
>4) There may exist multiple instances of these character sequences in the
>file.
>
>/Paul
>


I think a lisp program would do better at this:

VERY LIGHTLY TESTED.  MAKE BACKUPS BEFORE EXPERIMENTING WITH THIS!

(defun paulc-purge-html-test-sections (buffer)
  "Delete all occurances of text between <!--Test--> and <!--End of Test-->, inclusive."
  (interactive "bPurge html test sections in buffer: ")
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      (while (re-search-forward "<!--Test-->" nil t)
        (let ((beg (match-beginning 0))
              (end (progn (re-search-forward "<!--End of Test-->" nil t)
                          (match-end 0))))
          (if end
              (kill-region beg end)
            (error "Unmatched \"<!--Test-->\" sequence at position %d" beg)))))))
-- 
Mike Slass


* Re: Newbie regexp question
  2002-10-30 17:24     ` Michael Slass
@ 2002-10-30 17:42       ` Friedrich Dominicus
  2002-10-30 17:50         ` Michael Slass
  0 siblings, 1 reply; 17+ messages in thread
From: Friedrich Dominicus @ 2002-10-30 17:42 UTC (permalink / raw)


Michael Slass <miknrene@drizzle.com> writes:

> 
> I think a lisp program would do better at this:
> 
> VERY LIGHTLY TESTED.  MAKE BACKUPS BEFORE EXPERIMENTING WITH THIS!
> 
> (defun paulc-purge-html-test-sections (buffer)
>   "Delete all occurances of text between <!--Test--> and <!--End of Test-->, inclusive."
>   (interactive "bPurge html test sections in buffer: ")
>   (save-excursion
>     (save-restriction
>       (goto-char (point-min))
>       (while (re-search-forward "<!--Test-->" nil t)
>         (let ((beg (match-beginning 0))
>               (end (progn (re-search-forward "<!--End of Test-->" nil t)
>                           (match-end 0))))
>           (if end
>               (kill-region beg end)
>             (error "Unmatched \"<!--Test-->\" sequence at position %d" beg)))))))
Well, this code is better in some areas, but Mike, you missed a big
opportunity ;-) to let the user choose what the tags are, and as
mentioned before, regular expressions are overkill if you know your
data.

It is a really nice solution, but I think there is a problem with
the end stuff.

The info pages say:
  Search forward from point for regular expression REGEXP.
  Set point to the end of the occurrence found, and return point.

That means you will return the end tags too, if I got that right,
which is not certain: I'm tired and had an unpleasant quarrel with
someone I really appreciate.

So good night
Friedrich


* Re: Newbie regexp question
  2002-10-30 17:42       ` Friedrich Dominicus
@ 2002-10-30 17:50         ` Michael Slass
  2002-10-30 21:37           ` Michael Slass
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Slass @ 2002-10-30 17:50 UTC (permalink / raw)


Friedrich Dominicus <frido@q-software-solutions.com> writes:

>Michael Slass <miknrene@drizzle.com> writes:
>
>> 
>> I think a lisp program would do better at this:
>> 
>> VERY LIGHTLY TESTED.  MAKE BACKUPS BEFORE EXPERIMENTING WITH THIS!
>> 
>> (defun paulc-purge-html-test-sections (buffer)
>>   "Delete all occurances of text between <!--Test--> and <!--End of Test-->, inclusive."
>>   (interactive "bPurge html test sections in buffer: ")
>>   (save-excursion
>>     (save-restriction
>>       (goto-char (point-min))
>>       (while (re-search-forward "<!--Test-->" nil t)
>>         (let ((beg (match-beginning 0))
>>               (end (progn (re-search-forward "<!--End of Test-->" nil t)
>>                           (match-end 0))))
>>           (if end
>>               (kill-region beg end)
>>             (error "Unmatched \"<!--Test-->\" sequence at position %d" beg)))))))
>Well this code is better in some areas, but Mike you missed a big
>opportunity ;-) To let the user choose what the tags are and as
>mentioned before regular expressions are overkill if you know your
>data.
>
>However a really nice solution anyway I think there is a problem with
>the end stuff. 
>
>The info pages say:
>  Search forward from point for regular expression REGEXP.
>  Set point to the end of the occurrence found, and return point.
>
>That means you will return the End tags too, if I got that right,
>which is not sure I'm a tired and had an unpleasant quarrel with
>someone I really appriciate. 

Friedrich:

Your point about the re-search is well-taken; just search-forward
would be better.

The flexibility of your solution which lets the user choose the start
and end tags is balanced against the convenience of my solution where
he doesn't have to type that information in.

My function doesn't have a meaningful return value, so I'm not sure
what your question is.  If you're concerned that I'm not setting beg
or end correctly, please note that they are set to the return value of
the functions (match-beginning 0) and (match-end 0) which behave as
you'd imagine from the names.

I think the OP wanted the tags killed, as well as all the stuff
in between them, but if that's not so, switching the (match-end 0) and
the (match-beginning 0) in the defun will fix that.


-- 
Mike Slass


* Re: Newbie regexp question
  2002-10-30 15:33 ` Friedrich Dominicus
  2002-10-30 16:46   ` Paul Cohen
  2002-10-30 16:49   ` Barry Margolin
@ 2002-10-30 18:48   ` Stefan Monnier <foo@acm.com>
  2002-10-30 19:29     ` Barry Margolin
  2 siblings, 1 reply; 17+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2002-10-30 18:48 UTC (permalink / raw)


> Paul Cohen <paco@enea.se> writes:
>> I've tried a number of variants without any success. For example the
>> following regexp doesn't work:
>> 
>> <!--Test-->\(.*\n\)*<!--End of Test-->

It works for me, so you'll have to give us more information
about what you did and what you mean by "doesn't work".


        Stefan


* Re: Newbie regexp question
  2002-10-30 18:48   ` Stefan Monnier <foo@acm.com>
@ 2002-10-30 19:29     ` Barry Margolin
  0 siblings, 0 replies; 17+ messages in thread
From: Barry Margolin @ 2002-10-30 19:29 UTC (permalink / raw)


In article <5lbs5byd89.fsf@rum.cs.yale.edu>,
Stefan Monnier  <foo@acm.com> wrote:
>> Paul Cohen <paco@enea.se> writes:
>>> I've tried a number of variants without any success. For example the
>>> following regexp doesn't work:
>>> 
>>> <!--Test-->\(.*\n\)*<!--End of Test-->
>
>It works for me

If you started with a file like:

--------------------
<!--Test-->
This should be deleted
<!--End of Test-->

This should not be deleted

<!--Test-->
This should be deleted
<!--End of Test-->
--------------------

and replaced the regexp with "", was the part that says "This should not be
deleted" deleted?

-- 
Barry Margolin, barmar@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.


* RE: Newbie regexp question
@ 2002-10-30 21:12 Bingham, Jay
  0 siblings, 0 replies; 17+ messages in thread
From: Bingham, Jay @ 2002-10-30 21:12 UTC (permalink / raw)


Paul,

After seeing Mike and Friedrich's exchange on your question I decided to create my own solution, a general purpose function that allows the user to specify the start and end as regular expressions and supply a replacement for the text between them, and optionally by giving a numeric prefix argument to the function force it to replace the tags as well.  In the process I noticed that Mike's function has a logic flaw.  It will never find the case of a missing end tag and instead deletes that final start tag.

Here is my function if you want it.

(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between two regular expressions supplied as arguments.
With a numeric argument the regular expressions are included.
When called non interactively incl should be nil for non-inclusion and
non-nil for inclusion."
  (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
  (while (re-search-forward start-re nil t)
    (let ((beg (if incl (match-beginning 0) (match-end 0)))
          ;; `and' leaves END nil when the end regexp is not found.
          (end (and (re-search-forward end-re nil t)
                    (if incl (match-end 0) (match-beginning 0)))))
      (if (not end)
          (error "Unmatched \"%s\" sequence at position %d" start-re beg)
        ;; TAIL is the length of an excluded end match (0 when included).
        (let ((tail (- (point) end)))
          (delete-region beg end)
          (goto-char beg)               ; insert where the old text was
          (insert repl-str)
          (forward-char tail))))))      ; resume after the end match

-_
J_)
C_)ingham
.    HP - NonStop Austin Software & Services - Software Quality Assurance
.    Austin, TX
. Language is the apparel in which your thoughts parade in public.
. Never clothe them in vulgar and shoddy attire.          -Dr. George W. Crane-



* Re: Newbie regexp question
  2002-10-30 17:50         ` Michael Slass
@ 2002-10-30 21:37           ` Michael Slass
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Slass @ 2002-10-30 21:37 UTC (permalink / raw)


Michael Slass <miknrene@drizzle.com> writes:

>Friedrich Dominicus <frido@q-software-solutions.com> writes:
>
>>Michael Slass <miknrene@drizzle.com> writes:
>>
>>> 
>>> I think a lisp program would do better at this:
>>> 
>>> VERY LIGHTLY TESTED.  MAKE BACKUPS BEFORE EXPERIMENTING WITH THIS!
>>> 
>>> (defun paulc-purge-html-test-sections (buffer)
>>>   "Delete all occurances of text between <!--Test--> and <!--End of Test-->, inclusive."
>>>   (interactive "bPurge html test sections in buffer: ")
>>>   (save-excursion
>>>     (save-restriction
>>>       (goto-char (point-min))
>>>       (while (re-search-forward "<!--Test-->" nil t)
>>>         (let ((beg (match-beginning 0))
>>>               (end (progn (re-search-forward "<!--End of Test-->" nil t)
>>>                           (match-end 0))))
>>>           (if end
>>>               (kill-region beg end)
>>>             (error "Unmatched \"<!--Test-->\" sequence at position %d" beg)))))))

>>However a really nice solution anyway I think there is a problem with
>>the end stuff. 
>>
>>The info pages say:
>>  Search forward from point for regular expression REGEXP.
>>  Set point to the end of the occurrence found, and return point.
>>
>>That means you will return the End tags too,

There *is* a problem with the end -- the (progn ...) should be (and ...)
so that end will be nil if the ending tag isn't found.  I forgot that
(match-end) would keep the value of the last successful match.

         (let ((beg (match-beginning 0))
               (end (and (re-search-forward "<!--End of Test-->" nil t)
                         (match-end 0))))
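
A minimal sketch of the difference, assuming a throwaway buffer that has a
start tag but no end tag. The function name is made up for illustration.
The manual warns against relying on the match data after a failed search,
so the first value is only what one typically sees, not a guarantee:

```elisp
(defun my-end-value-demo ()
  "Return a list (PROGN-VALUE AND-VALUE) for a buffer with no end tag.
Hypothetical helper that only illustrates the progn/and difference."
  (with-temp-buffer
    (insert "<!--Test-->\nno end tag here\n")
    (goto-char (point-min))
    (re-search-forward "<!--Test-->" nil t)
    (list
     ;; progn: the failed search returns nil, but (match-end 0) still
     ;; runs and usually reports the end of the earlier start-tag match.
     (progn (re-search-forward "<!--End of Test-->" nil t)
            (match-end 0))
     ;; and: stops at the nil from the failed search, so the value is nil.
     (and (re-search-forward "<!--End of Test-->" nil t)
          (match-end 0)))))
```

With the (and ...) form the (if end ...) test in the function can finally
reach its error branch.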


-- 
Mike Slass


* Re: Newbie regexp question
       [not found] <mailman.1036012442.21874.help-gnu-emacs@gnu.org>
@ 2002-10-31 13:56 ` Paul Cohen
  2002-10-31 14:41   ` Friedrich Dominicus
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Cohen @ 2002-10-31 13:56 UTC (permalink / raw)


Hi all,

Thanks to everyone who has been kind enough to take the time to answer my question! Jay's answer is definitely closest to solving my problem.

"Bingham, Jay" wrote:

> After seeing Mike and Friedrich's exchange on your question I decided to create my own solution, a general purpose function that allows the user to specify the start and end as regular expressions and supply a replacement for the text between them, and optionally by giving a numeric prefix argument to the function force it to replace the tags as well.

Neat.

> In the process I noticed that Mike's function has a logic flaw.  It will never find the case of a missing end tag and instead deletes that final start tag.

Ok.

> Here is my function if you want it.

Yes I do! :-)

>
> (defun replace-between-regexp (start-re end-re repl-str &optional incl)
>   "Replace the text between two regular expressions supplied as arguments.
> With a numeric argument the regular expressions are included.
> When called non interactively incl should be nil for non-inclusion and
> non-nil for inclusion."
>   (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
>   (while (re-search-forward start-re nil t)
>     (let ((beg (if incl (match-beginning 0) (match-end 0)))
>           (end
>            (progn
>              (if (re-search-forward end-re nil t)
>                  (if incl (match-end 0) (match-beginning 0))
>                nil))))
>       (if (not end)
>           (error "Unmatched \"%s\" sequence at position %d" start-re beg)
>         (delete-region beg end)
>         (insert repl-str)))))

I have a few comments/questions.

I tried the above function in my *scratch* buffer by writing it and then adding the following lines (with line numbers!):

19. (replace-between-regexp "<!--Test-->" "<!--End of Test-->" "" 1)
20.
21. Pub
22. <!--Test-->
23. Foo
24. <!--End of Test-->
25. Bar

I then evaluated the function with C-j. This resulted in:

19. (replace-between-regexp "<!--Test-->" "<!--End of Test-->" "" 1)
20.
21.
22. Pub
23. nil
24.
25. Bar

With the cursor on line 24. My comments/questions are:

1) I understand that the "nil" on line 23 comes from the value of the last item in the function list, in this case "(insert repl-str)". But a newline character is also inserted after "nil". I don't want either the "nil" or the extra newline character!

2) Line 21 containing "Pub" is moved forward to line 22. I guess this is just because I did C-j at the end of line 19, or?

3) It would be nice if the cursor would return to its original position after running the command. I tried adding the "save-excursion" command after the "interactive" line in the function but it didn't work. The cursor still ended up on line 24.

4) The idea to solve my problem with a lisp function is neat and I guess there are many situations where one would like to add one's own special purpose functions to Emacs. My question is: where is the suitable place to put them? In my .emacs file or in a separate file? What are the conventions?

Thanks again for the help!

/Paul


* Re: Newbie regexp question
  2002-10-31 13:56 ` Paul Cohen
@ 2002-10-31 14:41   ` Friedrich Dominicus
  0 siblings, 0 replies; 17+ messages in thread
From: Friedrich Dominicus @ 2002-10-31 14:41 UTC (permalink / raw)


Paul Cohen <paco@enea.se> writes:

> Hi all,
> 
> Thanks to everyone who has been kind to take time to answer my question! Jay's answer is definitely closest to solving my problem.
> 
> "Bingham, Jay" wrote:
> 
> > After seeing Mike and Friedrich's exchange on your question I decided to create my own solution, a general purpose function that allows the user to specify the start and end as regular expressions and supply a replacement for the text between them, and optionally by giving a numeric prefix argument to the function force it to replace the tags as well.
> 
> Neat.
> 
> > In the process I noticed that Mike's function has a logic flaw.  It will never find the case of a missing end tag and instead deletes that final start tag.
> 
> Ok.
> 
> > Here is my function if you want it.
> 
> Yes I do! :-)
> 
> >
> > (defun replace-between-regexp (start-re end-re repl-str &optional incl)
> >   "Replace the text between two regular expressions supplied as arguments.
> > With a numeric argument the regular expressions are included.
> > When called non interactively incl should be nil for non-inclusion and
> > non-nil for inclusion."
> >   (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
> >   (while (re-search-forward start-re nil t)
> >     (let ((beg (if incl (match-beginning 0) (match-end 0)))
> >           (end
> >            (progn
> >              (if (re-search-forward end-re nil t)
> >                  (if incl (match-end 0) (match-beginning 0))
> >                nil))))
> >       (if (not end)
> >           (error "Unmatched \"%s\" sequence at position %d" start-re beg)
> >         (delete-region beg end)
> >         (insert repl-str)))))
> 
> I have few comments/questions.
> 
> I tried the above function in my *scratch* buffer by writing it and then adding the following lines (with line numbers!):
> 
> 19. (replace-between-regexp "<!--Test-->" "<!--End of Test-->" "" 1)
> 20.
> 21. Pub
> 22. <!--Test-->
> 23. Foo
> 24. <!--End of Test-->
> 25. Bar
> 
> I then evaluated the function with C-j. This resulted in:
> 
> 19. (replace-between-regexp "<!--Test-->" "<!--End of Test-->" "" 1)
> 20.
> 21.
> 22. Pub
> 23. nil
> 24.
> 25. Bar
> 
> With the cursor on line 24. My comments/questions are:
> 
> 1) I understand that the "nil" on line 23 comes from the value of
> the last item in the function list, in this case "(insert
> repl-str)". But there is also a newline character is inserted after
> "nil". But I don't want either the "nil" or the extra newline
> character! 
I'm a bit too lazy to answer everything or suggest other things, but here are my
thoughts on that.
Call it with (replace-betwe.... "<!-- End of Test" nil t)

then check before the output
(when repl-str (insert repl-str))


> 
> 2) Line 21 containing "pub" is moved forward to line 22. I guess
> this is just because I did C-j at the end of line 19 or?
Check the documentation of C-j
C-h k C-j gives it to you
> 
> 3) It would be nice if the cursor would return to its original
> position after running the command. I tried adding the
> "save-excursion" command after the "interactive" line in the function
>but it didn't work. The cursor still ended up on line 24.
You have to enclose it all in save-excursion

It looks like this then:
(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between two regular expressions supplied as arguments.
With a numeric argument the regular expressions are included.
When called non interactively incl should be nil for non-inclusion and
non-nil for inclusion."
  (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
  (save-excursion
    (while (re-search-forward start-re nil t)
      (let ((beg (if incl (match-beginning 0) (match-end 0)))
            ;; `and' leaves END nil when the end regexp is not found.
            (end (and (re-search-forward end-re nil t)
                      (if incl (match-end 0) (match-beginning 0)))))
        (if (not end)
            (error "Unmatched \"%s\" sequence at position %d" start-re beg)
          ;; TAIL is the length of an excluded end match (0 when included).
          (let ((tail (- (point) end)))
            (delete-region beg end)
            (goto-char beg)             ; insert where the old text was
            (when repl-str (insert repl-str))
            (forward-char tail)))))))   ; resume after the end match

So do not use C-j; use C-x C-e instead.

This is the documentation for C-j:
  Evaluate sexp before point; print value into current buffer.
So you will get nil in the buffer after this run.

The documentation for C-x C-e says:
  Evaluate sexp before point; print value in minibuffer.
  With argument, print output into current buffer.

So the output will be put into the minibuffer.
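
To see where the "nil" comes from, here is a small sketch. The helper name
my-eval-value-demo is made up for this illustration. It shows that `insert'
changes the buffer but returns nil as its value, and that value is what C-j
prints into the buffer (a `while' loop likewise always evaluates to nil):

```elisp
;; Hypothetical helper, not part of the function discussed above.
(defun my-eval-value-demo ()
  "Return a list of the value of `insert' and the resulting buffer text."
  (with-temp-buffer
    (let ((value (insert "Foo")))   ; `insert' returns nil
      (list value (buffer-string)))))

(my-eval-value-demo)   ; the value is nil, although "Foo" was inserted
```

So the function itself never inserts "nil"; only the way it was evaluated does.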

> 
> 4) The idea to solve my problem with a lisp function is neat and I
> guess there are many situations where one would like to add ones own
> special purpose functions to Emacs. My question is: where is the
> suitable place to put them? In my .emacs file or in a separate
> file. What are the conventions? 
.emacs is changed quite frequently. I don't think your self-written
functions will change much after development, therefore put them into a
file of their own, byte-compile that file and load it from the .emacs file.
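
A minimal sketch of such a setup, assuming the functions live in a file
called my-functions.el (a made-up name) under ~/lib/elisp:

```elisp
;; In ~/.emacs.  The file name "my-functions" is only an assumption.
(add-to-list 'load-path (expand-file-name "~/lib/elisp"))

;; Byte-compile once after editing the file, for example with
;;   M-x byte-compile-file RET ~/lib/elisp/my-functions.el RET

;; `load' prefers my-functions.elc over my-functions.el if both exist.
;; The second argument t means: do not signal an error if the file is missing.
(load "my-functions" t)
```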

I've put all my stuff under .xemacs here, and libraries I've installed
over time have found their way under ~/lib/elisp. It makes it quite
easy to update other systems such that XEmacs behaves like I want it
to.

It's IMHO nearly unavoidable that you customize your Emacs over time,
and then you feel like a fish on land when you are used to some things
and they are not there.

Regards
Friedrich


* Re: Newbie regexp question
  2002-10-30 15:07 Newbie regexp question Paul Cohen
  2002-10-30 15:33 ` Friedrich Dominicus
@ 2002-10-31 14:45 ` kgold
  1 sibling, 0 replies; 17+ messages in thread
From: kgold @ 2002-10-31 14:45 UTC (permalink / raw)



Assuming a newbie doesn't want to start writing elisp ...

I do all sorts of repetitive editing like this using a keyboard macro.
Since they use commands you already know (search, cursor movement,
mark, kill), they're easy to create.  And since they execute as
they're being defined, there's less chance of error, and less
debugging, than with elisp.

In this case, the macro would be:

isearch-forward-regexp <!--Test-->
beginning-of-line
set-mark-command
isearch-forward-regexp <!--End of Test-->
beginning-of-line
next-line
kill-region
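
For comparison only, the same steps could be written roughly like this in
elisp. This is a sketch; the function names my-kill-test-block and
my-kill-all-test-blocks are made up, and the macro above needs none of it:

```elisp
;; Hypothetical names; each step mirrors one command of the macro.
(defun my-kill-test-block ()
  "Kill the next block from <!--Test--> to <!--End of Test-->, whole lines.
Return non-nil if a block was found and killed."
  (interactive)
  (when (re-search-forward "<!--Test-->" nil t)        ; isearch-forward-regexp
    (beginning-of-line)                                ; beginning-of-line
    (let ((start (point)))                             ; set-mark-command
      (when (re-search-forward "<!--End of Test-->" nil t)
        (forward-line 1)                               ; beginning-of-line, next-line
        (kill-region start (point))                    ; kill-region
        t))))

(defun my-kill-all-test-blocks ()
  "Repeat `my-kill-test-block' from the top of the buffer until none is left."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (my-kill-test-block))))
```

Running the keyboard macro repeatedly does the same job as the second function.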

Paul Cohen <paco@enea.se> writes:
> 
> I want to do a Emacs regexp search and replace on a HTML file containing
> patterns like this:
> 
> <!--Test-->
> ...
> <!--End of Test-->
> 
> Where "..." denotes a variable number of lines of HTML text.
> 
> I want to search for all occurrences of the above pattern and then
> remove them from the HTML file!
> 
> I've tried a number of variants without any success. For example the
> following regexp doesn't work:
> 
> <!--Test-->\(.*\n\)*<!--End of Test-->

-- 
Ken Goldman   kgold@watson.ibm.com   914-784-7646


* RE: Newbie regexp question
@ 2002-10-31 18:11 Bingham, Jay
  0 siblings, 0 replies; 17+ messages in thread
From: Bingham, Jay @ 2002-10-31 18:11 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 7012 bytes --]

Friedrich,

Thanks for answering Paul's questions about the replace-between-regexp function.
I have one comment regarding what appears to be a suggestion to change the insert logic to (when repl-str (insert repl-str)).
I do not believe it is necessary to wrap the insert in a when construct.  Even without it, nothing gets inserted into the buffer if repl-str is empty, so the when serves no purpose; the nil that magically appeared is purely a byproduct of using C-j to invoke the function.
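
A small sketch of the two cases, using a made-up helper my-insert-demo: an empty string inserts nothing, while nil (as in the suggested call with nil as the third argument) is not a string and makes `insert' signal an error, which is the one situation where the when guard would matter:

```elisp
;; Hypothetical helper, only for trying out `insert' with different arguments.
(defun my-insert-demo (text)
  "Insert TEXT into a temporary buffer and return the buffer contents.
Return the symbol `failed' if `insert' rejects TEXT."
  (with-temp-buffer
    (condition-case nil
        (progn
          (insert text)
          (buffer-string))
      (wrong-type-argument 'failed))))

(my-insert-demo "")    ; empty string: nothing is inserted
(my-insert-demo nil)   ; nil is not a string: `insert' signals an error
```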

Regarding Paul's question about where to put the function I agree that it is best to put functions into a separate file and load or auto-load them into emacs in the .emacs file.
Here is the function in a file with instructions on how to load or auto-load it.
 <<repbetre.el>> 

-_
J_)
C_)ingham
.    HP - NonStop Austin Software & Services - Software Quality Assurance
.    Austin, TX
. Language is the apparel in which your thoughts parade in public.
. Never clothe them in vulgar and shoddy attire.          -Dr. George W. Crane-


[-- Attachment #2: repbetre.el --]
[-- Type: application/octet-stream, Size: 4417 bytes --]

;;; repbetre.el --- Replace Between Regexp

;;-----------------------------------------------------------------------------
;; Last Modified Time-stamp: <31Oct2002 11:50:12 CST by JCBingham>
;;-----------------------------------------------------------------------------

;; Copyright 2002 JCBingham
;;
;; Author: jay.bingham@hp.com
;; Version: $Id: repbetre.el,v 0.01 2002/10/31 17:32:21 JCBingham Exp $
;; Keywords: replace regexp multiple lines
;; Requirements: None
;; Status: not intended to be distributed yet

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; if not, write to the Free Software
;; Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


;;; Commentary:

;; This package contains a function that will replace multiple lines between
;; two delimiters which can be specified as regular expressions with text
;; supplied as an argument.
;; 
;; The interactive functions in this package are:
;;	replace-between-regexp - 
;;		A function that replaces the text between two regular 
;;		expressions supplied as arguments.  With a numeric argument
;;		the regular expressions are included.

;; To load this package in all your emacs sessions put this file into your 
;; load-path and put the following line (without the comment delimiter) 
;; into your ~/.emacs:
;;   (require 'repbetre)

;; To auto-load this package in your emacs sessions (loaded only when needed)
;; put this file into your load-path and put the following lines (without the
;; comment delimiters) into your ~/.emacs:
;;  (autoload 'replace-between-regexp "repbetre"
;;   "Replace the text between two regular expressions supplied as arguments."
;;   t nil)


;;;++ Module History ++

;; 31 Oct 2002 - JCBingham - 
;;	 Initial version containing the following -
;;	interactive functions:
;;	 replace-between-regexp

;;;-- Module History end --


;;; Code:

(provide 'repbetre)

;;;;##########################################################################
;;;;  Interactive Functions
;;;;##########################################################################

;;;======<Interactive Function>===============================================
;;
;; Function: 
;;   Replace the text between two regular expressions supplied as arguments.
;;   With a numeric argument the regular expressions are included.
;;
;; Pseudo code:
;;  while search for the start-re is successful
;;    if incl specified
;;      let beg be the beginning of the match location
;;    else
;;      let beg be the end of the match location
;;    endif
;;    search for the end-re
;;    if incl is specified 
;;      let end be the end of the match location
;;    else
;;      let end be the start of the match location
;;    endif
;;    if end is not set
;;      issue an error message
;;    else
;;      delete the range specified by beg and end
;;      insert the text from the repl-str argument
;;    endif
;;  end while
;;
(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between two regular expressions supplied as arguments.
With a numeric argument the regular expressions are included.
When called non-interactively, INCL should be nil for non-inclusion and
non-nil for inclusion."
  (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
  ;; save-excursion restores point once the whole loop has finished.
  (save-excursion
    (while (re-search-forward start-re nil t)
      (let ((beg (if incl (match-beginning 0) (match-end 0)))
            ;; END is nil when the ending regexp is not found.
            (end
             (and (re-search-forward end-re nil t)
                  (if incl (match-end 0) (match-beginning 0)))))
        (if (not end)
            (error "Search failed for ending regexp \"%s\" after position %d" end-re beg)
          (delete-region beg end)
          ;; Without INCL, point is still after the end match; go back to BEG
          ;; so the replacement lands between the two delimiters.
          (goto-char beg)
          (insert repl-str))))))

;;; END OF repbetre.el


Thread overview: 17+ messages
2002-10-30 15:07 Newbie regexp question Paul Cohen
2002-10-30 15:33 ` Friedrich Dominicus
2002-10-30 16:46   ` Paul Cohen
2002-10-30 17:19     ` Friedrich Dominicus
2002-10-30 17:24     ` Michael Slass
2002-10-30 17:42       ` Friedrich Dominicus
2002-10-30 17:50         ` Michael Slass
2002-10-30 21:37           ` Michael Slass
2002-10-30 16:49   ` Barry Margolin
2002-10-30 18:48   ` Stefan Monnier <foo@acm.com>
2002-10-30 19:29     ` Barry Margolin
2002-10-31 14:45 ` kgold
  -- strict thread matches above, loose matches on Subject: below --
2002-10-30 16:57 Bingham, Jay
2002-10-30 21:12 Bingham, Jay
     [not found] <mailman.1036012442.21874.help-gnu-emacs@gnu.org>
2002-10-31 13:56 ` Paul Cohen
2002-10-31 14:41   ` Friedrich Dominicus
2002-10-31 18:11 Bingham, Jay
