* Newbie regexp question
@ 2002-10-30 15:07 Paul Cohen
From: Paul Cohen @ 2002-10-30 15:07 UTC (permalink / raw)


Hi

I want to do an Emacs regexp search and replace on an HTML file containing
patterns like this:

<!--Test-->
...
<!--End of Test-->

Where "..." denotes a variable number of lines of HTML text.

I want to search for all occurrences of the above pattern and then
remove them from the HTML file!

I've tried a number of variants without any success. For example the
following regexp doesn't work:

<!--Test-->\(.*\n\)*<!--End of Test-->

/Paul

* RE: Newbie regexp question
@ 2002-10-30 16:57 Bingham, Jay
From: Bingham, Jay @ 2002-10-30 16:57 UTC (permalink / raw)


Friedrich,

It is too bad that you do not understand what Paul wants to do.

Paul,

Here is what I understand you want to do:
find the next occurrence of <!--Test-->, find the next occurrence of <!--End of Test-->, delete everything from the start of <!--Test--> to the end of <!--End of Test-->, and repeat this through the rest of the buffer.

There are some problems with your approach.
First, you did not end the pattern <!--Test--> with an end of line (you should probably also end the pattern <!--End of Test--> with an end of line, unless you want a blank line left where these were found).
Second, a regexp typed at the minibuffer prompt does not recognize "\n" as a newline; it just puts an "n" in the pattern.  (The "\n" escape is only turned into a newline inside a Lisp string.)  To match newlines interactively you have to put literal newline characters in the pattern.  This is done with the C-q C-j sequence.
Third, even if you make the above corrections and produce the following pattern:
^[ \t]*<!--Test-->
\(.*
\)*[ \t]*<!--End of Test-->

it still will not do what you want it to do.  The reason that it won't is that the asterisk at the end of the sub-pattern \(.*
\)* tells emacs to match as many as possible of the preceding sub-pattern.  So this will match from the start of the first <!--Test--> to the end of the last <!--End of Test-->.  Not exactly what I think you had in mind.
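
The greedy behaviour described above can be seen from Lisp with a small sketch. This is only an illustration, not part of any solution: the names demo-greedy-span and demo-two-sections are made up for the example, and the pattern is the one above without the leading-whitespace part, written as a Lisp string (where "\n" is a real newline and the regexp backslashes are doubled).

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-greedy-span (text)
  "Return the part of TEXT matched by the greedy multi-line pattern."
  ;; In a Lisp string "\n" is a real newline, and each backslash that
  ;; belongs to the regexp syntax has to be doubled.
  (let ((re "<!--Test-->\n\\(.*\n\\)*<!--End of Test-->"))
    (when (string-match re text)
      (match-string 0 text))))

;; Hypothetical sample text: two test sections with a line to keep between.
(defvar demo-two-sections
  (concat "<!--Test-->\nFoo\n<!--End of Test-->\n"
          "Keep me\n"
          "<!--Test-->\nBaz\n<!--End of Test-->\n"))

;; The single match runs from the first start tag to the last end tag,
;; so it swallows the "Keep me" line as well.
(demo-greedy-span demo-two-sections)
```

Evaluating the last form should return everything up to and including the final <!--End of Test-->, which is why a replace with this pattern removes too much.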

The only way that I know to do what you want to do is to write a function to do it.  This function would prompt for the first pattern, then prompt for the second pattern, then prompt for a replacement string.  It would then search for the first pattern, save the start location of the first pattern, search for the second pattern and replace the range between the start of the first pattern and the end of the second pattern with the replacement string.  (I say replacement string rather than pattern because the \DIGIT meta which is the only pattern meta that is of use in a replacement pattern would not work without doing some special coding in the function to simulate its operation).
I used to have a function that would delete everything between two patterns, but when I left my last employer I failed to keep a copy of it.  I kick myself quite often for not keeping the functions that I developed there.

-_
J_)
C_)ingham
.    HP - NonStop Austin Software & Services - Software Quality Assurance
.    Austin, TX
. Language is the apparel in which your thoughts parade in public.
. Never clothe them in vulgar and shoddy attire.          -Dr. George W. Crane-

 -----Original Message-----
From: 	Friedrich Dominicus [mailto:frido@q-software-solutions.com] 
Sent:	Wednesday, October 30, 2002 9:34 AM
To:	help-gnu-emacs@gnu.org
Subject:	Re: Newbie regexp question

Paul Cohen <paco@enea.se> writes:

> Hi
> 
> I want to do a Emacs regexp search and replace on a HTML file containing
> patterns like this:
> 
> <!--Test-->
> ...
> <!--End of Test-->
> 
> Where "..." denotes a variable number of lines of HTML text.
> 
> I want to search for all occurrences of the above pattern and then
> remove them from the HTML file!
> 
> I've tried a number of variants without any success. For example the
> following regexp doesn't work:
> 
> <!--Test-->\(.*\n\)*<!--End of Test-->
I would restate the problem. It does not make much sense to me to
match over a bunch of lines you do not want to handle. 

So how about
M-C-% ^[ \t]*<!--.*Test.*--> with: RET

Or even better, if you know exactly what you are looking for,
how about using replace-string?

Regards
Friedrich
_______________________________________________
Help-gnu-emacs mailing list
Help-gnu-emacs@gnu.org
http://mail.gnu.org/mailman/listinfo/help-gnu-emacs

* RE: Newbie regexp question
@ 2002-10-30 21:12 Bingham, Jay
From: Bingham, Jay @ 2002-10-30 21:12 UTC (permalink / raw)


Paul,

After seeing Mike and Friedrich's exchange on your question I decided to create my own solution: a general-purpose function that lets the user specify the start and end as regular expressions and supply a replacement for the text between them.  Given a numeric prefix argument, it replaces the tags as well.  In the process I noticed that Mike's function has a logic flaw.  It will never detect a missing end tag: when the search for the end tag fails, (match-end 0) in practice still reports the end of the start tag that was just matched, so end is never nil and the function deletes that final start tag instead of signalling an error.
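
One way to avoid that flaw is to look at the return value of the search rather than at the match data alone. The following is only a minimal sketch of the idea; demo-end-of-section is a made-up name for the example, not something from Mike's code.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-end-of-section (end-re)
  "Return the position after the next match of END-RE, or nil if none.
The search starts at point in the current buffer."
  ;; With a non-nil third argument `re-search-forward' returns nil on
  ;; failure instead of signalling an error, so `and' only consults the
  ;; match data after a successful search.
  (and (re-search-forward end-re nil t)
       (match-end 0)))
```

With this shape, a missing end tag yields nil, and the caller can report the unmatched start tag.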

Here is my function if you want it.

(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between two regular expressions supplied as arguments.
With a numeric argument the regular expressions are included.
When called non interactively incl should be nil for non-inclusion and
non-nil for inclusion."
  (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
  (while (re-search-forward start-re nil t)
    (let ((beg (if incl (match-beginning 0) (match-end 0)))
	  (end
	   (progn
	     (if (re-search-forward end-re nil t)
		 (if incl (match-end 0) (match-beginning 0))
	       nil))))
      (if (not end)
	  (error "Unmatched \"%s\" sequence at position %d" start-re beg)
	(delete-region beg end)
	(insert repl-str)))))
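
For calling it from Lisp rather than interactively, here is a rough sketch under a few assumptions. The function is repeated in condensed form (without the interactive spec) only so that the example stands on its own, and demo-strip-test-sections is a made-up helper name. Note that the function searches forward from point, so the helper moves to the start of the buffer first.

```elisp
;; Condensed copy of the function above (interactive spec omitted),
;; repeated only so this sketch is self-contained.
(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between START-RE and END-RE with REPL-STR.
If INCL is non-nil, the text matched by the two regexps is replaced too."
  (while (re-search-forward start-re nil t)
    (let ((beg (if incl (match-beginning 0) (match-end 0)))
          (end (and (re-search-forward end-re nil t)
                    (if incl (match-end 0) (match-beginning 0)))))
      (if (not end)
          (error "Unmatched \"%s\" sequence at position %d" start-re beg)
        (delete-region beg end)
        (insert repl-str)))))

;; Hypothetical helper, not part of the original function.
(defun demo-strip-test-sections (text)
  "Return TEXT with every <!--Test--> ... <!--End of Test--> section removed."
  (with-temp-buffer
    (insert text)
    ;; The search runs forward from point, so start at the top.
    (goto-char (point-min))
    ;; The optional trailing newline keeps a blank line from being left
    ;; behind where a section used to be.
    (replace-between-regexp "<!--Test-->\n?" "<!--End of Test-->\n?" "" t)
    (buffer-string)))
```

The helper works on a temporary buffer, so it is safe to try on a copy of the text before touching the real file.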


 -----Original Message-----
From: 	Friedrich Dominicus [mailto:frido@q-software-solutions.com] 
Sent:	Wednesday, October 30, 2002 11:43 AM
To:	help-gnu-emacs@gnu.org
Subject:	Re: Newbie regexp question

Michael Slass <miknrene@drizzle.com> writes:

> 
> I think a lisp program would do better at this:
> 
> VERY LIGHTLY TESTED.  MAKE BACKUPS BEFORE EXPERIMENTING WITH THIS!
> 
> (defun paulc-purge-html-test-sections (buffer)
>   "Delete all occurrences of text between <!--Test--> and <!--End of Test-->, inclusive."
>   (interactive "bPurge html test sections in buffer: ")
>   (save-excursion
>     (save-restriction
>       (goto-char (point-min))
>       (while (re-search-forward "<!--Test-->" nil t)
>         (let ((beg (match-beginning 0))
>               (end (progn (re-search-forward "<!--End of Test-->" nil t)
>                           (match-end 0))))
>           (if end
>               (kill-region beg end)
>             (error "Unmatched \"<!--Test-->\" sequence at position %d" beg)))))))
Well, this code is better in some areas, but Mike, you missed a big
opportunity ;-) namely to let the user choose what the tags are. And as
mentioned before, regular expressions are overkill if you know your
data.

It is a really nice solution anyway, but I think there is a problem with
how the end position is handled.

The info pages say:
  Search forward from point for regular expression REGEXP.
  Set point to the end of the occurrence found, and return point.

That means the end tags are included in the region too, if I got that
right. I am not sure, though: I'm tired and had an unpleasant quarrel
with someone I really appreciate.

So good night
Friedrich

* RE: Newbie regexp question
@ 2002-10-31 18:11 Bingham, Jay
From: Bingham, Jay @ 2002-10-31 18:11 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 7012 bytes --]

Friedrich,

Thanks for answering Paul's questions about the replace-between-regexp function.
I have one comment regarding what appears to be a suggestion to change the insert logic to (when repl-string (insert repl-string)).
I do not believe it is necessary to wrap the insert in a when.  Without the when, nothing gets inserted into the buffer if repl-string is empty, so the when serves no purpose.  The nil that magically appeared is purely a byproduct of using C-j to invoke the function.
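
A small sketch of the distinction, for anyone who wants to check it. It assumes nothing beyond the built-in insert; demo-insert-result is a made-up name for the example. With an empty string the guard makes no difference; it would only matter if nil were passed as the replacement, because insert accepts strings and characters only.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-insert-result (value)
  "Insert VALUE into a scratch buffer holding \"abc\" and return the text.
Return the symbol `error' if `insert' rejects VALUE."
  (with-temp-buffer
    (insert "abc")
    (condition-case nil
        (progn (insert value) (buffer-string))
      (error 'error))))

(demo-insert-result "")   ; the buffer text is unchanged
(demo-insert-result nil)  ; `insert' signals an error for nil
```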

Regarding Paul's question about where to put the function I agree that it is best to put functions into a separate file and load or auto-load them into emacs in the .emacs file.
Here is the function in a file with instructions on how to load or auto-load it.
 <<repbetre.el>> 


 -----Original Message-----
From: 	Friedrich Dominicus [mailto:frido@q-software-solutions.com] 
Sent:	Thursday, October 31, 2002 8:41 AM
To:	help-gnu-emacs@gnu.org
Subject:	Re: Newbie regexp question

Paul Cohen <paco@enea.se> writes:

> Hi all,
> 
> Thanks to everyone who has been kind enough to take the time to answer my question! Jay's answer is definitely closest to solving my problem.
> 
> "Bingham, Jay" wrote:
> 
> > After seeing Mike and Friedrich's exchange on your question I decided to create my own solution, a general purpose function that allows the user to specify the start and end as regular expressions and supply a replacement for the text between them, and optionally by giving a numeric prefix argument to the function force it to replace the tags as well.
> 
> Neat.
> 
> > In the process I noticed that Mike's function has a logic flaw.  It will never find the case of a missing end tag and instead deletes that final start tag.
> 
> Ok.
> 
> > Here is my function if you want it.
> 
> Yes I do! :-)
> 
> >
> > (defun replace-between-regexp (start-re end-re repl-str &optional incl)
> >   "Replace the text between two regular expressions supplied as arguments.
> > With a numeric argument the regular expressions are included.
> > When called non interactively incl should be nil for non-inclusion and
> > non-nil for inclusion."
> >   (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
> >   (while (re-search-forward start-re nil t)
> >     (let ((beg (if incl (match-beginning 0) (match-end 0)))
> >           (end
> >            (progn
> >              (if (re-search-forward end-re nil t)
> >                  (if incl (match-end 0) (match-beginning 0))
> >                nil))))
> >       (if (not end)
> >           (error "Unmatched \"%s\" sequence at position %d" start-re beg)
> >         (delete-region beg end)
> >         (insert repl-str)))))
> 
> I have a few comments/questions.
> 
> I tried the above function in my *scratch* buffer by writing it and then adding the following lines (with line numbers!):
> 
> 19. (replace-between-regexp "<!--Test-->" "<!--End of Test-->" "" 1)
> 20.
> 21. Pub
> 22. <!--Test-->
> 23. Foo
> 24. <!--End of Test-->
> 25. Bar
> 
> I then evaluated the function with C-j. This resulted in:
> 
> 19. (replace-between-regexp "<!--Test-->" "<!--End of Test-->" "" 1)
> 20.
> 21.
> 22. Pub
> 23. nil
> 24.
> 25. Bar
> 
> With the cursor on line 24. My comments/questions are:
> 
> 1) I understand that the "nil" on line 23 comes from the value of
> the last item in the function list, in this case "(insert
> repl-str)". But a newline character is also inserted after
> "nil". I don't want either the "nil" or the extra newline
> character!
I'm a bit too lazy to answer everything or suggest other things, but here are my
thoughts on that.
Call it with (replace-betwe.... "<!-- End of Test" nil t)

then check before the output
(when repl-string (insert repl-string))


> 
> 2) Line 21 containing "pub" is moved forward to line 22. I guess
> this is just because I did C-j at the end of line 19 or?
Check the documentation of C-j
C-h k C-j gives it to you
> 
> 3) It would be nice if the cursor would return to its original
> position after running the command. I tried adding the
> "save-excursion" command after the "interactive" line in the function
> but it didn't work. The cursor still ended up on line 24.
You have to enclose it all in save-excursion

It then looks like this:
(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between two regular expressions supplied as arguments.
 With a numeric argument the regular expressions are included.
 When called non interactively incl should be nil for non-inclusion and
 non-nil for inclusion."
  (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
  (save-excursion
    (while (re-search-forward start-re nil t)
      (let ((beg (if incl (match-beginning 0) (match-end 0)))
            (end
             (progn
               (if (re-search-forward end-re nil t)
                   (if incl (match-end 0) (match-beginning 0))
                 nil))))
        (if (not end)
            (error "Unmatched \"%s\" sequence at position %d" start-re beg)
          (delete-region beg end)
          (when repl-str (insert repl-str)))))))
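
The effect of the enclosing save-excursion can be checked in isolation with a small sketch. demo-point-after-search is a made-up name, and the buffer contents are arbitrary; the only point is that a search moves point, and that save-excursion puts it back afterwards.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-point-after-search (wrap)
  "Search forward in a scratch buffer and return point afterwards.
If WRAP is non-nil, the search runs inside `save-excursion'."
  (with-temp-buffer
    (insert "one\ntwo\nthree\n")
    (goto-char (point-min))
    (if wrap
        ;; Point is restored once the body has finished.
        (save-excursion (re-search-forward "three"))
      ;; Without the wrapper, point stays where the search left it.
      (re-search-forward "three"))
    (point)))
```

Calling it with t should give back the starting position, and with nil a position further into the buffer.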

So do not use C-j, but C-x C-e.

This is the documentation for C-j
Documentation:
Evaluate sexp before point; print value into current buffer.
So you will get nil after this run

C-x C-e runs


Documentation:
Evaluate sexp before point; print value in minibuffer.
With argument, print output into current buffer.

So the output will be put into the minibuffer.

> 
> 4) The idea to solve my problem with a lisp function is neat and I
> guess there are many situations where one would like to add ones own
> special purpose functions to Emacs. My question is: where is the
> suitable place to put them? In my .emacs file or in a separate
> file. What are the conventions? 
.emacs is changed quite frequently. I don't think your self-written
functions will be once they are developed, therefore put them into their
own file, byte-compile that file and load it from the .emacs file.

I've put all my stuff under .xemacs here, and libraries I've installed
over time have found their way under ~/lib/elisp. It makes it quite
easy to update other systems such that XEmacs behaves like I want it
to.

It's IMHO nearly unavoidable that you customize your Emacs over time,
and you feel like a fish on land when you are used to some things and
they are not there.

Regards
Friedrich


[-- Attachment #2: repbetre.el --]
[-- Type: application/octet-stream, Size: 4417 bytes --]

;;; repbetre.el --- Replace Between Regexp

;;-----------------------------------------------------------------------------
;; Last Modified Time-stamp: <31Oct2002 11:50:12 CST by JCBingham>
;;-----------------------------------------------------------------------------

;; Copyright 2002 JCBingham
;;
;; Author: jay.bingham@hp.com
;; Version: $Id: repbetre.el,v 0.01 2002/10/31 17:32:21 JCBingham Exp $
;; Keywords: replace regexp multiple lines
;; Requirements: None
;; Status: not intended to be distributed yet

;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; if not, write to the Free Software
;; Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


;;; Commentary:

;; This package contains a function that will replace multiple lines between
;; two delimiters which can be specified as regular expressions with text
;; supplied as an argument.
;; 
;; The interactive functions in this package are:
;;	replace-between-regexp - 
;;		A function that replaces the text between two regular 
;;		expressions supplied as arguments.  With a numeric argument
;;		the regular expressions are included.

;; To load this package in all your emacs sessions put this file into your 
;; load-path and put the following line (without the comment delimiter) 
;; into your ~/.emacs:
;;   (require 'repbetre)

;; To auto-load this package in your emacs sessions (loaded only when needed)
;; put this file into your load-path and put the following lines (without the
;; comment delimiters) into your ~/.emacs:
;;  (autoload 'replace-between-regexp "repbetre"
;;   "Replace the text between two regular expressions supplied as arguments."
;;   t nil)


;;;++ Module History ++

;; 31 Oct 2002 - JCBingham - 
;;	 Initial version containing the following -
;;	interactive functions:
;;	 replace-between-regexp

;;;-- Module History end --


;;; Code:

(provide 'repbetre)

\f
;;;;##########################################################################
;;;;  Interactive Functions
;;;;##########################################################################

;;;======<Interactive Function>===============================================
;;
;; Function: 
;;   Replace the text between two regular expressions supplied as arguments.
;;   With a numeric argument the regular expressions are included.
;;
;; Pseudo code:
;;  while search for the start-re is successful
;;    if incl specified
;;      let beg be the beginning of the match location
;;    else
;;      let beg be the end of the match location
;;    endif
;;    search for the end-re
;;    if the search fails
;;      leave end unset
;;    else if incl is specified
;;      let end be the end of the match location
;;    else
;;      let end be the start of the match location
;;    endif
;;    if end is not set
;;      issue an error message
;;    else
;;      delete the range specified by beg and end
;;      insert the text from the repl-str argument
;;    endif
;;  end while
;;
(defun replace-between-regexp (start-re end-re repl-str &optional incl)
  "Replace the text between two regular expressions supplied as arguments.
 With a numeric argument the regular expressions are included.
 When called non interactively incl should be nil for non-inclusion and
 non-nil for inclusion."
  (interactive "sStart regexp: \nsEnd regexp: \nsReplace between %s and %s with: \nP")
  (save-excursion
    (while (re-search-forward start-re nil t)
      (let ((beg (if incl (match-beginning 0) (match-end 0)))
            (end
	     (and (re-search-forward end-re nil t)
		  (if incl (match-end 0) (match-beginning 0)))))
        (if (not end)
            (error "Search failed for ending regexp \"%s\" after position %d" end-re beg)
          (delete-region beg end)
          (insert repl-str))))))

;;; END OF repbetre.el

