all messages for Emacs-related lists mirrored at yhetil.org
* regexp with match over multiple lines
@ 2011-05-05  9:05 AngusC
  2011-05-05 10:04 ` Peter Dyballa
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: AngusC @ 2011-05-05  9:05 UTC (permalink / raw
  To: Help-gnu-emacs


I want to remove all instances of <![CDATA[ ... ]]> data in a file.  My
regexp works if the start and end tags are on the same line, but not if the
end tag is on a different line.  Is it possible to apply a regexp across
multiple lines?

My regexp is <\!\[CDATA\[.*\]\]>, and it works if everything is on one line.

What can I do?  Is this where Lisp is required?

Angus
-- 
View this message in context: http://old.nabble.com/regexp-with-match-over-multiple-lines-tp31548643p31548643.html
Sent from the Emacs - Help mailing list archive at Nabble.com.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: regexp with match over multiple lines
  2011-05-05  9:05 regexp with match over multiple lines AngusC
@ 2011-05-05 10:04 ` Peter Dyballa
  2011-05-05 16:58   ` AngusC
  2011-05-05 13:08 ` ken
  2011-05-05 17:28 ` Andreas Röhler
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Dyballa @ 2011-05-05 10:04 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs


On 05.05.2011 at 11:05, AngusC wrote:

> What can I do?


Check the archive! This question has been answered a few times. You
can insert a newline character with C-q C-j. In a Lisp string,
"\\(.\\|\n\\)" matches any character, including a newline. With
character classes this should work as well: "[[:print:][:space:]]".

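Applied to the original question, that idea might look like the minimal sketch below. It assumes the goal is to delete every CDATA section in the whole buffer; the function name `my-delete-cdata-regexp` is hypothetical. The non-greedy `*?` makes each match end at the first "]]>".

```elisp
(defun my-delete-cdata-regexp ()
  "Delete every <![CDATA[ ... ]]> section in the current buffer.
Sections may span several lines."
  (interactive "*")
  (save-excursion
    (goto-char (point-min))
    ;; \(?:.\|\n\) is a shy group matching any character, newline included;
    ;; *? repeats it as few times as possible, so each match ends at the
    ;; first "]]>" instead of the last one in the buffer.
    (while (re-search-forward "<!\\[CDATA\\[\\(?:.\\|\n\\)*?]]>" nil t)
      (replace-match "" t t))))
```

Run it with M-x my-delete-cdata-regexp. On very large sections this kind of backtracking regexp may hit Emacs's regexp stack limit, so treat it as a starting point rather than a robust solution.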
--
Greetings

   Pete

When people run around and around in circles we say they are crazy.  
When planets do it we say they are orbiting.





* Re: regexp with match over multiple lines
  2011-05-05  9:05 regexp with match over multiple lines AngusC
  2011-05-05 10:04 ` Peter Dyballa
@ 2011-05-05 13:08 ` ken
  2011-05-05 17:28 ` Andreas Röhler
  2 siblings, 0 replies; 15+ messages in thread
From: ken @ 2011-05-05 13:08 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs

On 05/05/2011 05:05 AM AngusC wrote:
> I want to remove all instances of <![CDATA[ ... ]]> data in a file.  My
> regexp works if the start and end tags are on the same line, but not if the
> end tag is on a different line.  Is it possible to apply a regexp across
> multiple lines?
> 
> My regexp is <\!\[CDATA\[.*\]\]>, and it works if everything is on one line.
> 
> What can I do?  Is this where Lisp is required?
> 
> Angus

"<\!\[CDATA\[[^>]*?>" should work (a bracket expression such as [^>] also
matches newlines).

-- 
"Truth is the most valuable thing we have, so I try to conserve it."
	--Mark Twain




* Re: regexp with match over multiple lines
  2011-05-05 10:04 ` Peter Dyballa
@ 2011-05-05 16:58   ` AngusC
  2011-05-05 17:02     ` Deniz Dogan
  0 siblings, 1 reply; 15+ messages in thread
From: AngusC @ 2011-05-05 16:58 UTC (permalink / raw
  To: Help-gnu-emacs



Peter Dyballa wrote:
> 
> 
> On 05.05.2011 at 11:05, AngusC wrote:
> 
>> What can I do?
> 
> 
> Check the archive! This question has been answered a few times. You  
> can insert a newline character with C-q C-j. "\\(.\\|\n\\)" matches  
> everything. With character classes this should work as well:  
> "[[:print:][:space:]]".
> 
> 

I can insert C-q C-j and that works, but only for the number of newlines I
enter.  If this number varies, I would have to rerun the search with 2
newlines, 3 newlines, etc.  How can I specify that the C-q C-j can be
repeated any number of times?







* Re: regexp with match over multiple lines
  2011-05-05 16:58   ` AngusC
@ 2011-05-05 17:02     ` Deniz Dogan
  2011-05-05 18:15       ` AngusC
  0 siblings, 1 reply; 15+ messages in thread
From: Deniz Dogan @ 2011-05-05 17:02 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs

2011/5/5 AngusC <anguscomber@gmail.com>:
> I can insert C-q C-j and that works, but only for the number of newlines I
> enter.  If this number varies, I would have to rerun the search with 2
> newlines, 3 newlines, etc.  How can I specify that the C-q C-j can be
> repeated any number of times?
>

C-q C-j * (will look like ^J* at least in my Emacs)




* Re: regexp with match over multiple lines
  2011-05-05  9:05 regexp with match over multiple lines AngusC
  2011-05-05 10:04 ` Peter Dyballa
  2011-05-05 13:08 ` ken
@ 2011-05-05 17:28 ` Andreas Röhler
  2011-05-05 18:17   ` AngusC
  2 siblings, 1 reply; 15+ messages in thread
From: Andreas Röhler @ 2011-05-05 17:28 UTC (permalink / raw
  To: help-gnu-emacs

On 05.05.2011 11:05, AngusC wrote:
>
> I want to remove all instances of <![CDATA[ ... ]]> data in a file.  My
> regexp works if the start and end tags are on the same line, but not if the
> end tag is on a different line.  Is it possible to apply a regexp across
> multiple lines?
>
> My regexp is <\!\[CDATA\[.*\]\]>, and it works if everything is on one line.
>
> What can I do?  Is this where Lisp is required?
>
> Angus

Hi,

when dealing with expressions characterized by a start- and end
string, quite often a little function is convenient:

Below is a simplified example:

(setq startstring "abc")
(setq endstring "def")

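(defun my-start-end-delete ()
   "Delete each span from `startstring' through `endstring' after point."
   (interactive "*")
   ;; Search case-sensitively, so that "def" does not also match "DEF".
   (let ((case-fold-search nil)
         beg)
     (while (search-forward startstring nil 'move 1)
       (setq beg (match-beginning 0))
       (when (search-forward endstring nil 'move 1)
         (delete-region beg (match-end 0))))))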

abcABCDEFdefAAAAAAAAAA -> AAAAAAAAAA
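The same start/end approach can be pointed at the CDATA delimiters directly. This is a sketch that assumes the whole buffer should be processed; the function name `my-delete-cdata-literal` is hypothetical. Because `search-forward` looks for a literal string, the text between the two delimiters may contain newlines without any special regexp syntax, and binding `case-fold-search` to nil keeps the delimiters exact.

```elisp
(defun my-delete-cdata-literal ()
  "Delete every <![CDATA[ ... ]]> section in the current buffer."
  (interactive "*")
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search nil)      ; match the delimiters exactly
          beg)
      ;; Literal searches need no regexp syntax to cross line breaks.
      (while (search-forward "<![CDATA[" nil t)
        (setq beg (match-beginning 0))
        (when (search-forward "]]>" nil t)
          (delete-region beg (point)))))))
```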








* Re: regexp with match over multiple lines
  2011-05-05 17:02     ` Deniz Dogan
@ 2011-05-05 18:15       ` AngusC
  2011-05-05 20:05         ` PJ Weisberg
  2011-05-05 22:10         ` Peter Dyballa
  0 siblings, 2 replies; 15+ messages in thread
From: AngusC @ 2011-05-05 18:15 UTC (permalink / raw
  To: Help-gnu-emacs



Deniz Dogan-3 wrote:
> 
> 2011/5/5 AngusC <anguscomber@gmail.com>:
>> I can insert C-q C-j and that works, but only for the number of newlines I
>> enter.  If this number varies, I would have to rerun the search with 2
>> newlines, 3 newlines, etc.  How can I specify that the C-q C-j can be
>> repeated any number of times?
>>
> 
> C-q C-j * (will look like ^J* at least in my Emacs)
> 
> 
> 

I have this:

    <description>
      <![CDATA[^M
     some text^M
     and some more^M
       ]]>^M
    </description>

and I am using:
 <\1\[CDATA.*^J*>

But it doesn't make any replacement.  I used C-q C-j, i.e. I typed those
keys when entering the regexp.







* Re: regexp with match over multiple lines
  2011-05-05 17:28 ` Andreas Röhler
@ 2011-05-05 18:17   ` AngusC
  2011-05-06  8:27     ` Andreas Röhler
  0 siblings, 1 reply; 15+ messages in thread
From: AngusC @ 2011-05-05 18:17 UTC (permalink / raw
  To: Help-gnu-emacs



Andreas Röhler wrote:
>
> Hi,
> 
> when dealing with expressions characterized by a start- and end
> string, quite often a little function is convenient:
> 
> Below is a simplified example:
> 
> (setq startstring "abc")
> (setq endstring "def")
> 
> (defun my-start-end-delete ()
>    " "
>    (interactive "*")
>    (let (beg)
>      (while (search-forward startstring nil (quote move) 1)
>        (setq beg (match-beginning 0))
>        (when (search-forward endstring nil (quote move) 1)
>          (delete-region beg (match-end 0))))))
> 
> abcABCDEFdefAAAAAAAAAA -> AAAAAAAAAA
> 
> 

I am thinking I probably need to learn lisp to have real power.  More regex
would probably help.






* Re: regexp with match over multiple lines
  2011-05-05 18:15       ` AngusC
@ 2011-05-05 20:05         ` PJ Weisberg
  2011-05-05 22:10         ` Peter Dyballa
  1 sibling, 0 replies; 15+ messages in thread
From: PJ Weisberg @ 2011-05-05 20:05 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs@gnu.org

On Thursday, May 5, 2011, AngusC <anguscomber@gmail.com> wrote:
> and I am using:
>  <\1\[CDATA.*^J*>
>
> But it doesn't make any replacement.  I used C-q C-j, i.e. I typed those
> keys when entering the regexp.

That matches any number of characters that aren't newlines, followed
by any number of newlines, before the '>'.  The way to match any
character including newlines was in the first reply you got:
"\\(.\\|\n\\)"

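To see that difference concretely, here is a sketch using `string-match-p` on a hypothetical sample shaped like the one quoted above (the variable `my-cdata-sample` is made up for illustration, CRs are omitted, and "!" stands in for the \1 typo). The first regexp can cross a line boundary only at one point, right before the ">"; the second lets any character, newline included, repeat.

```elisp
(defvar my-cdata-sample
  "<description>\n  <![CDATA[\n  some text\n  ]]>\n</description>"
  "Hypothetical multi-line sample for comparing regexps.")

;; .*\n*> : non-newline characters, then newlines, then ">".  After the
;; newline comes "  some text", not ">", so there is no match.
(string-match-p "<!\\[CDATA.*\n*>" my-cdata-sample)
;; => nil

;; \(?:.\|\n\)*?]]> : any characters, newlines included, up to the first "]]>".
(string-match-p "<!\\[CDATA\\[\\(?:.\\|\n\\)*?]]>" my-cdata-sample)
;; => 16, the position of "<![CDATA["
```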
On Thursday, May 5, 2011, AngusC <anguscomber@gmail.com> wrote:
>
>
> Andreas Röhler wrote:
>> (setq startstring "abc")
>> (setq endstring "def")
>>
>> (defun my-start-end-delete ()
>>    " "
>>    (interactive "*")
>>    (let (beg)
>>      (while (search-forward startstring nil (quote move) 1)
>>        (setq beg (match-beginning 0))
>>        (when (search-forward endstring nil (quote move) 1)
>>          (delete-region beg (match-end 0))))))
>>
>> abcABCDEFdefAAAAAAAAAA -> AAAAAAAAAA
>>
>>
>
> I am thinking I probably need to learn lisp to have real power.  More regex
> would probably help.

Learning lisp isn't a bad thing, but the above code is really an
example of why you should learn regular expressions.  There's no
reason the above should have taken more than one line.
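A sketch of that point, assuming the start and end strings are literal and matching should be case-sensitive (the helper name `my-delete-between` is hypothetical). One regexp built with `regexp-quote` does the work of the loop; `case-fold-search` is bound to nil because with its default value of t, "def" would also match the "DEF" inside "ABCDEF".

```elisp
(defun my-delete-between (start end string)
  "Return STRING with each span from START through the nearest END removed.
START and END are literal strings; a span may cross newlines."
  (let ((case-fold-search nil))         ; so that "def" does not match "DEF"
    (replace-regexp-in-string
     (concat (regexp-quote start) "\\(?:.\\|\n\\)*?" (regexp-quote end))
     "" string t t)))

(my-delete-between "abc" "def" "abcABCDEFdefAAAAAAAAAA")
;; => "AAAAAAAAAA"
```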


-- 

-PJ




* Re: regexp with match over multiple lines
  2011-05-05 18:15       ` AngusC
  2011-05-05 20:05         ` PJ Weisberg
@ 2011-05-05 22:10         ` Peter Dyballa
  2011-05-06 11:10           ` AngusC
  2011-05-06 11:20           ` AngusC
  1 sibling, 2 replies; 15+ messages in thread
From: Peter Dyballa @ 2011-05-05 22:10 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs


On 05.05.2011 at 20:15, AngusC wrote:

>       ]]>^M
>    </description>
>
> and I am using:
> <\1\[CDATA.*^J*>


If you see ^M, then you should switch to a DOS or Mac encoding. But
what's puzzling me is that not all lines have ^M at the end. Does this
work: "<!\[CDATA\[[^>]+>"? I think other expressions would become too
greedy...

BTW, is this \1 what you are really using, or is it a typo for "!"?
(\1 is a back-reference, while "!" isn't special in Emacs regexps, I think.)
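An alternative to switching the encoding, assuming the carriage returns carry no meaning in the file, is simply to delete them. A minimal sketch; the function name `my-strip-carriage-returns` is hypothetical.

```elisp
(defun my-strip-carriage-returns ()
  "Delete every carriage return (shown as ^M) in the current buffer."
  (interactive "*")
  (save-excursion
    (goto-char (point-min))
    ;; "\r" is a literal CR; `search-forward' treats it as plain text.
    (while (search-forward "\r" nil t)
      (replace-match "" t t))))
```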

--
Greetings

              ~  O
   Pete       ~~_\\_/%
              ~  O  o





* Re: regexp with match over multiple lines
  2011-05-05 18:17   ` AngusC
@ 2011-05-06  8:27     ` Andreas Röhler
  0 siblings, 0 replies; 15+ messages in thread
From: Andreas Röhler @ 2011-05-06  8:27 UTC (permalink / raw
  To: help-gnu-emacs

On 05.05.2011 20:17, AngusC wrote:
> I am thinking I probably need to learn lisp to have real power.  More regex
> would probably help.

For me pleasure started when having some Emacs Lisp.

Before it was promising - afterward great :)

Cheers




* Re: regexp with match over multiple lines
  2011-05-05 22:10         ` Peter Dyballa
@ 2011-05-06 11:10           ` AngusC
  2011-05-06 12:35             ` Peter Dyballa
  2011-05-06 11:20           ` AngusC
  1 sibling, 1 reply; 15+ messages in thread
From: AngusC @ 2011-05-06 11:10 UTC (permalink / raw
  To: Help-gnu-emacs



Peter Dyballa wrote:
> 
> 
> On 05.05.2011 at 20:15, AngusC wrote:
> 
>>       ]]>^M
>>    </description>
>>
>> and I am using:
>> <\1\[CDATA.*^J*>
> 
> 
> If you see ^M then you should switch to some DOS or MAC encoding. But  
> what's puzzling me is that not all lines have ^M at the end. Does this  
> work: "<!\[CDATA\[[^>]+>"? I think other expressions would become too  
> greedy...
> 
> BTW, is this \1 what you are really using or is it a typo, actually  
> meaning "!"? (Which isn't special in Lisp, I think.)
> 

Interestingly, I made a mistake and left out the first " character, and it
worked.

This is what works:
<!\[CDATA\[[^>]+>"?

But not "<!\[CDATA\[[^>]+>"?

Anyway, I don't understand what the " bit in there is doing (nor some of the
other stuff), so I will study further.  Thanks a lot.


Yes, the \1 was a typo; it was meant to be \!

The file was created by a Perl script, so yes, ^M is not something I
usually see in other files.

Angus






* Re: regexp with match over multiple lines
  2011-05-05 22:10         ` Peter Dyballa
  2011-05-06 11:10           ` AngusC
@ 2011-05-06 11:20           ` AngusC
  2011-05-06 13:52             ` Peter Dyballa
  1 sibling, 1 reply; 15+ messages in thread
From: AngusC @ 2011-05-06 11:20 UTC (permalink / raw
  To: Help-gnu-emacs



Peter Dyballa wrote:
> 
> 
> On 05.05.2011 at 20:15, AngusC wrote:
> 
>>       ]]>^M
>>    </description>
>>
>> and I am using:
>> <\1\[CDATA.*^J*>
> 
> 
> If you see ^M then you should switch to some DOS or MAC encoding. But  
> what's puzzling me is that not all lines have ^M at the end. Does this  
> work: "<!\[CDATA\[[^>]+>"? I think other expressions would become too  
> greedy...
> 
> BTW, is this \1 what you are really using or is it a typo, actually  
> meaning "!"? (Which isn't special in Lisp, I think.)
> 

This is the key bit which works:
<!\[CDATA\[[^>]+>

I need to enhance it to end with ]]> (in case there are embedded <> angle
brackets), but I can handle that one.
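One way to make the end of the match depend on "]]>" rather than on the first ">" is the non-greedy any-character group from the first reply. A sketch comparing the two regexps on a hypothetical sample that contains a ">" inside the section (the constant name `my-cdata-with-gt` is made up):

```elisp
;; Hypothetical sample: a CDATA section with a ">" before its closing "]]>".
(defconst my-cdata-with-gt "<![CDATA[\n  a > b\n]]>"
  "Sample text for comparing two CDATA regexps.")

;; [^>]+> stops at the first ">", in the middle of the section.
(when (string-match "<!\\[CDATA\\[[^>]+>" my-cdata-with-gt)
  (match-string 0 my-cdata-with-gt))
;; => "<![CDATA[\n  a >"

;; \(?:.\|\n\)*?]]> stops at the first "]]>", skipping the embedded ">".
(when (string-match "<!\\[CDATA\\[\\(?:.\\|\n\\)*?]]>" my-cdata-with-gt)
  (match-string 0 my-cdata-with-gt))
;; => "<![CDATA[\n  a > b\n]]>"
```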






* Re: regexp with match over multiple lines
  2011-05-06 11:10           ` AngusC
@ 2011-05-06 12:35             ` Peter Dyballa
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Dyballa @ 2011-05-06 12:35 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs


On 06.05.2011 at 13:10, AngusC wrote:

> Anyway, I don't understand what the " bit in there is doing

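It was just a mark for the borders (start and end) of the regexp; the
quotes are not part of it. A trailing "? only makes a preceding "
optional, so it did no harm, but a leading " requires a literal quote
character before the "<". Without the quotes the regexp is:

	<!\[CDATA\[[^>]+>

The particle in brackets, "[^>]", means: every character except ">",
newlines included. It makes the expression less greedy, because a match
cannot run past the first ">".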
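In Lisp code the same regexp needs the quotes and doubled backslashes of a string literal, while at the M-x replace-regexp prompt it is typed exactly as shown above. A short sketch checking it against a hypothetical multi-line sample (the constant `my-multiline-cdata` is made up), which also shows that [^>] matches newlines:

```elisp
;; At the M-x replace-regexp prompt:  <!\[CDATA\[[^>]+>
;; The same regexp as a Lisp string:  "<!\\[CDATA\\[[^>]+>"
(defconst my-multiline-cdata "x <![CDATA[\nline one\nline two\n]]> y"
  "Hypothetical sample with a CDATA section spanning three lines.")

;; [^>] also matches newline, so the match runs across lines to the ">".
(when (string-match "<!\\[CDATA\\[[^>]+>" my-multiline-cdata)
  (match-string 0 my-multiline-cdata))
;; => "<![CDATA[\nline one\nline two\n]]>"
```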

--
Greetings

   Pete


"Evolution"            o           __o                     _o _
           °\___o      /0~         -\<,              ^\___ /=\\_/-%
oo~_______ /\ /\______/ \_________O/ O_______________o===>-->O--o____





* Re: regexp with match over multiple lines
  2011-05-06 11:20           ` AngusC
@ 2011-05-06 13:52             ` Peter Dyballa
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Dyballa @ 2011-05-06 13:52 UTC (permalink / raw
  To: AngusC; +Cc: Help-gnu-emacs


On 06.05.2011 at 13:20, AngusC wrote:

> I need to enhance it to end with ]]> (in case there are embedded <>
> angle brackets), but I can handle that one.

It can be easier to do 90% of the work with one not-so-complete
regexp and then edit it (a few times, perhaps) to deal with the
remainder... (OTOH, it can take days until the complete regexp is found.)

--
Greetings

   Pete

Engineer: a mechanism for converting caffeine into designs





end of thread [newest: 2011-05-06 13:52 UTC]

Thread overview: 15+ messages
2011-05-05  9:05 regexp with match over multiple lines AngusC
2011-05-05 10:04 ` Peter Dyballa
2011-05-05 16:58   ` AngusC
2011-05-05 17:02     ` Deniz Dogan
2011-05-05 18:15       ` AngusC
2011-05-05 20:05         ` PJ Weisberg
2011-05-05 22:10         ` Peter Dyballa
2011-05-06 11:10           ` AngusC
2011-05-06 12:35             ` Peter Dyballa
2011-05-06 11:20           ` AngusC
2011-05-06 13:52             ` Peter Dyballa
2011-05-05 13:08 ` ken
2011-05-05 17:28 ` Andreas Röhler
2011-05-05 18:17   ` AngusC
2011-05-06  8:27     ` Andreas Röhler
