* Regexp: match any character including newline
@ 2013-10-16 14:42 Yuri Khan
2013-10-16 15:31 ` Kai Großjohann
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Yuri Khan @ 2013-10-16 14:42 UTC (permalink / raw)
To: help-gnu-emacs@gnu.org
Hello All,
I’m doing regexp replacements on a hard-wrapped XHTML-alike. Here’s an
original fragment:
===
<tr><td><pre><code>X(n, t)
X a(n, t)</code></pre></td><td></td>
<td><requires><p><code>T</code> shall be
<concept>Copy­Insert­able</concept> into
<code>X</code>.</p></requires>
<p>post: <code>distance(begin(), end()) == n</code></p>
<p>Constructs a sequence container with <code>n</code> copies
of <code>t</code></p></td></tr>
===
Here’s what I need to turn it into:
===
<expression><pre><code>X(n, t)
X a(n, t)</code></pre></expression>
<return_type></return_type>
<assertion_note><requires><p><code>T</code> shall be
<concept>Copy­Insert­able</concept> into
<code>X</code>.</p></requires>
<p>post: <code>distance(begin(), end()) == n</code></p>
<p>Constructs a sequence container with <code>n</code> copies
of <code>t</code></p></assertion_note>
===
To this end, I want to do a regexp replace of:
===
<tr><td>\(.*?\)</td><td>\(.*?\)</td>
<td>\(.*?\)</td></tr>
===
with
===
<expression>\1</expression>
<return_type>\2</return_type>
<assertion_note>\3</assertion_note>
===
except that “.” needs to match any character including newline.
I know the obvious solution: instead of “.”, use the following monstrosity:
===
\(?:.\|
\)
===
However, I find that very cumbersome to type, especially since I have
to press C-q C-j in between.
Is there a way to make “.” match newline too, or is there an easier
way to match any character including newline? (I don’t want to limit
myself to [:ascii:] as there are also Unicode-specific dashes.)
For now, I’ve devised a workaround of using [^@] where @ is a
character that does not occur in the text. Maybe [^^] since it’s
easier to type and looks cute :)
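
For illustration only, the negated-class workaround can be sketched outside Emacs. The Python snippet below is not from the original message. Python's `re` syntax differs from Emacs regexps (groups are written without backslashes), the fragment is shortened, and the names SOURCE, DOT_ROW, ROW, TEMPLATE and convert are made up for this sketch. It assumes "@" never occurs in the text. The point it shows carries over to Emacs: a negated class such as [^@] also matches a newline, whereas a plain "." does not.

```python
import re

# Shortened fragment modeled on the one in the post (cells are hard-wrapped).
SOURCE = (
    "<tr><td><pre><code>X(n, t)\n"
    "X a(n, t)</code></pre></td><td></td>\n"
    "<td><p>Constructs a sequence container with <code>n</code> copies\n"
    "of <code>t</code></p></td></tr>"
)

# With a plain ".", the first cell cannot be matched: it spans a line break.
DOT_ROW = re.compile(r"<tr><td>(.*?)</td>")

# [^@] is a negated class, so it matches every character except "@",
# newline included. Assumption: "@" does not occur in the text.
ROW = re.compile(
    r"<tr><td>([^@]*?)</td><td>([^@]*?)</td>\n"
    r"<td>([^@]*?)</td></tr>"
)

# Replacement text: one new element per original cell.
TEMPLATE = (
    r"<expression>\1</expression>" "\n"
    r"<return_type>\2</return_type>" "\n"
    r"<assertion_note>\3</assertion_note>"
)


def convert(text):
    """Rewrite every three-cell row in text to the new element names."""
    return ROW.sub(TEMPLATE, text)


result = convert(SOURCE)
print(result)
```

The non-greedy quantifiers matter as much as the class: a greedy [^@]* would run to the last </td> in the buffer rather than the nearest one.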
* Re: Regexp: match any character including newline
From: Kai Großjohann @ 2013-10-16 15:31 UTC (permalink / raw)
To: Yuri Khan; +Cc: help-gnu-emacs@gnu.org
Yuri Khan wrote:
>
> To this end, I want to do a regexp replace of:
>
> ===
> <tr><td>\(.*?\)</td><td>\(.*?\)</td>
> <td>\(.*?\)</td></tr>
> ===
>
> with
>
> ===
> <expression>\1</expression>
> <return_type>\2</return_type>
> <assertion_note>\3</assertion_note>
> ===
You can use keyboard macros, but you will need a mode that understands
XML. Let's say you install nxml (it's part of Emacs I think). Let's
say the content is in a file foo.xml, so that nxml mode is turned on.
Consider that point is before the <tr>. Now you can use C-M-f to move
it before the <td>. Now you can use C-M-n to move it after the closing
</td>. Even if the content of <td>...</td> contains tags!
So you can record a keyboard macro that does the following steps:
- Move after the <tr>
- Insert "<expression>"
- Move to after the </td> with C-M-n
- Insert "</expression>" (using C-c /, say)
- Insert a newline
- Insert "<return_type>"
- Move to after the </td> with C-M-n
- C-c / to insert "</return_type>"
- "<assertion_note>", C-M-n, C-c /
- Use C-M-f to move past the closing </tr>
After all of this, you've got:
<tr><expression><td>foo</td></expression>
<return_type><td>bar</td></return_type>
<assertion_note><td>baz</td></assertion_note></tr>
Now you can do this: You set the mark with C-space. You move backward
over the whole thing with C-M-p. Now the whole <tr>...</tr> is marked.
Now you can use query-replace to replace <tr>, <td>, </td> and </tr>
with nothing in the highlighted region. (Need to experiment a bit
whether the region goes away after a query-replace. If it does, C-x C-x
might be your friend.)
See? No regex anywhere. Way cool! Instead, you're exploiting the
navigation that you get from Emacs modes.
Kai
* Re: Regexp: match any character including newline
From: Yuri Khan @ 2013-10-16 15:56 UTC (permalink / raw)
To: Kai Großjohann; +Cc: help-gnu-emacs@gnu.org
On Wed, Oct 16, 2013 at 10:31 PM, Kai Großjohann
<kai.grossjohann@gmx.net> wrote:
> You can use keyboard macros, but you will need a mode that understands
> XML. Let's say you install nxml (it's part of Emacs I think). Let's
> say the content is in a file foo.xml, so that nxml mode is turned on.
> Consider that point is before the <tr>. Now you can use C-M-f to move
> it before the <td>. Now you can use C-M-n to move it after the closing
> </td>. Even if the content of <td>...</td> contains tags!
Good alternate approach. If only macros were as fast and responsive as
regexp replace in my configuration…
In my case, nesting is not a concern (HTML tables almost never nest
except for layout, and even then it’s evil), so regexps are an
adequate tool.
> See? No regex anywhere. Way cool! Instead, you're exploiting the
> navigation that you get from Emacs modes.
This is way cool indeed, and I am in fact using nxml-mode and its
navigation commands.
However, this line of thought makes me wish for a match/replace
language as concise as regexps and at least as powerful as XSLT :]
* RE: Regexp: match any character including newline
From: Drew Adams @ 2013-10-16 16:53 UTC (permalink / raw)
To: Yuri Khan, help-gnu-emacs
> “.” needs to match any character including newline.
> I know the obvious solution: instead of “.”, use the following
> monstrosity:
>
> \(?:.\|
> \)
>
> However, I find that very cumbersome to type, especially since I
> have to press C-q C-j in between.
>
> Is there a way to make “.” match newline too, or is there an easier
> way to match any character including newline?
1. I and others have requested this for vanilla Emacs a few times,
as a user toggle. E.g.:
* http://lists.gnu.org/archive/html/emacs-devel/2006-03/msg00162.html
* http://lists.gnu.org/archive/html/emacs-devel/2006-03/msg00476.html
* http://lists.gnu.org/archive/html/emacs-devel/2006-11/msg01559.html
* http://lists.gnu.org/archive/html/emacs-devel/2006-12/msg00115.html
2. In Icicles at least, you can use `C-M-.' to toggle what `.'
represents in the minibuffer (i.e., for most interactive use). When
`.' matches also a newline, it appears as `.' in the minibuffer, but
the actual regexp used under the covers is "\(.\|[
]\)". (When this is the case, it is also highlighted, so you can tell.)
IOW, when newline is also being matched by `.', this propertized string
is inserted in the minibuffer when you type `.':
#("\\(.\\|[
]\\)" 0 10 (face highlight display "."))
Not the ideal solution (hence the requests cited), but handy enough.
* Re: Regexp: match any character including newline
From: Eric Abrahamsen @ 2013-10-17 2:25 UTC (permalink / raw)
To: help-gnu-emacs
Yuri Khan <yuri.v.khan@gmail.com> writes:
> Hello All,
>
> I’m doing regexp replacements on a hard-wrapped XHTML-alike. Here’s an
> original fragment:
>
> ===
> <tr><td><pre><code>X(n, t)
> X a(n, t)</code></pre></td><td></td>
> <td><requires><p><code>T</code> shall be
> <concept>Copy­Insert­able</concept> into
> <code>X</code>.</p></requires>
> <p>post: <code>distance(begin(), end()) == n</code></p>
> <p>Constructs a sequence container with <code>n</code> copies
> of <code>t</code></p></td></tr>
> ===
Another option (though I'm not claiming you'll actually want to do this)
is to use xml.el (comes with Emacs?) to parse that XML into a tree, and
then mess with the tree. Parsing the above gets me:
((tr nil (td nil (pre nil (code nil "X(n, t) X a(n, t)"))) (td nil) " "
(td nil (requires nil (p nil (code nil "T") " shall be " (concept nil
"Copy?Insert?able") " into " (code nil "X") ".")) " " (p nil "post: "
(code nil "distance(begin(), end()) == n")) " " (p nil "Constructs a
sequence container with " (code nil "n") " copies of " (code nil
"t")))))
`xml-entity-alist' would have to be tweaked.
Like I said, you probably wouldn't want this, but it's an interesting
option...
E
* Re: Regexp: match any character including newline
From: Rustom Mody @ 2013-10-16 15:58 UTC (permalink / raw)
To: help-gnu-emacs
On Wednesday, October 16, 2013 8:12:54 PM UTC+5:30, Yuri Khan wrote:
> Hello All,
>
>
> I’m doing regexp replacements on a hard-wrapped XHTML-alike. Here’s an
> original fragment:
Regexp handling of XML is commonly a source of grief.
It is usually better to use a dedicated tool like this
http://www.crummy.com/software/BeautifulSoup/
or (more xmlish than htmlish)
http://lxml.de/
These are Python solutions. I'm sure there are equivalent ones in other scripting languages of your choice.
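
As a rough sketch of the tree-based route, here is the same row conversion done with a parser instead of a regexp. To stay self-contained it uses Python's standard-library xml.etree.ElementTree rather than BeautifulSoup or lxml, which is a substitution made for this example. The names NEW_NAMES and convert_row are hypothetical, and the input is assumed to be a single well-formed <tr> element with exactly three <td> children and no undeclared entities.

```python
import xml.etree.ElementTree as ET

# Target element names, in cell order (taken from the desired output in the thread).
NEW_NAMES = ("expression", "return_type", "assertion_note")


def convert_row(row_xml):
    """Rename the three <td> cells of one <tr> and drop the <tr> wrapper."""
    row = ET.fromstring(row_xml)
    cells = row.findall("td")  # direct children only
    if row.tag != "tr" or len(cells) != len(NEW_NAMES):
        raise ValueError("expected a <tr> with exactly three <td> children")
    parts = []
    for cell, name in zip(cells, NEW_NAMES):
        cell.tag = name
        cell.tail = None  # discard the whitespace that followed the old </td>
        # short_empty_elements=False keeps <return_type></return_type>
        # instead of collapsing it to <return_type />.
        parts.append(
            ET.tostring(cell, encoding="unicode", short_empty_elements=False)
        )
    return "\n".join(parts)


row = (
    "<tr><td><pre><code>X(n, t)\n"
    "X a(n, t)</code></pre></td><td></td>\n"
    "<td><p>post: <code>n</code></p></td></tr>"
)
print(convert_row(row))
```

Because the parser tracks element boundaries itself, line breaks and nested tags inside a cell need no special handling. The cost is that the input must parse as XML in the first place.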
* Re: Regexp: match any character including newline
From: Yuri Khan @ 2013-10-16 16:16 UTC (permalink / raw)
To: Rustom Mody; +Cc: help-gnu-emacs@gnu.org
On Wed, Oct 16, 2013 at 10:58 PM, Rustom Mody <rustompmody@gmail.com> wrote:
> Regexp handling of xml is commonly a source of grief.
Oh, please don’t get me wrong. I know all about the Chomsky hierarchy,
the pumping lemmas, and Tony the Pony[1]. Regexps only cause grief
when they collide with nesting.
[1]: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
* Re: Regexp: match any character including newline
From: Rustom Mody @ 2013-10-16 16:48 UTC (permalink / raw)
To: help-gnu-emacs
On Wednesday, October 16, 2013 9:46:18 PM UTC+5:30, Yuri Khan wrote:
> On Wed, Oct 16, 2013 at 10:58 PM, Rustom Mody wrote:
>
> > Regexp handling of xml is commonly a source of grief.
>
> Oh, please don’t get me wrong. I know all about the Chomsky hierarchy,
> the pumping lemmas, and Tony the Pony[1]. Regexps only cause grief
> when they collide with nesting.
heh! Enjoy the pony-ride!