* regexp newline issue
@ 2005-11-28 1:03 Christopher C. Stacy
2005-11-28 4:44 ` John Paul Wallington
2005-11-29 18:23 ` Kevin Rodgers
0 siblings, 2 replies; 5+ messages in thread
From: Christopher C. Stacy @ 2005-11-28 1:03 UTC (permalink / raw)
I want to search for a string like "foo bar",
but there might be whitespace (even newlines)
between any of the characters.
Given the string "foo bar", I tried constructing a regexp by
  (replace-regexp-in-string "\\(.\\)"
                            "[\040\011\012\015]*\\1"
                            "foo bar")
but that doesn't quite do it.
By the way, I originally tried whitespace syntax "\\s-"
rather than character alternatives, but that seemed to
work even less well. Also, I am using an older emacs
that does not seem to have the named character classes.
In particular, the above [tab,LF,CR] hack finds #1 and #3,
but not #2, in the following examples:
----------
1. foo bar
2. foo
bar
-or-
3. fo
o ba
r
----------
Whenever regexps involve newlines, I get confused.
(OK, I admit I just get confused about almost all regexps!)
TIA for your help!
* Re: regexp newline issue
From: John Paul Wallington @ 2005-11-28 4:44 UTC (permalink / raw)
cstacy@news.dtpq.com (Christopher C. Stacy) writes:
> I want to search for a string like "foo bar",
> but there might be whitespace (even newlines)
> between any of the characters.
>
> Given the string "foo bar", I tried constructing a regexp by
>
> (replace-regexp-in-string "\\(.\\)"
> "[\040\011\012\015]*\\1"
> "foo bar")
> but that doesn't quite do it.
I think you wanna zap all the whitespace chars from your "foo bar"
string first (eg: so that it's "foobar") before doing the
`replace-regexp-in-string' hack on it.
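
A minimal sketch of that suggestion, in case it helps to see it spelled out. The name `my-loose-regexp` is made up for illustration, and the sketch assumes the search string contains no regexp special characters (nothing is quoted here). Note that with the whitespace zapped, the result also matches "foobar" with no gap at all.

```elisp
(defun my-loose-regexp (string)
  "Return a regexp matching STRING with whitespace allowed before each char.
Whitespace already in STRING is dropped before the regexp is built."
  (let* ((ws "[ \t\n\r]")
         ;; Step 1: zap the whitespace, so "foo bar" becomes "foobar".
         (bare (replace-regexp-in-string (concat ws "+") "" string)))
    ;; Step 2: the original trick, prefixing every character with WS*.
    ;; FIXEDCASE is t so the replacement is inserted without case changes.
    (replace-regexp-in-string "\\(.\\)" (concat ws "*\\1") bare t)))

;; Try it on the three examples from the first message.
(mapcar (lambda (text) (string-match (my-loose-regexp "foo bar") text))
        '("foo bar" "foo\nbar" "fo\no ba\nr"))
;; => (0 0 0)
```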
* Re: regexp newline issue
From: John Paul Wallington @ 2005-11-28 5:22 UTC (permalink / raw)
I wrote:
>> I want to search for a string like "foo bar",
>> but there might be whitespace (even newlines)
>> between any of the characters.
>>
>> Given the string "foo bar", I tried constructing a regexp by
>>
>> (replace-regexp-in-string "\\(.\\)"
>> "[\040\011\012\015]*\\1"
>> "foo bar")
>> but that doesn't quite do it.
>
> I think you wanna zap all the whitespace chars from your "foo bar"
> string first (eg: so that it's "foobar") before doing the
> `replace-regexp-in-string' hack on it.
Hm. If you do want to match on the whitespace within the string
then maybe something like this would do:
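(defun cons-up-regexp (string)
  "Return a regexp matching STRING with optional whitespace between characters."
  (mapconcat (lambda (char)
               (if (memq char '(32 9 10 13)) ; space, tab, LF, CR
                   ;; Whitespace in STRING must match at least one whitespace char.
                   "[\040\011\012\015]"
                 ;; Any other char may be preceded by optional whitespace.
                 (concat "[\040\011\012\015]*"
                         (regexp-quote (char-to-string char)))))
             string ""))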
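
A quick way to check this against the three examples is sketched below. `cons-up-regexp` is restated so the snippet can be evaluated on its own, with `regexp-quote` added as a precaution (an addition for this sketch); `my-loose-match-p` is a made-up helper name. Because a space in the search string turns into a mandatory whitespace character, "foobar" with no gap is rejected, unlike the zap-the-whitespace approach.

```elisp
(defun cons-up-regexp (string)
  "Return a regexp matching STRING with optional whitespace between characters."
  (mapconcat (lambda (char)
               (if (memq char '(32 9 10 13)) ; space, tab, LF, CR
                   "[ \t\n\r]"               ; at least one whitespace char
                 (concat "[ \t\n\r]*" (regexp-quote (char-to-string char)))))
             string ""))

(defun my-loose-match-p (string text)
  "Return t if the regexp built from STRING matches somewhere in TEXT."
  (and (string-match (cons-up-regexp string) text) t))

;; The three examples from the first message, plus "foobar" with no gap.
(mapcar (lambda (text) (my-loose-match-p "foo bar" text))
        '("foo bar" "foo\nbar" "fo\no ba\nr" "foobar"))
;; => (t t t nil)
```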
* Re: regexp newline issue
From: Kevin Rodgers @ 2005-11-29 18:23 UTC (permalink / raw)
Christopher C. Stacy wrote:
> I want to search for a string like "foo bar",
> but there might be whitespace (even newlines)
> between any of the characters.
>
> Given the string "foo bar", I tried constructing a regexp by
>
> (replace-regexp-in-string "\\(.\\)"
> "[\040\011\012\015]*\\1"
> "foo bar")
>
> but that doesn't quite do it.
>
> By the way, I originally tried whitespace syntax "\\s-"
> rather than character alternatives, but that seemed to
> work even less well. Also, I am using an older emacs
> that does not seem to have the named character classes.
The whitespace syntax class depends on the current syntax table, which
is specific to the buffer's major mode (and can even be overridden by
text properties in the buffer).
BTW, I think this is more readable than the octal notation: "[ \t\n\r]"
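
To see the syntax-table dependence in isolation, here is a small sketch. The helper `my-newline-match-p` and the two tables are made up for illustration: one table classifies newline as whitespace, the other as a comment ender (which is how Lisp modes typically treat it). It assumes an Emacs that provides `with-syntax-table`.

```elisp
(defun my-newline-match-p (regexp table)
  "Return t if REGEXP matches in the text \"foo\\nbar\" under syntax TABLE."
  (with-temp-buffer
    (insert "foo\nbar")
    (goto-char (point-min))
    (with-syntax-table table
      (and (re-search-forward regexp nil t) t))))

(let ((ws-table (make-syntax-table))
      (lisp-like-table (make-syntax-table)))
  (modify-syntax-entry ?\n " " ws-table)        ; newline is whitespace
  (modify-syntax-entry ?\n ">" lisp-like-table) ; newline ends a comment
  (list (my-newline-match-p "foo\\s-bar" ws-table)
        (my-newline-match-p "foo\\s-bar" lisp-like-table)
        (my-newline-match-p "foo[ \t\n\r]bar" lisp-like-table)))
;; => (t nil t)
```

The explicit character alternative gives the same answer whatever table is active, which is why it is the safer choice here.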
> In particular, the above [tab,LF,CR] hack finds #1 and #3,
> but not #2, in the following examples:
>
> ----------
>
> 1. foo bar
>
> 2. foo
> bar
>
> -or-
>
> 3. fo
> o ba
> r
> ----------
That's because the regexp you generate requires each of the characters
in the string to be present, including the space between the second "o"
and "b". As John Paul Wallington showed, you should ignore the
whitespace characters in the string when generating the regexp.
--
Kevin Rodgers
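
The failure on #2 can be reproduced with a short sketch. `my-original-regexp` is a made-up name that builds the regexp the way the first message does, using the more readable "[ \t\n\r]" spelling of the same character set.

```elisp
(defun my-original-regexp (string)
  "Build the regexp as in the first message, keeping STRING's own spaces."
  (replace-regexp-in-string "\\(.\\)" "[ \t\n\r]*\\1" string))

;; The space in "foo bar" survives as a literal space in the regexp,
;; so a text with only a newline between "foo" and "bar" cannot match.
(mapcar (lambda (text) (string-match (my-original-regexp "foo bar") text))
        '("foo bar" "foo\nbar" "fo\no ba\nr"))
;; => (0 nil 0)
```

Example #3 still matches because it happens to contain a literal space between "o" and "ba".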