* regexp newline issue
@ 2005-11-28  1:03 Christopher C. Stacy
  2005-11-28  4:44 ` John Paul Wallington
  2005-11-29 18:23 ` Kevin Rodgers
  0 siblings, 2 replies; 5+ messages in thread
From: Christopher C. Stacy @ 2005-11-28  1:03 UTC


I want to search for a string like "foo bar",
but there might be whitespace (even newlines)
between any of the characters.

Given the string "foo bar", I tried constructing a regexp by

(replace-regexp-in-string "\\(.\\)"
                            "[\040\011\012\015]*\\1" 
                            "foo bar")

but that doesn't quite do it.

By the way, I originally tried whitespace syntax "\\s-" 
rather than character alternatives, but that seemed to
work even less well.  Also, I am using an older emacs
that does not seem to have the named character classes.

In particular, the above [tab,LF,CR] hack finds #1 and #3,
but not #2, in the following examples:

----------        

1. foo bar

2. foo
bar

-or-

3. fo 
o ba
r
----------

Whenever regexps involve newlines, I get confused.
(OK, I admit I just get confused about almost all regexps!)

TIA for your help!


* Re: regexp newline issue
From: John Paul Wallington @ 2005-11-28  4:44 UTC


cstacy@news.dtpq.com (Christopher C. Stacy) writes:

> I want to search for a string like "foo bar",
> but there might be whitespace (even newlines)
> between any of the characters.
>
> Given the string "foo bar", I tried constructing a regexp by
>
> (replace-regexp-in-string "\\(.\\)"
>                             "[\040\011\012\015]*\\1" 
>                             "foo bar")
> but that doesn't quite do it.

I think you wanna zap all the whitespace chars from your "foo bar"
string first (eg: so that it's "foobar") before doing the
`replace-regexp-in-string' hack on it.
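
A minimal sketch of that suggestion, assuming the whitespace set is
space, tab, LF and CR; `my-strip-then-build' is a hypothetical name,
not something from the original post:

```elisp
;; Hypothetical helper: strip whitespace from STRING first, then apply
;; the same replace-regexp-in-string trick to what is left.
(defun my-strip-then-build (string)
  "Return a regexp matching STRING with optional whitespace before each char."
  (replace-regexp-in-string
   "\\(.\\)" "[ \t\n\r]*\\1"
   ;; The inner call turns "foo bar" into "foobar".
   (replace-regexp-in-string "[ \t\n\r]+" "" string)))

(string-match (my-strip-then-build "foo bar") "foo\nbar")   ; => 0
```

Note that a regexp built this way also matches "foobar" with no
whitespace at all, because every whitespace run is optional.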


* Re: regexp newline issue
From: John Paul Wallington @ 2005-11-28  5:22 UTC


I wrote:

>> I want to search for a string like "foo bar",
>> but there might be whitespace (even newlines)
>> between any of the characters.
>>
>> Given the string "foo bar", I tried constructing a regexp by
>>
>> (replace-regexp-in-string "\\(.\\)"
>>                             "[\040\011\012\015]*\\1" 
>>                             "foo bar")
>> but that doesn't quite do it.
>
> I think you wanna zap all the whitespace chars from your "foo bar"
> string first (eg: so that it's "foobar") before doing the
> `replace-regexp-in-string' hack on it.

Hm.  If you do want to match on the whitespace within the string
then maybe something like this would do:

(defun cons-up-regexp (string)
  (mapconcat (lambda (char)
               (if (memq char '(32 9 10 13))
                   "[\040\011\012\015]"
                 (format "[\040\011\012\015]*%c" char)))
             string nil))
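
As a quick check, here is the same definition run against the three
samples from the original post. The test strings are my transcription
of those samples, so treat them as an assumption:

```elisp
;; Definition repeated from the message above so the snippet stands alone.
(defun cons-up-regexp (string)
  (mapconcat (lambda (char)
               (if (memq char '(32 9 10 13))
                   ;; A whitespace char in STRING requires one whitespace char.
                   "[\040\011\012\015]"
                 ;; Any other char may be preceded by optional whitespace.
                 (format "[\040\011\012\015]*%c" char)))
             string nil))

(string-match (cons-up-regexp "foo bar") "foo bar")        ; => 0
(string-match (cons-up-regexp "foo bar") "foo\nbar")       ; => 0
(string-match (cons-up-regexp "foo bar") "fo \no ba\nr")   ; => 0
(string-match (cons-up-regexp "foo bar") "foobar")         ; => nil
```

Unlike stripping the whitespace outright, this version still demands
at least one whitespace character where the search string has one.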


* Re: regexp newline issue
From: Kevin Rodgers @ 2005-11-29 18:23 UTC


Christopher C. Stacy wrote:
 > I want to search for a string like "foo bar",
 > but there might be whitespace (even newlines)
 > between any of the characters.
 >
 > Given the string "foo bar", I tried constructing a regexp by
 >
 > (replace-regexp-in-string "\\(.\\)"
 >                             "[\040\011\012\015]*\\1"
 >                             "foo bar")
 >
 > but that doesn't quite do it.
 >
 > By the way, I originally tried whitespace syntax "\\s-"
 > rather than character alternatives, but that seemed to
 > work even less well.  Also, I am using an older emacs
 > that does not seem to have the named character classes.

The whitespace syntax class depends on the current syntax table, which
is specific to the buffer's major mode (and can even be overridden by
text properties in the buffer).
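
A small sketch of that dependency, assuming a scratch syntax table
rather than any particular major mode; `my-s-dash-matches-newline-p'
is a hypothetical helper name:

```elisp
;; Hypothetical helper: search "foo\nbar" for "foo\\s-bar" under a
;; syntax table in which newline has the given syntax descriptor.
(defun my-s-dash-matches-newline-p (newline-syntax)
  "Return non-nil if the whitespace syntax class matches a newline.
NEWLINE-SYNTAX is the syntax descriptor to give newline, e.g. \" \" or \">\"."
  (with-temp-buffer
    (insert "foo\nbar")
    (goto-char (point-min))
    ;; make-syntax-table returns a table inheriting from the standard one.
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?\n newline-syntax table)
      (with-syntax-table table
        (re-search-forward "foo\\s-bar" nil t)))))

(my-s-dash-matches-newline-p " ")   ; newline as whitespace: match found
(my-s-dash-matches-newline-p ">")   ; newline as comment-end: => nil
```

Lisp modes, for example, give newline comment-end syntax, which is
one way "\\s-" can fail to match across a line break.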

BTW, I think this is more readable than the octal notation: "[ \t\n\r]"
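
The two spellings read to the same string, which is easy to confirm:

```elisp
;; \040 is space, \011 is tab, \012 is LF, \015 is CR.
(string= "[\040\011\012\015]" "[ \t\n\r]")   ; => t
```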

 > In particular, the above [tab,LF,CR] hack finds #1 and #3,
 > but not #2, in the following examples:
 >
 > ----------
 >
 > 1. foo bar
 >
 > 2. foo
 > bar
 >
 > -or-
 >
 > 3. fo
 > o ba
 > r
 > ----------

That's because the regexp you generate requires each of the characters
in the string to be present, including the space between the second "o"
and "b".  As John Paul Wallington showed, you should ignore the
whitespace characters in the string when generating the regexp.
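
To make that concrete, here is a sketch that wraps the original
expression in a function (`my-naive-ws-regexp' is a hypothetical
name) and runs it on the three samples:

```elisp
;; Hypothetical wrapper around the expression from the original post.
(defun my-naive-ws-regexp (string)
  "Prefix every character of STRING, spaces included, with optional whitespace."
  (replace-regexp-in-string "\\(.\\)" "[ \t\n\r]*\\1" string))

;; The space in "foo bar" becomes "[ \t\n\r]* ", which still needs a
;; literal space somewhere between the second "o" and the "b".
(string-match (my-naive-ws-regexp "foo bar") "foo bar")        ; => 0
(string-match (my-naive-ws-regexp "foo bar") "foo\nbar")       ; => nil
(string-match (my-naive-ws-regexp "foo bar") "fo \no ba\nr")   ; => 0
```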

-- 
Kevin Rodgers


* Re: regexp newline issue
From: Stefan Monnier @ 2005-12-01  4:38 UTC


>   (mapconcat (lambda (char)
>                (if (memq char '(32 9 10 13))
>                    "[\040\011\012\015]"
>                  (format "[\040\011\012\015]*%c" char)))
>              string nil))

Other than to force the reader to try and remember those ASCII codes, is
there any reason not to use:

  (mapconcat (lambda (char)
               (if (memq char '(?\t ?\n ?\r ?\ ))
                   "[\t\n\r ]"
                 (format "[\t\n\r ]*%c" char)))
             string nil))
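
One possible refinement, not raised in the thread: if the search
string can contain regexp-special characters such as "." or "*",
each one probably wants quoting. A sketch under that assumption
(`my-ws-tolerant-regexp' is a hypothetical name, and `?\s' needs
Emacs 22 or later):

```elisp
;; Hypothetical variant of the function above that passes each
;; non-whitespace character through regexp-quote.
(defun my-ws-tolerant-regexp (string)
  "Like the mapconcat version above, but quote regexp-special characters."
  (mapconcat (lambda (char)
               (if (memq char '(?\t ?\n ?\r ?\s))
                   "[\t\n\r ]"
                 ;; (string char) builds a one-character string to quote.
                 (concat "[\t\n\r ]*" (regexp-quote (string char)))))
             string ""))

(string-match (my-ws-tolerant-regexp "a.b") "a .\nb")   ; => 0
(string-match (my-ws-tolerant-regexp "a.b") "axb")      ; => nil
```

Without the quoting, the "." in "a.b" would match any character, so
"axb" would be found as well.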


-- Stefan

