Re: Help with regexp

unofficial mirror of help-gnu-emacs@gnu.org

* Re: Help with regexp
From: Pascal J. Bourguignon @ 2009-12-01 21:07 UTC
  To: help-gnu-emacs

Xavier Maillard <xma@gnu.org> writes:

> Hi,
>
> I am trying to find the regexp that could match on these:
>
> "=>foo"
> "=>foo:bar"
> "=>foo:bar:baz"
>
> The regexp must match on all of these strings. The string can't
> be deeper than 3 levels (like latest example) and each component
> may have spaces.
>
> How can I match this ?

"=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)"

It also accepts "=>::".  Replace the stars with + if you want at least
one character between the colons.

(mapcar (lambda (s)
         (string-match "^=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)$" s))
        '("=>foo"
          "=>foo:bar"
          "=>foo:bar:baz"
          "=>" "=>:" "=>::" 
          "=>This is : also a good : example"
          "=>And this one !@#$^^& also"
          ;; no match:
          "=>:::" "=>a:b:c:d"))
--> (0 0 0 0 0 0 0 0 nil nil)
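A minimal sketch of that non-empty variant, assuming Emacs regexp syntax, where a bare + is the one-or-more operator (the regexp \+, written "\\+" in a Lisp string, would match a literal plus sign instead). The names my-arrow-path-regexp and my-arrow-path-p are made up for illustration.

```elisp
;; Variant of the regexp above with + instead of *, so that every
;; component needs at least one non-colon character.  ^ and $ anchor
;; the match to the whole string.
(defconst my-arrow-path-regexp
  "^=>\\([^:]+\\(\\|:[^:]+\\(\\|:[^:]+\\)\\)\\)$"
  "Match \"=>a\", \"=>a:b\" or \"=>a:b:c\" with non-empty components.")

(defun my-arrow-path-p (s)
  "Return t if S matches `my-arrow-path-regexp', nil otherwise."
  (and (string-match my-arrow-path-regexp s) t))

(mapcar #'my-arrow-path-p
        '("=>foo" "=>foo:bar" "=>foo:bar:baz"
          ;; no match:
          "=>" "=>::" "=>foo:" "=>a:b:c:d"))
;; --> (t t t nil nil nil nil)
```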


-- 
__Pascal Bourguignon__


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Help with regexp
From: tomas @ 2009-12-02  5:16 UTC
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

On Tue, Dec 01, 2009 at 10:07:46PM +0100, Pascal J. Bourguignon wrote:
> Xavier Maillard <xma@gnu.org> writes:
> 
> > Hi,
> >
> > I am trying to find the regexp that could match on these:
> >
> > "=>foo"

[...]

> "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)"

Wow, Pascal. You have taught this old dog a new trick. Up to now, I
hadn't thought of putting an alternative (i.e. "|") at the start of a
parenthesized sub-expression. Thanks :-)

-- tomás

* Re: Help with regexp
From: Andreas Politz @ 2009-12-02  6:03 UTC
  To: help-gnu-emacs

tomas@tuxteam.de writes:

> On Tue, Dec 01, 2009 at 10:07:46PM +0100, Pascal J. Bourguignon wrote:
>> Xavier Maillard <xma@gnu.org> writes:
>> 
>> > Hi,
>> >
>> > I am trying to find the regexp that could match on these:
>> >
>> > "=>foo"
>
> [...]
>
>> "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)"
>
> Wow, Pascal. You have taught this old dog a new trick. Up to now, I
> hadn't thought of putting an alternative (i.e. "|") at the start of a
> parenthesized sub-expression. Thanks :-)
>
> -- tomás

Since this topic is more or less closed (modulo escaped delimiters), we
could use it to discuss why using non-trivial Emacs regexps makes one
feel as if Chomsky had just written his influential book on grammars.

Vim, the other editor, has updated and extended its regexp engine
frequently over its version history, while Emacs regexps never seemed to
be at the top of anybody's to-do list?

Things I (won't) miss most:

- extreme backslasheritis
- no short aliases for important constructs:
  digits, symbol constituents, newline, space
- no zero-width matches ; look(ahead|behind)

-ap

* Re: Help with regexp
From: tomas @ 2009-12-02  7:11 UTC
  To: help-gnu-emacs

On Wed, Dec 02, 2009 at 07:03:02AM +0100, Andreas Politz wrote:
> tomas@tuxteam.de writes:

[...]

> > Wow, Pascal. You have taught this old dog a new trick [...]

[...]

> Since this topic is more or less closed (modulo escaped delimiter), we
> could use it to discuss the question, why using non-trivial Emacs regexp
> makes one feel like Chomsky had just written his influential book on
> grammars.

;-)

> Things I (won't) miss most:
> 
> - extreme backslasheritis

Meaning: backslasheritis of the First Kind (aka |, (, ), {, } not having
special meaning) or backslasheritis of the Second Kind (aka having to
escape backslashes to get them into the string in the first place)?

Mind you, I don't like it either, but any idea I had kills some aspect
of Simplicity we all appreciate in Emacs :-(

> - no short aliases for important constructs :
>   digits,symbol-constituents,newline,space

Well, you always have those pesky [:stuff:] ones. They ain't so tidy,
but once you get used to them they are even more readable (and they
reduce the backslash density considerably).

> - no zero-width matches ; look(ahead|behind)

Hm. To be fair, there are some, among others \b, \B, \<, \> (and the
funky \=, which matches at point). You are looking for a general
zero-width match?
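A small illustration of these zero-width anchors on a made-up sample string: each one matches the empty string at a word boundary, so the same "cat" is found at different positions depending on which anchors surround it.

```elisp
;; "cat" inside "concatenate" vs. the standalone word "cat".
;; \< and \> (and \b) match the empty string at a word boundary,
;; so they constrain the match without consuming characters.
(list (string-match "cat" "concatenate the cat")          ; any occurrence
      (string-match "\\<cat\\>" "concatenate the cat")    ; whole word only
      (string-match "\\bcat\\b" "concatenate the cat"))   ; \b: either edge
;; --> (3 16 16)
```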

Regards
-- tomás

* Re: Help with regexp
From: Andreas Politz @ 2009-12-02  8:14 UTC
  To: help-gnu-emacs

tomas@tuxteam.de writes:

> On Wed, Dec 02, 2009 at 07:03:02AM +0100, Andreas Politz wrote:
>> tomas@tuxteam.de writes:
>
> [...]
>
>> > Wow, Pascal. You have taught this old dog a new trick [...]
>
> [...]
>
>> Since this topic is more or less closed (modulo escaped delimiter), we
>> could use it to discuss the question, why using non-trivial Emacs regexp
>> makes one feel like Chomsky had just written his influential book on
>> grammars.
>
> ;-)
>
>> Things I (won't) miss most:
>> 
>> - extreme backslasheritis
>
> Meaning: backslasheritis of the First Kind (aka |, (, ), {, } not having
> special meaning) or backslasheritis of the Second Kind (aka having to
> escape backslashes to get them into the string in the first place)?
>
> Mind you, I don't like it either, but any idea I had kills some aspect
> of Simplicity we all appreciate in Emacs :-(
>
I am glad there isn't a third kind.  What about the idea of getting rid
of them?  Some ideas:

- a new family of regexp functions ([+]backwards compatibility)
- a 2nd string syntax w/o escapes ([+]should not need new data-type)
- a flag in the re to signal backslasheritis frailty
  (\v in vim, [-]backwards compatibility)

What kind of simplicity are you referring to?

>> - no short aliases for important constructs :
>>   digits,symbol-constituents,newline,space
>
> Well, you always have those pesky [:stuff:] ones. They ain't so tidy,
> but once you get used to them they are even more readable (and they
> reduce the backslash density considerably).

That sounds pretty weak.  I would prefer \d over [[:digit:]] and [0-9].

>
>> - no zero-width matches ; look(ahead|behind)
>
> Hm. To be fair, there are some, among others \b, \B, \<, \> (and the
> funky \=, which matches at point). Yor are looking for a general zero
> width match?

True. No, I was mostly thinking of look-ahead/look-behind context
matching.  To be fair, I believe I already saw a patch for this on
emacs.dev.
>
> Regards
> -- tomás

-ap

* Re: Help with regexp
From: tomas @ 2009-12-05  6:46 UTC
  To: Andreas Politz; +Cc: help-gnu-emacs

On Wed, Dec 02, 2009 at 09:14:30AM +0100, Andreas Politz wrote:
> tomas@tuxteam.de writes:
> 
> > On Wed, Dec 02, 2009 at 07:03:02AM +0100, Andreas Politz wrote:

[...]

> > Mind you, I don't like it either, but any idea I had kills some aspect
> > of Simplicity we all appreciate in Emacs :-(
> >
> I am glad there isn't a third kind.  What about the idea of getting rid
> of them ?  Some ideas :
> 
> - a new family of regexp functions ([+]backwards compatibility)

I'm not sure this is needed. The functions themselves look fine to me.

> - a 2nd string syntax w/o escapes ([+]should not need new data-type)

Hm I was thinking about that too. But how would you represent e.g. a tab
in this new syntax?

> - a flag in the re to signal backslasheritis frailty
>   (\v in vim, [-]backwards compatibility)

I see (didn't know about this one).

> What kind of simplicity are you referring to ?

Well -- now, to express a regular expression we just need a normal
string. To do all this magic we'd first have to hook into the reader
before it gets hold of the string. Common Lisp has such a facility (you
can define a new reader syntax, which is very cool), but it pays some
price for this flexibility. Emacs has taken the conscious decision not
to go this route.

> >> - no short aliases for important constructs :
> >>   digits,symbol-constituents,newline,space
> >
> > Well, you always have those pesky [:stuff:] ones. They ain't so tidy,
> > but once you get used to them they are even more readable (and they
> > reduce the backslash density considerably).
> 
> That sounds pretty weak.  I would prefer \d over [[:digit:]] and [0-9] .

I miss that sometimes too :-)

> >> - no zero-width matches ; look(ahead|behind)
> >
> > Hm. To be fair, there are some, among others \b, \B, \<, \> (and the
> > funky \=, which matches at point). Yor are looking for a general zero
> > width match?
> 
> True. No, I was mostly thinking of look ahead/behind kind of context
> matching.  To be fair, I already saw a patch on emacs.dev, I believe,
> for this.  

Well, I see your points, but am torn. Simplicity, you know :-)

(Now having PEGs as an alternative to regexps would be quite exciting :)

Regards
-- tomás

* Re: Help with regexp
From: suvayu ali @ 2009-12-02  7:18 UTC
  To: Andreas Politz; +Cc: help-gnu-emacs

2009/12/1 Andreas Politz <politza@fh-trier.de>:
> Things I (won't) miss most:
>
> - extreme backslasheritis
> - no short aliases for important constructs :
>  digits,symbol-constituents,newline,space
> - no zero-width matches ; look(ahead|behind)
>

I am completely confounded by regexps, so correct me if I am wrong, but I
think some of those exist,
e.g. [[:digit:]] for [0-9] or [[:alnum:]] for [0-9a-zA-Z].
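A quick check of these classes on made-up strings. One caveat, from the Emacs Lisp manual's description of character classes: [[:digit:]] matches only 0 through 9, but [[:alnum:]] also matches non-ASCII letters, so it is broader than [0-9a-zA-Z].

```elisp
;; POSIX-style classes only work inside a bracket expression,
;; hence the doubled brackets.
(list
 ;; First run of digits in a sample string.
 (let ((s "Emacs 23.1"))
   (and (string-match "[[:digit:]]+" s) (match-string 0 s)))
 ;; [[:alnum:]] accepts a non-ASCII letter (e with acute accent) ...
 (string-match "[[:alnum:]]" "\u00e9")
 ;; ... which the explicit ASCII ranges reject.
 (string-match "[0-9a-zA-Z]" "\u00e9"))
;; --> ("23" 0 nil)
```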

-- 
Suvayu

Open source is the future. It sets us free.

* Re: Help with regexp
From: harven @ 2009-12-02  9:41 UTC
  To: help-gnu-emacs

Andreas Politz <politza@fh-trier.de> writes:

> Things I (won't) miss most:
>
> - extreme backslasheritis
> - no short aliases for important constructs :
>   digits,symbol-constituents,newline,space

??

\sw  word constituent. Same as \w.
\s_  symbol constituent.
\s-  whitespace character. Same as [[:space:]]

See the wiki for the full list
http://www.emacswiki.org/emacs-en/RegularExpression

In a string you can use \n to match a newline and \t to match a tab.
That's also why a backslash has to be doubled in a string: matching one
literal backslash takes "\\\\" (the regexp \\).
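A sketch of the syntax-class escapes and of the backslash doubling described above. Syntax classes follow the current buffer's syntax table, so this assumes emacs-lisp-mode-syntax-table (in which - is a symbol constituent) purely to get reproducible results; the sample strings are made up.

```elisp
;; Syntax classes: \sw word constituent, \s- whitespace, \s_ symbol
;; constituent.  Their meaning comes from the current syntax table,
;; so bind one explicitly (an assumption made for reproducibility).
(with-temp-buffer
  (with-syntax-table emacs-lisp-mode-syntax-table
    (list
     (string-match "\\sw+" "  foo")   ; first word-constituent run
     (string-match "\\s-" "foo bar")  ; first whitespace character
     (string-match "\\s_" "foo-bar")  ; "-" is a symbol constituent here
     ;; The Lisp string "\\\\" is the regexp \\, which matches one
     ;; literal backslash; "a\\b" is the three characters a, \, b.
     (string-match "\\\\" "a\\b"))))
;; --> (2 3 3 1)
```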

You can of course define your own classes using the category mechanism.
And there is a user-friendly syntax with the rx command.

Finally, if you miss Perl, just use it. The following command lets you
search and replace with the Perl engine.

(defun my-perl (prefix start end code)
  "Run the Perl code CODE with the region as input.
CODE is read in the minibuffer and executed as `perl -lne CODE'.
By default, the output is put in a separate buffer.
With a prefix argument, replace the region with the output."
  (interactive "P\nr\nsPerl: ")
  (shell-command-on-region start end
                           ;; Quote CODE so that it may itself contain
                           ;; single quotes and other shell metacharacters.
                           (concat "perl -lne " (shell-quote-argument code))
                           nil       ; OUTPUT-BUFFER: the default output buffer
                           prefix))  ; REPLACE: non-nil replaces the region

Examples
List lines in the region that contain the string "string"
M-x my-perl RET print if /string/ RET

Replace every e with E in the region
C-u M-x my-perl RET s/e/E/g;print RET

Count the number of lines in the region
M-x my-perl RET print $. if eof RET


* Re: Help with regexp
From: Andreas Politz @ 2009-12-02 13:31 UTC
  To: help-gnu-emacs

harven <harven@free.fr> writes:

> Andreas Politz <politza@fh-trier.de> writes:
>
>
>> Things I (won't) miss most:
>>
>> - extreme backslasheritis
>> - no short aliases for important constructs :
>>   digits,symbol-constituents,newline,space
>
> ??

I should have defined short as a synonym for a 2-character sequence.
The main idea here is conciseness.
>
> \sw  word constituent. Same as \w.
> \s_  symbol constituent.

I guess I was involved with vim for too long, where \w matches the
characters of a C identifier, my bad.

> \s-  whitespace character. Same as [[:space:]]
>
> See the wiki for the full list
> http://www.emacswiki.org/emacs-en/RegularExpression
>
> In a string you can use \n to match a newline, \t to match a tab. 
> That's the reason why you have to use \\ to match a backslash.
>
But I can't enter a constant string in the minibuffer...

> You can of course define your own classes using the category mechanism.
> And there is a user-friendly syntax with the rx command.
>
> Finally, if you miss perl, just use it. The following command
> will search, replace with the perl engine.
>
> (defun my-perl (prefix start end code)
> "ask for a perl expression in the minibuffer. Execute with the region as input. 
>  By default, the result is put in a separate buffer.
>  If an argument is given, replace the region with the output.
>  The perl command is executed with the -ln switches."
>   (interactive "P\nr\nsPerl : ")  
>   (shell-command-on-region start end
>       (concat "perl -lne '" code "'") 
>       (if prefix '(nil t))))
>
> Examples
> List lines in the region that contain the string "string"
> M-x my-perl RET print if /string/ RET
>
> Replace in the region all e by E
> C-u M-x my-perl RET s/e/E/g;print RET
>
> Count the number of lines in the region
> M-x my-perl RET print $. if eof RET

That may be a good workaround, thanks.

I guess my main complaint would be the over-expressiveness, be it in the
actual regexp, due to backslashes and most atoms being 3-5 characters
long, or in the replacement, due to missing zero-width matches.
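Lacking look-ahead, one common workaround is to match the context inside the regexp but replace only a subexpression: replace-regexp-in-string and replace-match both accept a SUBEXP argument for this. A minimal sketch with made-up strings; unlike a real look-ahead, the context is still consumed by the match, so overlapping occurrences can be missed.

```elisp
;; Emulate a look-ahead "foo followed by bar": the regexp matches
;; both, but only subexpression 1 is replaced (SUBEXP argument = 1),
;; so the "bar" context is left as it was.
(replace-regexp-in-string "\\(foo\\)bar" "FOO" "foobar foobaz" t t 1)
;; --> "FOObar foobaz"

;; Emulate a look-behind "bar preceded by foo" the same way.
(replace-regexp-in-string "foo\\(bar\\)" "BAR" "foobar xbar" t t 1)
;; --> "fooBAR xbar"
```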

-ap

* Re: Help with regexp
From: Lennart Borgman @ 2009-12-02 16:05 UTC
  To: Andreas Politz; +Cc: help-gnu-emacs

> I guess my main complain would be the over-expressiveness.  Be it in the
> actual regexp, due to backslashes and most atoms being 3-5 characters
> in length.  Or in the replacement, due to missing zero-width matches.

And I instead wonder "can't we use rx for input when doing replace too?"...

* Re: Help with regexp
From: Andreas Politz @ 2009-12-02 16:56 UTC
  To: Lennart Borgman; +Cc: help-gnu-emacs

Lennart Borgman <lennart.borgman@gmail.com> writes:

>> I guess my main complain would be the over-expressiveness.  Be it in the
>> actual regexp, due to backslashes and most atoms being 3-5 characters
>> in length.  Or in the replacement, due to missing zero-width matches.
>
> And I instead wonder "can't we use rx for input when doing replace too?"...

This reminds me of something else regexp-related, though fixable:
navigating point inside a regexp is very painful.

I guess that's enough complaints from me for this year.

-ap

* Re: Help with regexp
From: Lennart Borgman @ 2009-12-02 17:16 UTC
  To: Andreas Politz; +Cc: help-gnu-emacs

On Wed, Dec 2, 2009 at 5:56 PM, Andreas Politz <politza@fh-trier.de> wrote:
>
> This reminds me of something else regexp related,  though fixable.
> Navigating point inside a regexp is very painfully.

How does it hurt?

* Re: Help with regexp
From: harven @ 2009-12-02 20:00 UTC
  To: help-gnu-emacs

Lennart Borgman <lennart.borgman@gmail.com> writes:

> On Wed, Dec 2, 2009 at 5:56 PM, Andreas Politz <politza@fh-trier.de> wrote:
>>
>> This reminds me of something else regexp related,  though fixable.
>> Navigating point inside a regexp is very painfully.
>
> How does it hurt?

Like this (quoting the Emacs wiki page ParenthesisMatching):

"...It should be noted that the syntax-table makes all delimiters even.
That means that a beginning parenthesis ( may match a closing bracket ] 
if the delimiters are not balanced as a whole. 
Try C-M-f on the following expression:

     (  [   )  ]

Here is a short piece of Lisp code in which such a situation occurs:

  (while
     (re-search-forward "\\(\\[[0-9]\\),\\([0-9]\\]\\)" nil t)
     (replace-match (concat (match-string 1) "." (match-string 2))))

This code replaces, e.g. [4,5] by [4.5]. Now, in the regular expression
between quotes " ", the first opening bracket [ matches the
first closing parenthesis ), whereas the last opening parenthesis (
matches the last closing bracket ]..."
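A sketch of the same [4,5] to [4.5] rewrite applied to a string with replace-regexp-in-string, once with the regexp from the wiki quote and once built with rx. Writing it with rx keeps each bracket in its own one-character string, which should avoid the mismatched-delimiter effect described above; that benefit is an expectation, not something the wiki page claims. The sample string is made up.

```elisp
;; The regexp from the quote, applied to a string instead of a buffer.
(replace-regexp-in-string "\\(\\[[0-9]\\),\\([0-9]\\]\\)" "\\1.\\2"
                          "[4,5] and [1,2]")
;; --> "[4.5] and [1.2]"

;; The same pattern with rx: group 1 is "[" plus a digit, group 2 is
;; a digit plus "]", and the replacement keeps both around a dot.
(replace-regexp-in-string (rx (group "[" digit) "," (group digit "]"))
                          "\\1.\\2"
                          "[4,5] and [1,2]")
;; --> "[4.5] and [1.2]"
```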


* Re: Help with regexp
From: Andreas Politz @ 2009-12-02 17:01 UTC
  To: help-gnu-emacs

Andreas Politz <politza@fh-trier.de> writes:

> [..], due to missing zero-width matches.
>

Sorry, I kept confusing 'zero-width atoms' in general with the specific
zero-width matchers look-ahead and look-behind.

-ap

* Re: Help with regexp
From: Pascal J. Bourguignon @ 2009-12-02 14:16 UTC
  To: help-gnu-emacs

Andreas Politz <politza@fh-trier.de> writes:

> harven <harven@free.fr> writes:
>
>> Andreas Politz <politza@fh-trier.de> writes:
>>
>>
>>> Things I (won't) miss most:
>>>
>>> - extreme backslasheritis
>>> - no short aliases for important constructs :
>>>   digits,symbol-constituents,newline,space
>>
>> ??
>
> I should have defined short as a synonym for a 2-character sequence.
> The main idea here is conciseness.
>>
>> \sw  word constituent. Same as \w.
>> \s_  symbol constituent.
>
> I guess I was involved with vim for a to long time, where \w matches chars in a
> c identifier, my bad.  
>
>> \s-  whitespace character. Same as [[:space:]]
>>
>> See the wiki for the full list
>> http://www.emacswiki.org/emacs-en/RegularExpression
>>
>> In a string you can use \n to match a newline, \t to match a tab. 
>> That's the reason why you have to use \\ to match a backslash.
>>
> But I can't enter a constant string in the mini-buffer...

Of course you can:

(defun test (string)
  (interactive "sPlease enter a string: ")
  (insert string))



>> Count the number of lines in the region
>> M-x my-perl RET print $. if eof RET
>
> That maybe a good workaround, thanks.

It would be nicer to just implement the new regexp syntax in Emacs
Lisp, translating it to the old regexp syntax.


> I guess my main complain would be the over-expressiveness.  Be it in the
> actual regexp, due to backslashes and most atoms being 3-5 characters
> in length.  Or in the replacement, due to missing zero-width matches.

But in any case, it would be better to use sexps to build regexps:

(seq "=>" (rep (comp ":"))
          (alt ""
               (seq ":" (rep (comp ":"))
                    (alt ""
                         (seq ":" (rep (comp ":")))))))

--> "=>\\([^:]\\)*\\(\\|:\\([^:]\\)*\\(\\|:\\([^:]\\)*\\)\\)"


(defun seq (&rest seq) (apply (function concat) seq))
(defun rep (&rest seq) (format "\\(%s\\)*" (apply (function concat) seq)))
(defun alt (alt &rest rest) 
   (with-output-to-string 
      (princ "\\(")
      (princ alt)
      (dolist (alt rest)
         (princ "\\|")
         (princ alt))
      (princ "\\)")))
(defun comp (&rest chars)  (format "[^%s]" (apply (function concat) chars)))
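Emacs already ships a macro along these lines, rx, which harven mentioned earlier in the thread. A sketch of the same pattern with it, assuming rx's repeat, not and any forms behave as documented; bos and eos anchor the whole string, and the variable name my-arrow-rx-regexp is hypothetical.

```elisp
;; The same pattern with the built-in rx macro.  (repeat 0 2 ...)
;; allows at most two more ":"-separated components, and bos/eos
;; anchor the match to the whole string.
(defconst my-arrow-rx-regexp
  (rx bos "=>"
      (zero-or-more (not (any ":")))
      (repeat 0 2 (seq ":" (zero-or-more (not (any ":")))))
      eos))

(mapcar (lambda (s) (and (string-match my-arrow-rx-regexp s) t))
        '("=>foo" "=>foo:bar" "=>foo:bar:baz" "=>" "=>::"
          "=>:::" "=>a:b:c:d"))
;; --> (t t t t t nil nil)
```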

 
-- 
__Pascal Bourguignon__



* Re: Help with regexp
From: Xavier Maillard @ 2009-12-02  6:36 UTC
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs

Hi Pascal

   "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)"

Awesome !

Would you mind explaining it to me step by step in plain English,
please? :)

	Xavier
-- 
http://www.gnu.org
http://www.april.org
http://www.lolica.org

* Re: Help with regexp
From: Pascal J. Bourguignon @ 2009-12-02 11:16 UTC
  To: help-gnu-emacs

Xavier Maillard <xma@gnu.org> writes:

> Hi Pascal
>
>    "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)"
>
> Awesome !
>
> Would you mind explaining it to me step by step in plain english
> please ? :)


(insert   "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)") C-x C-e
gives:
=>\([^:]*\(\|:[^:]*\(\|:[^:]*\)\)\)

Now parse it:

   =>  \(  [^:]*  \( \| : [^:]* \( \| : [^:]* \) \) \)

   "=>" (an equals sign followed by a greater-than sign),
           followed by any number of non-colon characters,
                   followed by either nothing or
                        a colon followed by any number of non-colon characters,
                                  followed by either nothing or
                                      a colon followed by any number of non-colon characters.
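To see what each \( ... \) group captures, here is a small sketch with hypothetical names (my-arrow-regexp, my-arrow-groups). It adds ^ and $ anchors; the last form shows why they matter, since Emacs tries alternatives from left to right and the empty first alternative already succeeds.

```elisp
;; Pascal's regexp, anchored so that the whole string must match.
(defconst my-arrow-regexp
  "^=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)$")

(defun my-arrow-groups (s)
  "Return the strings matched by groups 1-3 for S, or nil if no match.
A group that did not take part in the match yields nil."
  (when (string-match my-arrow-regexp s)
    (list (match-string 1 s) (match-string 2 s) (match-string 3 s))))

(my-arrow-groups "=>foo:bar:baz")  ; --> ("foo:bar:baz" ":bar:baz" ":baz")
(my-arrow-groups "=>foo:bar")      ; --> ("foo:bar" ":bar" "")
(my-arrow-groups "=>foo")          ; --> ("foo" "" nil)

;; Without ^ and $, the empty first alternatives win, because the
;; matcher tries alternatives from left to right:
(let ((s "=>foo:bar:baz"))
  (string-match "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)" s)
  (match-string 1 s))              ; --> "foo"
```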

                             

-- 
__Pascal Bourguignon__

* Re: Help with regexp
From: Stefan Monnier @ 2009-12-03  3:48 UTC
  To: help-gnu-emacs

>> > Hi,
>> >
>> > I am trying to find the regexp that could match on these:
>> >
>> > "=>foo"

> [...]

>> "=>\\([^:]*\\(\\|:[^:]*\\(\\|:[^:]*\\)\\)\\)"

> Wow, Pascal. You have taught this old dog a new trick. Up to now, I
> hadn't thought of putting an alternative (i.e. "|") at the start of a
> parenthesized sub-expression. Thanks :-)

An equivalent way is to use ? (or ?? if you care about the order in
which it's matched):

  "=>\\([^:]*\\(:[^:]*\\(:[^:]*\\)?\\)?\\)"


        Stefan
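The ordering Stefan mentions shows up when the regexp is not anchored at the end. A sketch, reduced to a single optional level and using a hypothetical helper my-match-length: the greedy ? consumes ":bar", while ?? and Pascal's empty first alternative both stop right after "=>foo".

```elisp
(defun my-match-length (regexp string)
  "Return the length of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (- (match-end 0) (match-beginning 0))))

(list
 ;; Greedy ?: tries the optional ":..." part first.
 (my-match-length "=>\\([^:]*\\(:[^:]*\\)?\\)" "=>foo:bar")
 ;; Non-greedy ??: tries skipping it first.
 (my-match-length "=>\\([^:]*\\(:[^:]*\\)??\\)" "=>foo:bar")
 ;; Empty first alternative: behaves like ??.
 (my-match-length "=>\\([^:]*\\(\\|:[^:]*\\)\\)" "=>foo:bar"))
;; --> (9 5 5)
```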



* Re: Help with regexp
From: Colin S. Miller @ 2009-12-02 15:37 UTC
  To: help-gnu-emacs

Xavier Maillard wrote:
> Hi,
> 
> I am trying to find the regexp that could match on these:
> 
> "=>foo"
> "=>foo:bar"
> "=>foo:bar:baz"
> 
> The regexp must match on all of these strings. The string can't
> be deeper than 3 levels (like latest example) and each component
> may have spaces.
> 
> How can I match this ?
> 
> Regards
> 
> 	Xavier


If you don't want to pick out the components using \1 etc., then

=>\([a-z ]+:\)\{0,2\}[a-z ]+

should work
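For use from Lisp, the same regexp needs every backslash doubled. A sketch with ^ and $ added (an assumption, so that the whole string must match) and a hypothetical variable name my-colin-regexp; note that [a-z ] admits only those characters, so a sample such as "=>And this one !@#$^^& also" from earlier in the thread would be rejected.

```elisp
;; Colin's regexp as a Lisp string: every backslash is doubled, and
;; ^/$ are added so that the whole string must match.
(defconst my-colin-regexp "^=>\\([a-z ]+:\\)\\{0,2\\}[a-z ]+$")

(mapcar (lambda (s) (and (string-match my-colin-regexp s) t))
        '("=>foo" "=>foo bar:baz" "=>foo:bar:baz"
          ;; no match:
          "=>foo:" "=>a:b:c:d"))
;; --> (t t t nil nil)
```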

HTH,
Colin S. Miller


-- 
Replace the obvious in my email address with the first three letters of the hostname to reply.

* Help with regexp
From: Xavier Maillard @ 2009-12-01 20:21 UTC
  To: help-gnu-emacs

Hi,

I am trying to find the regexp that could match on these:

"=>foo"
"=>foo:bar"
"=>foo:bar:baz"

The regexp must match all of these strings. The string can't
be deeper than 3 levels (like the last example) and each component
may have spaces.

How can I match this ?

Regards

	Xavier
-- 
http://www.gnu.org
http://www.april.org
http://www.lolica.org

Thread overview: 20+ messages
     [not found] <mailman.11978.1259698881.2239.help-gnu-emacs@gnu.org>
2009-12-01 21:07 ` Help with regexp Pascal J. Bourguignon
2009-12-02  5:16   ` tomas
2009-12-02  6:03     ` Andreas Politz
2009-12-02  7:11       ` tomas
2009-12-02  8:14         ` Andreas Politz
2009-12-05  6:46           ` tomas
2009-12-02  7:18       ` suvayu ali
     [not found]     ` <mailman.12000.1259733814.2239.help-gnu-emacs@gnu.org>
2009-12-02  9:41       ` harven
2009-12-02 13:31         ` Andreas Politz
2009-12-02 16:05           ` Lennart Borgman
2009-12-02 16:56             ` Andreas Politz
2009-12-02 17:16               ` Lennart Borgman
     [not found]               ` <mailman.12039.1259774199.2239.help-gnu-emacs@gnu.org>
2009-12-02 20:00                 ` harven
2009-12-02 17:01           ` Andreas Politz
     [not found]         ` <mailman.12029.1259760724.2239.help-gnu-emacs@gnu.org>
2009-12-02 14:16           ` Pascal J. Bourguignon
2009-12-02  6:36   ` Xavier Maillard
     [not found]   ` <mailman.12002.1259735748.2239.help-gnu-emacs@gnu.org>
2009-12-02 11:16     ` Pascal J. Bourguignon
     [not found]   ` <mailman.11999.1259731365.2239.help-gnu-emacs@gnu.org>
2009-12-03  3:48     ` Stefan Monnier
2009-12-02 15:37 ` Colin S. Miller
2009-12-01 20:21 Xavier Maillard