* cleaning up a big regexp
@ 2014-09-15 11:15 Tory S. Anderson
2014-09-15 11:25 ` Eric Abrahamsen
2014-09-15 14:35 ` Thorsten Jolitz
0 siblings, 2 replies; 11+ messages in thread
From: Tory S. Anderson @ 2014-09-15 11:15 UTC (permalink / raw)
To: emacs list
Using gnus I have a growing regexp that represents the criteria for bulk email and splits accordingly:
(setq my-gnus-bulk-regexp "\\(^From:.*\\(@maillist.codeproject.com\\|@papajohns-specials.com\\|@qomail.quikorder.com\\|@linkedin.com\\|@facebookmail.com\\|@plus.google.com\\|@twitter.com\\|@youtube.com\\|@linguistlist.org\\|sportsauthority.com\\)\\)\\|\\(^To:.*torysanderson@gmail.com\\)")
;; TODO make bulk-list
(setq nnmail-split-methods
'(("mail.bulk" my-gnus-bulk-regexp)
...
Is there a way to clean this up to make it both more readable and more easily editable? It seems like keeping some kind of list would be the way to do it, instead of an ever-lengthening string.
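One detail worth checking in the snippet above: inside a quoted list the symbol my-gnus-bulk-regexp is not evaluated, so the split rule would hold the symbol rather than the regexp string. A minimal sketch of splicing the value in with a backquote, assuming Gnus expects a string in that position; the short regexp and the catch-all "mail.misc" rule are placeholders, not part of the original configuration:

```emacs-lisp
;; Placeholder regexp standing in for the long one above.
(setq my-gnus-bulk-regexp "^From:.*@linkedin\\.com")

;; Backquote plus comma inserts the *value* of the variable;
;; a plain quote would leave the bare symbol in the list.
(setq nnmail-split-methods
      `(("mail.bulk" ,my-gnus-bulk-regexp)
        ("mail.misc" "")))   ; hypothetical catch-all rule

(stringp (cadr (car nnmail-split-methods)))   ; non-nil: a string, not a symbol
```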
* Re: cleaning up a big regexp
From: Eric Abrahamsen @ 2014-09-15 11:25 UTC (permalink / raw)
To: help-gnu-emacs
torys.anderson@gmail.com (Tory S. Anderson) writes:
> Using gnus I have a growing regexp that represents the criteria for bulk email and splits accordingly:
>
> (setq my-gnus-bulk-regexp "\\(^From:.*\\(@maillist.codeproject.com\\|@papajohns-specials.com\\|@qomail.quikorder.com\\|@linkedin.com\\|@facebookmail.com\\|@plus.google.com\\|@twitter.com\\|@youtube.com\\|@linguistlist.org\\|sportsauthority.com\\)\\)\\|\\(^To:.*torysanderson@gmail.com\\)")
>
> ;; TODO make bulk-list
> (setq nnmail-split-methods
> '(("mail.bulk" my-gnus-bulk-regexp)
> ...
>
>
> Is there a way to clean this up to make it both more readable and more easily editable? It seems like keeping some kind of list would be the way to do it, instead of an ever-lengthening string.
Check out the regexp-opt function, which takes a list of literal
strings and gloms them into one big regexp. At least it looks nicer in
your .gnus.el file.
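As a rough illustration of that suggestion, the sketch below feeds a short list of domains (a subset of the addresses in the question) to regexp-opt and tests the result with string-match-p. The exact regexp string that regexp-opt returns can differ between Emacs versions, so only the matching behaviour is checked; my-bulk-domain-regexp is a made-up name:

```emacs-lisp
;; regexp-opt takes a list of literal strings and returns one regexp
;; that matches any of them.
(setq my-bulk-domain-regexp          ; hypothetical variable name
      (regexp-opt '("@linkedin.com"
                    "@twitter.com"
                    "@youtube.com")))

(string-match-p my-bulk-domain-regexp "From: news@twitter.com")   ; non-nil
(string-match-p my-bulk-domain-regexp "From: friend@example.org") ; nil
```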
* Re: cleaning up a big regexp
From: Thorsten Jolitz @ 2014-09-15 14:35 UTC (permalink / raw)
To: help-gnu-emacs
torys.anderson@gmail.com (Tory S. Anderson) writes:
> Using gnus I have a growing regexp that represents the criteria for bulk email and splits accordingly:
>
> (setq my-gnus-bulk-regexp )
>
> ;; TODO make bulk-list
> (setq nnmail-split-methods
> '(("mail.bulk" my-gnus-bulk-regexp)
> ...
>
>
> Is there a way to clean this up to make it both more readable and more
> easily editable? It seems like keeping some kind of list would be the
> way to do it, instead of an ever-lengthening string.
I have a little library called drx.el on GitHub
(https://github.com/tj64/drx), which isn't as extreme as rx.el in
avoiding regexp strings, but is pretty flexible when it comes to avoiding
all the plumbing/glue code between the actual regexp strings.
Furthermore, it abstracts from three common regexp elements: "^", "$" and
"\\*", so it lets you e.g. write a regexp for Org-mode:
#+BEGIN_SRC emacs-lisp :results raw
(require 'drx)
(format "%S"
(drx "foo" t '(t nil (2)) t '(nil t (2))
"bar" "loo"))
#+END_SRC
#+results:
"^\\(\\*\\(\\*\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\}$"
and reuse it for css mode with different outline stars:
#+BEGIN_SRC emacs-lisp :results raw
(require 'drx)
(let ((drx-BOL "^/\\* ")
(drx-EOL " \\*/$")
(drx-STAR "@"))
(format "%S"
(drx "foo" t '(t nil (2)) t '(nil t (2))
"bar" "loo")))
#+END_SRC
#+results:
"^/\\* \\(@\\(@\\)\\{2\\}\\)foo\\(bar\\)\\(loo\\)\\{2\\} \\*/$"
Here is something similar to your example regexp:
#+BEGIN_SRC emacs-lisp :results raw
(require 'drx)
(format "%S"
(drx
(drx "From:.*" t nil nil 'group
(drx "@maillist.codeproject.com" nil nil nil 'alt
"@papajohns-specials.com"
"@qomail.quikorder.com"
"@linkedin.com"
"@facebookmail.com"
"@plus.google.com"
"@twitter.com"
"@youtube.com"
"@linguistlist.org"
"sportsauthority.com"))
nil nil nil 'alt
(drx "To:.*torysanderson@gmail.com" nil nil nil t)))
#+END_SRC
#+results:
"\\(^\\(From:.*\\(@maillist.codeproject.com\\|@papajohns-specials.com\\|@qomail.quikorder.com\\|@linkedin.com\\|@facebookmail.com\\|@plus.google.com\\|@twitter.com\\|@youtube.com\\|@linguistlist.org\\|sportsauthority.com\\)\\)\\|\\(To:.*torysanderson@gmail.com\\)\\)"
--
cheers,
Thorsten
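For comparison, the built-in rx.el mentioned above expresses a similar regexp as an s-expression. A minimal sketch with a shortened domain list; my-rx-bulk-regexp is a made-up name and the sample headers are invented:

```emacs-lisp
(require 'rx)

(setq my-rx-bulk-regexp              ; hypothetical variable name
      (rx (or (seq line-start "From:" (* nonl)
                   (or "@linkedin.com" "@twitter.com"))
              (seq line-start "To:" (* nonl) "torysanderson@gmail.com"))))

(string-match-p my-rx-bulk-regexp "From: news@twitter.com")       ; => 0
(string-match-p my-rx-bulk-regexp "Subject: From: @twitter.com")  ; => nil
```

Strings inside an rx form are matched literally, so the dots in the domain names need no manual escaping.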
* Re: cleaning up a big regexp
From: Joost Kremers @ 2014-09-15 11:18 UTC (permalink / raw)
To: help-gnu-emacs
Tory S. Anderson wrote:
> Using gnus I have a growing regexp that represents the criteria for bulk email and splits accordingly:
[...]
> Is there a way to clean this up to make it both more readable and more
> easily editable? It seems like keeping some kind of list would be the
> way to do it, instead of an ever-lengthening string.
There's the function `regexp-opt', which takes a list of strings and
returns a regular expression that will match any of those strings.
Perhaps you can use that?
--
Joost Kremers joostkremers@fastmail.fm
Selbst in die Unterwelt dringt durch Spalten Licht
EN:SiS(9)
* Re: cleaning up a big regexp
From: Tory S. Anderson @ 2014-09-15 11:26 UTC (permalink / raw)
To: help-gnu-emacs
That command would shrink the regexp and make it more efficient, but my goal here is to be able to more easily read and append to it, rather than optimize it. I would like to be able to have a list, maybe something like:
("^From:.*" ((".*@bulk1.com")
(".*@bulk2.com")
(".*@bulk3.com")))
I'm new enough to [e]lisp that I'm not sure what list->string concatenation functions would do the trick here.
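A minimal sketch of the list-based layout asked for here, using the built-in mapconcat and regexp-quote. The names my-bulk-domains and my-bulk-from-regexp are made up for illustration, and the domains are placeholders:

```emacs-lisp
;; One entry per line, so adding a sender is a one-line change.
(defvar my-bulk-domains
  '("@bulk1.com"
    "@bulk2.com"
    "@bulk3.com")
  "Literal address fragments that mark a message as bulk mail.")

(defun my-bulk-from-regexp (domains)
  "Return a regexp matching a From: header that contains any of DOMAINS."
  (concat "^From:.*\\("
          ;; regexp-quote escapes the dots; "\\|" joins the alternatives.
          (mapconcat #'regexp-quote domains "\\|")
          "\\)"))

(my-bulk-from-regexp my-bulk-domains)
;; => "^From:.*\\(@bulk1\\.com\\|@bulk2\\.com\\|@bulk3\\.com\\)"
```

Appending a sender is then a matter of adding one string to my-bulk-domains and calling the function again.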
Joost Kremers <joost.m.kremers@gmail.com> writes:
> Tory S. Anderson wrote:
>> Using gnus I have a growing regexp that represents the criteria for bulk email and splits accordingly:
>
> [...]
>
>> Is there a way to clean this up to make it both more readable and more
>> easily editable? It seems like keeping some kind of list would be the
>> way to do it, instead of an ever-lengthening string.
>
> There's the function `regexp-opt', which takes a list of strings and
> returns a regular expression that will match any of those strings.
> Perhaps you can use that?
* Re: cleaning up a big regexp
From: Michael Albinus @ 2014-09-15 11:41 UTC (permalink / raw)
To: help-gnu-emacs
Joost Kremers <joost.m.kremers@gmail.com> writes:
> There's the function `regexp-opt', which takes a list of strings and
> returns a regular expression that will match any of those strings.
> Perhaps you can use that?
regexp-opt treats its arguments as literal strings, so it cannot handle meta characters like "*". The OP showed such a regexp.
Maybe something like this works (untested):
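(setq my-gnus-bulk-regexp
      (concat
       ;; "^" sits inside each group so that both alternatives are anchored.
       "\\(^From:.*@"
       ;; regexp-opt quotes the literal domain names and groups them.
       ;; Note: every domain, including sportsauthority.com, now needs
       ;; an "@" directly in front of it.
       (regexp-opt
        '("maillist.codeproject.com"
          "papajohns-specials.com"
          "qomail.quikorder.com"
          "linkedin.com"
          "facebookmail.com"
          "plus.google.com"
          "twitter.com"
          "youtube.com"
          "linguistlist.org"
          "sportsauthority.com")
        'par)
       "\\)\\|\\(^To:.*torysanderson@gmail.com\\)"))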
Best regards, Michael.
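A regexp assembled this way can be checked against sample header lines before it goes into the split rules. The sketch below builds a shortened version (two domains only, with both alternatives anchored at the start of a line); my-test-bulk-regexp is a made-up name and the sample headers are invented:

```emacs-lisp
(setq my-test-bulk-regexp            ; hypothetical name, shortened list
      (concat
       "\\(^From:.*@"
       (regexp-opt '("linkedin.com" "twitter.com") t)
       "\\)\\|\\(^To:.*torysanderson@gmail.com\\)"))

(string-match-p my-test-bulk-regexp "From: updates@linkedin.com")  ; => 0
(string-match-p my-test-bulk-regexp "To: torysanderson@gmail.com") ; => 0
(string-match-p my-test-bulk-regexp "From: alice@example.org")     ; => nil
```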
* Re: cleaning up a big regexp
From: Tory S. Anderson @ 2014-09-15 12:06 UTC (permalink / raw)
To: Michael Albinus; +Cc: help-gnu-emacs
Ok. Applying the advice I've received so far, I have the following (which doesn't quite evaluate). Clearly my syntax is wrong.
(setq my-gnus-bulk-from-address-list '("@maillist.codeproject.com"
                                       "@papajohns-specials.com"
                                       "@qomail.quikorder.com"
                                       "@linkedin.com"
                                       "@facebookmail.com"
                                       "@plus.google.com"
                                       "@twitter.com"
                                       "@youtube.com"
                                       "@linguistlist.org"
                                       "sportsauthority.com")) ;; list of bulkmail addresses
(setq my-gnus-bulk-from-address-regexp (mapconcat `regexp-quote my-gnus-bulk-from-address-list "\\|")) ;; make OR
(setq my-gnus-bulk-from-regexp
(regexp-opt (mapconcat `(concat "^From:.*") my-gnus-bulk-from-address-regexp "\\|"))) ;; apply "From.*" to the start of each address
* Re: cleaning up a big regexp
From: Michael Albinus @ 2014-09-15 12:18 UTC (permalink / raw)
To: Tory S. Anderson; +Cc: help-gnu-emacs
torys.anderson@gmail.com (Tory S. Anderson) writes:
> (setq my-gnus-bulk-from-address-regexp (mapconcat `regexp-quote my-gnus-bulk-from-address-list "\\|")) ;; make OR
Use a single apostrophe "'" here.
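For a bare symbol the backquote and the plain quote happen to produce the same value, so this particular line still evaluates; the plain quote (or #' for function names) is simply the usual idiom. A small check, with a two-element placeholder list:

```emacs-lisp
;; With no comma inside, a backquoted form behaves like a quoted one.
(eq `regexp-quote 'regexp-quote)   ; => t

;; Idiomatic spelling: #' marks the symbol as a function name.
(mapconcat #'regexp-quote '("@a.com" "@b.org") "\\|")
;; => "@a\\.com\\|@b\\.org"
```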
> (setq my-gnus-bulk-from-regexp
> (regexp-opt (mapconcat `(concat "^From:.*") my-gnus-bulk-from-address-regexp "\\|"))) ;; apply "From.*" to the start of each address
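That's wrong. You `concat' just one element, which is not needed;
`mapconcat' needs a function as its first argument, and `regexp-opt'
needs a list of strings rather than a single string. Besides,
`regexp-opt' cannot be applied over meta characters, like "*". Even
if you quote them, it doesn't work.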
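The point about meta characters can be seen directly: regexp-opt quotes its input, so a ".*" inside an argument matches only those literal characters. A small sketch with invented strings; my-literal-regexp is a made-up name:

```emacs-lisp
;; The ".*" here is treated as literal characters, not as "anything".
(setq my-literal-regexp (regexp-opt '("From:.*@bulk1.com")))  ; hypothetical name

(string-match-p my-literal-regexp "From:.*@bulk1.com")      ; => 0
(string-match-p my-literal-regexp "From: news@bulk1.com")   ; => nil
```

This is why the wildcard part has to be concatenated outside the regexp-opt call.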
Best regards, Michael.
* Re: cleaning up a big regexp
From: Stefan Monnier @ 2014-09-15 12:36 UTC (permalink / raw)
To: help-gnu-emacs
> (mapconcat `regexp-quote my-gnus-bulk-from-address-list "\\|"))
A.k.a (regexp-opt my-gnus-bulk-from-address-list)
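Both spellings accept the same literal strings, as a quick comparison suggests. regexp-opt may return a differently shaped (usually more compact) regexp, so the sketch compares matching behaviour rather than the strings themselves; the address list is a shortened placeholder:

```emacs-lisp
(let* ((addresses '("@linkedin.com" "@twitter.com"))   ; placeholder list
       (by-hand (mapconcat #'regexp-quote addresses "\\|"))
       (by-opt  (regexp-opt addresses)))
  ;; Collect (BY-HAND-RESULT BY-OPT-RESULT) for each sample string.
  (mapcar (lambda (s)
            (list (string-match-p by-hand s)
                  (string-match-p by-opt s)))
          '("bob@twitter.com" "bob@example.org")))
;; => ((3 3) (nil nil))
```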
-- Stefan
* Re: cleaning up a big regexp
From: Michael Albinus @ 2014-09-15 14:11 UTC (permalink / raw)
To: Stefan Monnier; +Cc: help-gnu-emacs
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> (mapconcat `regexp-quote my-gnus-bulk-from-address-list "\\|"))
>
> A.k.a (regexp-opt my-gnus-bulk-from-address-list)
Not when the strings contain meta characters like "*". I keep repeating
this because I fell into this trap very recently. See commit
trunk r117880, where I refuse to admit that I have been an idiot.
> -- Stefan
Best regards, Michael.