unofficial mirror of help-gnu-emacs@gnu.org
* Read/process mbox file in Gnus
@ 2020-11-05 16:58 Skip Montanaro
  2020-11-06  1:45 ` Eric Abrahamsen
  0 siblings, 1 reply; 12+ messages in thread
From: Skip Montanaro @ 2020-11-05 16:58 UTC
  To: Help GNU Emacs

I recently switched to Manjaro from Ubuntu and decided to (finally) give
Gnus a try (VM no longer seems to be actively maintained, and I'd prefer not
to install it from source when everything else comes via elpa/melpa). Up
'til now, I've used VM to do the mail processing for my
spam work related to mail.python.org. The workflow goes something like this:

   1. Assemble an mbox file of unsure messages on mail.python.org (mpo)
   2. Download it to my laptop
   3. Process it, saving some messages as spam, others as ham, and
   discarding the rest
   4. Upload the new spam and ham mbox files to mpo
   5. Retrain the system with the new messages

I have been using VM for step three. I'd like to do that with Gnus. Looking
through the manual, skimming Google search output and skimming an apropos
list of gnus commands I didn't see an obvious way to read a Unix mbox file.
I'm sure I must be missing something basic. Pointers appreciated.

Skip



* Re: Read/process mbox file in Gnus
  2020-11-05 16:58 Read/process mbox file in Gnus Skip Montanaro
@ 2020-11-06  1:45 ` Eric Abrahamsen
  2020-11-06 10:44   ` Robert Pluim
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Abrahamsen @ 2020-11-06  1:45 UTC
  To: Skip Montanaro; +Cc: Help GNU Emacs

Skip Montanaro <skip.montanaro@gmail.com> writes:

> I recently switched to Manjaro from Ubuntu and decided to (finally) give
> Gnus a try (VM no longer seems to be actively maintained, and I'd prefer not
> to install it from source when everything else comes via elpa/melpa). Up
> 'til now, I've used VM to do the mail processing for my
> spam work related to mail.python.org. The workflow goes something like this:
>
>    1. Assemble an mbox file of unsure messages on mail.python.org (mpo)
>    2. Download it to my laptop
>    3. Process it, saving some messages as spam, others as ham, and
>    discarding the rest
>    4. Upload the new spam and ham mbox files to mpo
>    5. Retrain the system with the new messages
>
> I have been using VM for step three. I'd like to do that with Gnus. Looking
> through the manual, skimming Google search output and skimming an apropos
> list of gnus commands I didn't see an obvious way to read a Unix mbox file.
> I'm sure I must be missing something basic. Pointers appreciated.

The easiest thing to do will be to follow the workflow here:

(gnus) Incorporating Old Mail

That requires you to manually mark and respool all the messages in the
resulting group, but maybe that's actually what you want -- to
explicitly deal with each message.

I'd be curious to know if Gnus+tramp can handle a remote filename
transparently here -- you might be able to skip step 2.

The only other solution that I would consider "normal" here is to have
the message dump (whether remote or local) set up as a "mail source" in
Gnus ("Mail Sources" in the manual). Gnus then knows that it should
regularly check that source for new messages to fetch. You could either
use mail splitting to do a bit of automatic pre-processing, or just
specify that all incoming messages should go to a single group.

mbox doesn't seem to be one of the supported mail source formats,
though. maildir is, and I also wonder if the "directory" specifier might
work. Maybe you can stick the mbox file in a particular directory, then
set the directory specifier's :path keyword to that directory, and the
:suffix keyword to ".mbox".
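
A minimal, untested sketch of that idea. The directory path is
hypothetical, and whether the "directory" source reads the files it
finds as mbox is an assumption to verify:

#+begin_src emacs-lisp
  ;; Hypothetical location; adjust to wherever the mbox files are dropped.
  (setq mail-sources
        '((directory :path "/home/<USER>/spam-work/" ; directory to scan
                     :suffix ".mbox")))              ; only files ending in .mbox
#+end_src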

Someone else might have tried this before?

HTH,
Eric




* Re: Read/process mbox file in Gnus
  2020-11-06  1:45 ` Eric Abrahamsen
@ 2020-11-06 10:44   ` Robert Pluim
  2020-11-06 12:19     ` Eric S Fraga
  2020-11-06 15:34     ` Colin Baxter
  0 siblings, 2 replies; 12+ messages in thread
From: Robert Pluim @ 2020-11-06 10:44 UTC
  To: Eric Abrahamsen; +Cc: Skip Montanaro, Help GNU Emacs

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> Skip Montanaro <skip.montanaro@gmail.com> writes:

>> through the manual, skimming Google search output and skimming an apropos
>> list of gnus commands I didn't see an obvious way to read a Unix mbox file.
>> I'm sure I must be missing something basic. Pointers appreciated.
>

'G f' (gnus-group-make-doc-group) will happily create a Gnus group
from an mbox file.
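
The same command can also be called from Lisp once Gnus is running. A
sketch, assuming a hypothetical file location; the second argument names
the document type, and passing `mbox' explicitly (rather than letting
Gnus guess) is an assumption about what suits this file:

#+begin_src emacs-lisp
  ;; Evaluate after Gnus has started, e.g. with M-: in the *Group* buffer.
  ;; The path is hypothetical; the file must already exist.
  (gnus-group-make-doc-group "/home/<USER>/u.mbox" 'mbox)
#+end_src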

> The easiest thing to do will be to follow the workflow here:
>
> (gnus) Incorporating Old Mail
>
> That requires you to manually mark and respool all the messages in the
> resulting group, but maybe that's actually what you want -- to
> explicitly deal with each message.
>
> I'd be curious to know if Gnus+tramp can handle a remote filename
> transparently here -- you might be able to skip step 2.
>
> The only other solution that I would consider "normal" here is to have
> the message dump (whether remote or local) set up as a "mail source" in
> Gnus ("Mail Sources" in the manual). Gnus then knows that it should
> regularly check that source for new messages to fetch. You could either
> use mail splitting to do a bit of automatic pre-processing, or just
> specify that all incoming messages should go to a single group.
>
> mbox doesn't seem to be one of the supported mail source formats,
> though. maildir is, and I also wonder if the "directory" specifier might
> work. Maybe you can stick the mbox file in a particular directory, then
> set the directory specifier's :path keyword to that directory, and the
> :suffix keyword to ".mbox".

Iʼm told that Gnus' maildir support is not great (Iʼve never tried it
myself)

Robert




* Re: Read/process mbox file in Gnus
  2020-11-06 10:44   ` Robert Pluim
@ 2020-11-06 12:19     ` Eric S Fraga
  2020-11-06 13:09       ` Skip Montanaro
  2020-11-06 15:34     ` Colin Baxter
  1 sibling, 1 reply; 12+ messages in thread
From: Eric S Fraga @ 2020-11-06 12:19 UTC
  To: help-gnu-emacs

On Friday,  6 Nov 2020 at 11:44, Robert Pluim wrote:
> Iʼm told that Gnus' maildir support is not great (Iʼve never tried it
> myself)

In some cases, Gnus's nnmaildir backend requires O(n^3) or even O(n^4)
time to enter a maildir group, where n is the number of messages.  With
anything over a few hundred messages, it can take minutes to enter the
group.  It has something to do with sequencing the emails; IIRC it uses
time stamps to determine whether to re-sequence or not.

I really like maildir as I read my emails on several different
systems.  I use unison to keep everything in sync.  Unfortunately, this
triggers the above problem.

So I switched to nnml, because of the above, with splitting and careful
naming of groups to include system names, then using virtual groups to
put it all together.  Sounds messy but it works.

(my groups have tens of thousands of emails)

-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4 on Debian bullseye/sid





* Re: Read/process mbox file in Gnus
  2020-11-06 12:19     ` Eric S Fraga
@ 2020-11-06 13:09       ` Skip Montanaro
  2020-11-06 13:27         ` Eric S Fraga
  2020-11-06 18:40         ` Skip Montanaro
  0 siblings, 2 replies; 12+ messages in thread
From: Skip Montanaro @ 2020-11-06 13:09 UTC
  To: Help GNU Emacs

Thanks for the input folks. Seems like I should bite the bullet and
reinstall VM.

Skip



* Re: Read/process mbox file in Gnus
  2020-11-06 13:09       ` Skip Montanaro
@ 2020-11-06 13:27         ` Eric S Fraga
  2020-11-06 16:27           ` Eric Abrahamsen
  2020-11-06 18:40         ` Skip Montanaro
  1 sibling, 1 reply; 12+ messages in thread
From: Eric S Fraga @ 2020-11-06 13:27 UTC
  To: help-gnu-emacs

On Friday,  6 Nov 2020 at 07:09, Skip Montanaro wrote:
> Thanks for the input folks. Seems like I should bite the bullet and
> reinstall VM.

Gnus does understand mbox format files.  I have the following snippet in
my gnus configuration:

#+begin_src emacs-lisp
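  ;; Use both the system spool and a personal mbox file as mail sources;
  ;; Gnus reads the messages from these files when it fetches new mail.
  (setq
   mail-sources '((file :path "/var/mail/ucecesf")
                  (file :path "/home/ucecesf/mbox")))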
#+end_src 

and the emails are picked up from there perfectly fine.  Whether this is
sufficient for your actual use case is another story.

-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4 on Debian bullseye/sid





* Re: Read/process mbox file in Gnus
  2020-11-06 10:44   ` Robert Pluim
  2020-11-06 12:19     ` Eric S Fraga
@ 2020-11-06 15:34     ` Colin Baxter
  2020-11-06 16:22       ` Robert Pluim
  1 sibling, 1 reply; 12+ messages in thread
From: Colin Baxter @ 2020-11-06 15:34 UTC
  To: help-gnu-emacs

Dear Robert,

Sorry for butting in.
>>>>> Robert Pluim <rpluim@gmail.com> writes:

    > 'G f' (gnus-group-make-doc-group) will happily create a Gnus group
    > from an mbox file

This works well, but what I've never been able to do is to edit
(i.e. shorten) the long generated name of the group. I have tried 'G p'
and 'G E' but to no avail. I would like to get 'nndoc+RMAIL:RMAIL' from
'nndoc+/home/<USER>/path/to/rmail:RMAIL'. How can this be done?

Best wishes,

Colin Baxter.





* Re: Read/process mbox file in Gnus
  2020-11-06 15:34     ` Colin Baxter
@ 2020-11-06 16:22       ` Robert Pluim
  2020-11-06 18:03         ` Colin Baxter
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Pluim @ 2020-11-06 16:22 UTC
  To: Colin Baxter; +Cc: help-gnu-emacs

Colin Baxter <m43cap@yandex.com> writes:

> Dear Robert,
>
> Sorry for butting in.
>>>>>> Robert Pluim <rpluim@gmail.com> writes:
>
>     > 'G f' (gnus-group-make-doc-group) will happily create a Gnus group
>     > from an mbox file
>
> This works well, but what I've never been able to do is to edit
> (i.e. shorten) the long generated name of the group. I have tried Gp and
> GE but to no avail. I would like to get 'nndoc+RMAIL:RMAIL' from
> 'nndoc+/home/<USER>/path/to/rmail:RMAIL' How can this be done?

'G e' (gnus-group-edit-group-method) and edit the string that appears
just after 'nndoc'

Robert




* Re: Read/process mbox file in Gnus
  2020-11-06 13:27         ` Eric S Fraga
@ 2020-11-06 16:27           ` Eric Abrahamsen
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Abrahamsen @ 2020-11-06 16:27 UTC
  To: help-gnu-emacs

Eric S Fraga <e.fraga@ucl.ac.uk> writes:

> On Friday,  6 Nov 2020 at 07:09, Skip Montanaro wrote:
>> Thanks for the input folks. Seems like I should bite the bullet and
>> reinstall VM.
>
> Gnus does understand mbox format files.  I have the following snippet in
> my gnus configuration:
>
> #+begin_src emacs-lisp
>   (setq
>    mail-sources '((file :path "/var/mail/ucecesf")
>                   (file :path "/home/ucecesf/mbox")))
> #+end_src 
>
> and the emails are picked up from there perfectly fine.  Whether this is
> sufficient for your actual use case is another story.

Oh, well that answers my earlier question -- both the nndoc route and
the mail-sources route would serve Skip's purposes, since both can read
mbox. It's just a question of whether the process is very occasional and
very manual (use nndoc) or regular and automatable (use mail-sources).

maildir wouldn't need to come into it.




* Re: Read/process mbox file in Gnus
  2020-11-06 16:22       ` Robert Pluim
@ 2020-11-06 18:03         ` Colin Baxter
  0 siblings, 0 replies; 12+ messages in thread
From: Colin Baxter @ 2020-11-06 18:03 UTC
  To: help-gnu-emacs

>>>>> Robert Pluim <rpluim@gmail.com> writes:

    > Colin Baxter <m43cap@yandex.com> writes:
    >> Dear Robert,
    >> 
    >> Sorry for butting in.
    >>>>>>> Robert Pluim <rpluim@gmail.com> writes:
    >> 
    >> > 'G f' (gnus-group-make-doc-group) will happily create a Gnus
    >> > group from an mbox file
    >> 
    >> This works well, but what I've never been able to do is to edit
    >> (i.e. shorten) the long generated name of the group. I have tried
    >> Gp and GE but to no avail. I would like to get
    >> 'nndoc+RMAIL:RMAIL' from 'nndoc+/home/<USER>/path/to/rmail:RMAIL'
    >> How can this be done?

    > 'G e' (gnus-group-edit-group-method) and edit the string that
    > appears just after 'nndoc'

    > Robert

Great. Thank you very much for this.




* Re: Read/process mbox file in Gnus
  2020-11-06 13:09       ` Skip Montanaro
  2020-11-06 13:27         ` Eric S Fraga
@ 2020-11-06 18:40         ` Skip Montanaro
  2020-11-09 13:40           ` Eric S Fraga
  1 sibling, 1 reply; 12+ messages in thread
From: Skip Montanaro @ 2020-11-06 18:40 UTC
  To: Help GNU Emacs

I was out riding around (new bike day <https://flic.kr/p/2k3WWdr> 😀) and
it occurred to me that the way I am using mbox files is just as a
transmission tool. All mail readers save some detail about the current
state of affairs (read, deleted, message order, etc). I think most probably
keep this information out-of-band somewhere, in something akin to a
database. VM keeps that information in the mbox file in the form of a bunch
of X-VM-whatever headers. This works perfectly for how I consume these
messages. My workflow on mail.python.org generates a file named u.mbox. I
download it and load it into VM, saving messages of interest to either
s.mbox or h.mbox. Those two files are uploaded back to mail.python.org,
then all three local files [ush].mbox are deleted. The next time I process
new unsure messages, I start from scratch. Deleting an mbox file
effectively zaps all metadata for that file's messages and I start anew
next time.

I'm going to guess Gnus saves metadata in ~/.newsrc.eld. It's not clear how
well it would take to me pulling the rug out from under it by replacing the
previous u.mbox file with a completely new one without also somehow
performing careful surgery on its metadata (assuming I also use it as my
normal mail reader).

Skip



* Re: Read/process mbox file in Gnus
  2020-11-06 18:40         ` Skip Montanaro
@ 2020-11-09 13:40           ` Eric S Fraga
  0 siblings, 0 replies; 12+ messages in thread
From: Eric S Fraga @ 2020-11-09 13:40 UTC
  To: help-gnu-emacs

On Friday,  6 Nov 2020 at 12:40, Skip Montanaro wrote:
> I was out riding around (new bike day <https://flic.kr/p/2k3WWdr> 😀) and

Nice! :-)

> it occurred to me that the way I am using mbox files is just as a
> transmission tool. [...] I'm going to guess Gnus saves metadata in
> ~/.newsrc.eld. It's not clear how well it would take to me pulling the
> rug out from under it by replacing the previous u.mbox file with a
> completely new one without also somehow performing careful surgery on
> its metadata (assuming I also use it as my normal mail reader).

I would not recommend playing around with .newsrc.eld directly.

However, I'm not sure you need to.  If you use the u.mbox file as a
"source", i.e. it's emptied by Gnus when read, say into an nnml group,
there won't be any inconsistency.  Likewise, your destination mbox files
could simply be files to which you save articles (e.g. using
gnus-summary-save-article, bound to o in the summary view); Gnus will
not try to keep track of these files either.  Whether they are empty or
not when you ask Gnus to save an article to such a file will not matter,
as it appends to the file.
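
A sketch of such a setup, assuming a hypothetical path for u.mbox and a
hypothetical group name "unsure"; treat it as a starting point rather
than a tested configuration:

#+begin_src emacs-lisp
  ;; An nnml backend to hold the fetched messages.
  (setq gnus-secondary-select-methods '((nnml "")))
  ;; Hypothetical path; fetching moves the messages out of this file.
  (setq mail-sources '((file :path "/home/<USER>/u.mbox")))
  ;; Catch-all rule: every incoming message goes to one group.
  (setq nnmail-split-methods '(("unsure" "")))
#+end_src

If Gnus is also the normal mail reader, the catch-all split rule would
need to be more selective so that ordinary mail does not land in the
same group.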

Note, you'll want to customize gnus-default-article-saver to use your
desired format (gnus-summary-save-in-mail likely).
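
For example, a one-line sketch that could go in the Gnus init file
(which file that is depends on your setup):

#+begin_src emacs-lisp
  ;; Save articles by appending them to a Unix mail (mbox) file.
  (setq gnus-default-article-saver 'gnus-summary-save-in-mail)
#+end_src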

-- 
Eric S Fraga via Emacs 28.0.50 & org 9.4 on Debian bullseye/sid



