* mailman mbox migration
From: Ali Alnubani @ 2019-02-13 14:48 UTC
To: meta@public-inbox.org
Hi,
Hope this is the right place to post this.
I'm trying to migrate archives from a Mailman instance (v2.1.15). The archives are in mbox format.
How do you suggest I do that? It seems that public-inbox only supports maildir format. Do I need to convert my mbox files to maildir to be able to import them?
Or is there an easier way to achieve this (i.e. mbox support from public-inbox)?
Thanks,
Ali
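If converting to Maildir does turn out to be necessary, Python's standard-library mailbox module can copy an mbox into a Maildir. This is a minimal sketch, assuming a plain mbox file that Python's parser splits correctly on "From " lines. The function name mbox_to_maildir is hypothetical, and the sketch makes no attempt to undo ">From " quoting or any Mailman-specific mangling.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message in an mbox file into a Maildir (hypothetical helper).

    Returns the number of messages copied. The Maildir (with its
    tmp/, new/ and cur/ subdirectories) is created if it does not exist.
    """
    src = mailbox.mbox(mbox_path, create=False)  # raises if the file is missing
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for msg in src:
            # Wrapping in MaildirMessage translates mbox status flags
            # into their Maildir equivalents.
            dst.add(mailbox.MaildirMessage(msg))
            copied += 1
    finally:
        src.close()
        dst.close()
    return copied
```

Usage would be something like mbox_to_maildir("list.mbox", "list-maildir"), where both paths are placeholders.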
* Re: mailman mbox migration
From: Eric Wong @ 2019-02-13 22:31 UTC
To: Ali Alnubani; +Cc: meta
Ali Alnubani <alialnu@mellanox.com> wrote:
> Hi,
>
> Hope this is the right place to post this.
Of course :>
> I'm trying to migrate archives from a Mailman instance (v2.1.15). The archives are in mbox format.
For Mailman, Konstantin posted some scripts he used for kernel.org + Mailman
the other day:
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git
Specifically:
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/list-archive-maker.py
> How do you suggest I do that? It seems that public-inbox only supports maildir format. Do I need to convert my mbox files to maildir to be able to import them?
> Or is there an easier way to achieve this (i.e. mbox support from public-inbox)?
For regular mboxrd and mboxo files (not mangled by Mailman), you can
look into adapting scripts/import_vger_from_mbox.
Apparently, I did add support for importing mboxrd/mboxo formats
in the PublicInbox::InboxWritable::import_mbox subroutine (which
is used by import_vger_from_mbox).
mboxcl isn't supported yet (and I've seen some really scary
mboxcl with multiple Content-Length headers, and >From escaping
to boot; so I'm not sure if that's a route I want to take...)
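The practical difference between mboxrd and mboxo is how body lines beginning with "From " are quoted. In mboxrd, any line matching ^>*From gains one ">" when written and loses one when read, so the quoting is reversible. In mboxo only bare "From " lines are quoted, so an original ">From " line and a quoted "From " line become indistinguishable. A minimal Python sketch of those rules follows; the function names are illustrative only and are not public-inbox's (its own mbox handling is Perl, in PublicInbox::InboxWritable).

```python
import re

# mboxrd writing rule: quote lines starting with zero or more '>' then "From ".
_RD_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
# mboxrd reading rule: unquote lines starting with one or more '>' then "From ".
_RD_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)
# mboxo writing rule: only bare "From " lines are quoted.
_O_QUOTE = re.compile(r"^From ", re.MULTILINE)


def mboxrd_escape(body: str) -> str:
    """Add one '>' to every line matching ^>*From (mboxrd)."""
    return _RD_QUOTE.sub(r">\1", body)


def mboxrd_unescape(body: str) -> str:
    """Remove one '>' from every line matching ^>+From (mboxrd)."""
    return _RD_UNQUOTE.sub(r"\1", body)


def mboxo_escape(body: str) -> str:
    """Add '>' only to lines that start with "From " (mboxo)."""
    return _O_QUOTE.sub(">From ", body)


sample = "From here\n>From there\nFromage\n"
# mboxrd quoting round-trips losslessly.
assert mboxrd_unescape(mboxrd_escape(sample)) == sample
# mboxo quoting collides: both inputs come out as ">From x\n".
assert mboxo_escape("From x\n") == mboxo_escape(">From x\n")
```

The collision in the last line is why an mboxo reader cannot know whether to strip a ">" from a ">From " line.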
* RE: mailman mbox migration
From: Ali Alnubani @ 2019-03-21 13:23 UTC
To: Eric Wong; +Cc: meta@public-inbox.org
Hi Eric,
Thanks for the help, and I apologize for replying so late.
The script import_vger_from_mbox worked very well.
Do you think there might be an issue with a few messages being imported twice, once by import_vger_from_mbox and again by public-inbox-watch? The lists I'm migrating are very busy, and there will be a delay between importing with the script and running public-inbox-watch.
Thanks,
Ali
* Re: mailman mbox migration
From: Eric Wong @ 2019-03-21 16:05 UTC
To: Ali Alnubani; +Cc: meta
Ali Alnubani <alialnu@mellanox.com> wrote:
> Hi Eric,
>
> Thanks for the help, and I apologize for replying so late.
>
> The script import_vger_from_mbox worked very well.
Good to know :>
> Do you think there might be an issue with a few messages
> being imported twice, once by import_vger_from_mbox and
> again by public-inbox-watch? The lists I'm migrating are
> very busy, and there will be a delay between importing
> with the script and running public-inbox-watch.
Messages are deduped by Message-ID and content. However,
V2 allows different messages to share the same Message-ID
(because some non-spam-but-buggy bots/mailers do that). So if
Mailman mangles the message going into the mbox differently than
the one going into the Maildir for -watch, then you can get
duplicates.
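To illustrate why a differently-mangled copy slips through, here is a rough Python sketch of deduplicating on a (Message-ID, content hash) pair. The choice of hashed headers (From, Subject, Date), the CRLF normalization, and the names dedupe_key and Deduper are assumptions for illustration; this is not public-inbox's actual hashing code.

```python
import hashlib
from email import message_from_bytes
from typing import Set, Tuple


def dedupe_key(raw: bytes) -> Tuple[str, str]:
    """Return (Message-ID, content hash) for a raw message (hypothetical scheme).

    Hashes a few headers plus the body after normalizing CRLF to LF.
    """
    norm = raw.replace(b"\r\n", b"\n")
    msg = message_from_bytes(norm)
    mid = str(msg.get("Message-ID", "")).strip()
    h = hashlib.sha256()
    for name in ("From", "Subject", "Date"):
        value = str(msg.get(name, ""))
        h.update(f"{name}: {value}\n".encode("utf-8", "surrogateescape"))
    # The body is everything after the first blank line.
    _, _, body = norm.partition(b"\n\n")
    h.update(body)
    return mid, h.hexdigest()


class Deduper:
    """Remembers keys already imported; add() returns False for repeats."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()

    def add(self, raw: bytes) -> bool:
        key = dedupe_key(raw)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

With this scheme, an identical copy arriving via -watch is rejected, but a copy whose body Mailman altered (for example, by appending a list footer) has the same Message-ID and a different hash, so it is kept as a second message.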
Fwiw, mass imports are much faster if you use "eatmydata", an
LD_PRELOAD library which disables fsync. On a reasonably fast VM
with a good, TRIM-ed SSD (run "fstrim -a" first) and lots of RAM,
importing 2000-2017 LKML history took around 3-4 hours. More
cores only help if your SSD can keep up, and I seem to remember
NPROC=4 (via env) was the point of diminishing returns for the
VM I used.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).