unofficial mirror of meta@public-inbox.org
* Help setting up a public-inbox instance: importing from maildir archive isn't working
@ 2023-06-13 14:49 Rebecca Cran
  2023-06-14  9:44 ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Rebecca Cran @ 2023-06-13 14:49 UTC (permalink / raw)
  To: meta

I'm trying to set up a public-inbox instance to mirror the edk2-devel 
mailing list. I'm using version 1.9.0.

I'm having problems importing a mirror of 105,000 messages from 2016
onward. When I configured it to pull them from an IMAP server, it
appeared to go through all of them, but the web interface then showed
only the last 10 or so, with no further pages, and the edk2-devel.git
directory is only a few MB.

When IMAP didn't work, I downloaded the messages into a Maildir and tried
importing them from there instead, but that isn't working either.

I was wondering if someone on this list could help point out what I'm 
doing wrong.


The maildir is:


~$ du -h Maildump/
4.0K    Maildump/Mail/cur
4.0K    Maildump/Mail/tmp
2.5G    Maildump/Mail/new
2.5G    Maildump/Mail
2.5G    Maildump/

I'm using the following configuration in ~/.public-inbox/config:


[publicinbox]
     wwwlisting = all

[publicinbox "edk2-devel"]
     address = devel@edk2.groups.io
     inboxdir = /home/public-inbox/edk2-devel.git
     watchheader = List-Id:<devel.edk2.groups.io>
     url = https://openfw.io/edk2-devel
     watch = maildir:/home/public-inbox/Maildump/Mail/
     watch = imaps://imap.example.net/INBOX ; redacted
     indexlevel = full
     replyto = :list
     spamcheck = none
     obfuscate = true


Authentication for the IMAP server is handled via git-credential.

I'm running the following commands:


public-inbox-init -V2 -L full edk2-devel \
    /home/public-inbox/edk2-devel.git https://openfw.io/edk2-devel \
    devel@edk2.groups.io

public-inbox-watch  # I run this in a screen session

# I run this in a separate screen session, once public-inbox-watch
# appears to have processed the existing messages
public-inbox-httpd


-- 

Rebecca Cran



* Re: Help setting up a public-inbox instance: importing from maildir archive isn't working
  2023-06-13 14:49 Help setting up a public-inbox instance: importing from maildir archive isn't working Rebecca Cran
@ 2023-06-14  9:44 ` Eric Wong
  2023-06-19 19:12   ` Rebecca Cran
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Wong @ 2023-06-14  9:44 UTC (permalink / raw)
  To: Rebecca Cran; +Cc: meta

Rebecca Cran <rebecca@bsdio.com> wrote:
> I'm trying to set up a public-inbox instance to mirror the edk2-devel
> mailing list. I'm using version 1.9.0.
> 
> I'm having problems importing a mirror of 105,000 messages from 2016 onward
> - when I configured it to pull them from an IMAP server it appeared to go
> through all of them, but then the web interface only showed the last 10 or
> so, with no more pages and the edk2-devel.git directory is only a few MB.
> 
> When imap didn't work, I tried downloading them into a maildir and tried
> importing them via that instead, but that isn't working either.

OK, so I suppose it's a matching problem because the matching
logic is shared between IMAP and Maildir.

> [publicinbox "edk2-devel"]
>     address = devel@edk2.groups.io
>     inboxdir = /home/public-inbox/edk2-devel.git

>     watchheader = List-Id:<devel.edk2.groups.io>

Any chance that List-Id doesn't match the older messages?

You are allowed multiple `watchheader' directives for an inbox
to account for address/name changes and such (and older headers
such as `X-BeenThere').

I haven't tried it, but this should work as long as you want
every message in a watched Maildir (or IMAP folder):

	watchheader = From:.

...which matches all messages with a literal `.' in the From: header;
so practically every valid message.  Likewise for Received:, Date:
or any practically-always-present header:value-substring combo.
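
The matching described above can be sketched in Python as follows. This
only illustrates the behaviour as stated in this message (a `Header:value'
directive whose value is treated as a literal, case-sensitive substring of
that header); it is not public-inbox's actual Perl code, and the function
name `matches_watchheader' is hypothetical.

```python
from email import message_from_string


def matches_watchheader(directive, msg):
    """Return True if any instance of the named header contains the
    literal value from a `Header:value` directive.

    Assumption: the value is a literal, case-sensitive substring,
    as described in the message above.
    """
    # Split only at the first colon so the value may itself contain colons.
    name, _, value = directive.partition(":")
    # Header *names* are looked up case-insensitively by the email package.
    return any(value in str(v) for v in msg.get_all(name, []))


msg = message_from_string(
    "From: Alice <alice@example.com>\n"
    "List-Id: <devel.edk2.groups.io>\n"
    "\n"
    "body\n"
)
print(matches_watchheader("From:.", msg))                          # True
print(matches_watchheader("List-Id:<devel.edk2.groups.io>", msg))  # True
print(matches_watchheader("List-Id:<other.example.org>", msg))     # False
```

The sample message is invented for the example. Under this model,
`From:.' accepts any message whose From: header contains a dot, which is
why it works as a catch-all.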

I think everything else in your configs+commands looked fine;
but I'm still struggling with lack-of-sleep and could've missed
things :<


I designed the `watchheader' directives to handle multiple lists
funneled into one Maildir; but I suppose it's less intuitive for
users with a 1:1 list => Maildir mapping :x


* Re: Help setting up a public-inbox instance: importing from maildir archive isn't working
  2023-06-14  9:44 ` Eric Wong
@ 2023-06-19 19:12   ` Rebecca Cran
  2023-06-20  2:56     ` Eric Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Rebecca Cran @ 2023-06-19 19:12 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On 6/14/23 03:44, Eric Wong wrote:

> Any chance that List-Id doesn't match the older messages?
>
> You are allowed multiple `watchheader' directives for an inbox
> to account for address/name changes and such (and older headers
> such as `X-BeenThere')
>
> I haven't tried it, but this should work as long as you want
> every message in a watched Maildir (or IMAP folder):
>
> 	watchheader = From:.
>
> ...which matches all messages with a literal `.' in the From: header;
> so practically every valid message.  Likewise for Received:, Date:
> or any practically-always-present header:value-substring combo.
>
> I think everything else in your configs+commands looked fine;
> but I'm still struggling with lack-of-sleep and could've missed
> things :<
>
>
> I designed the `watchheader' directives to handle multiple lists
> funneled into one Maildir; but I suppose it's less intuitive for
> users with a 1:1 list => Maildir mapping :x

Unfortunately I have several lists in the same Maildir, so I need to use 
watchheader.

The List-Id hasn't changed for several years: for example this is a 
message from November 2021:

List-Unsubscribe: <mailto:devel+unsubscribe@edk2.groups.io>
List-Subscribe: <mailto:devel+subscribe@edk2.groups.io>
List-Help: <mailto:devel+help@edk2.groups.io>
Sender: devel@edk2.groups.io
List-Id: <devel.edk2.groups.io>
Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io
X-Remote-Delivered-To: mailing list devel@edk2.groups.io
Reply-To: devel@edk2.groups.io,abdattar@amd.com
X-Gm-Message-State: WX5Eq2TqR2PVB2bR8lJXcv0Zx3953573AA=
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain

I added a line to match against To: as well, and that's working.
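
One way to check whether every archived message really carries the same
List-Id is to tally that header across the whole Maildir rather than
inspecting single messages. The following is a minimal sketch using
Python's standard `mailbox` module; the function name `tally_list_ids` is
hypothetical, and the Maildir path is taken from the command line.

```python
import mailbox
import sys
from collections import Counter


def tally_list_ids(maildir_path):
    """Count List-Id header values across new/ and cur/ of a Maildir."""
    counts = Counter()
    # create=False: fail instead of creating directories if the path is wrong.
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    for msg in box:
        values = msg.get_all("List-Id")
        if not values:
            counts["(no List-Id header)"] += 1
            continue
        for value in values:
            # Collapse folded or repeated whitespace so variants group together.
            counts[" ".join(str(value).split())] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the most common values first, one per line.
    for value, n in tally_list_ids(sys.argv[1]).most_common():
        print(f"{n:8d}  {value}")
```

Saved under a name of your choosing and run with the Maildir path as its
only argument (here that would be /home/public-inbox/Maildump/Mail/), it
prints one line per distinct List-Id value, so messages with a missing or
differently spelled header stand out.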

-- 
Rebecca Cran



* Re: Help setting up a public-inbox instance: importing from maildir archive isn't working
  2023-06-19 19:12   ` Rebecca Cran
@ 2023-06-20  2:56     ` Eric Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2023-06-20  2:56 UTC (permalink / raw)
  To: Rebecca Cran; +Cc: meta

Rebecca Cran <rebecca@bsdio.com> wrote:
> On 6/14/23 03:44, Eric Wong wrote:
> 
> > Any chance that List-Id doesn't match the older messages?
> > 
> > You are allowed multiple `watchheader' directives for an inbox
> > to account for address/name changes and such (and older headers
> > such as `X-BeenThere')
> > 
> > I haven't tried it, but this should work as long as you want
> > every message in a watched Maildir (or IMAP folder):
> > 
> > 	watchheader = From:.
> > 
> > ...which matches all messages with a literal `.' in the From: header;
> > so practically every valid message.  Likewise for Received:, Date:
> > or any practically-always-present header:value-substring combo.
> > 
> > I think everything else in your configs+commands looked fine;
> > but I'm still struggling with lack-of-sleep and could've missed
> > things :<
> > 
> > 
> > I designed the `watchheader' directives to handle multiple lists
> > funneled into one Maildir; but I suppose it's less intuitive for
> > users with a 1:1 list => Maildir mapping :x
> 
> Unfortunately I have several lists in the same Maildir, so I need to use
> watchheader.
> 
> The List-Id hasn't changed for several years: for example this is a message
> from November 2021:
> 
> List-Unsubscribe: <mailto:devel+unsubscribe@edk2.groups.io>
> List-Subscribe: <mailto:devel+subscribe@edk2.groups.io>
> List-Help: <mailto:devel+help@edk2.groups.io>
> Sender: devel@edk2.groups.io
> List-Id: <devel.edk2.groups.io>
> Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io
> X-Remote-Delivered-To: mailing list devel@edk2.groups.io
> Reply-To: devel@edk2.groups.io,abdattar@amd.com
> X-Gm-Message-State: WX5Eq2TqR2PVB2bR8lJXcv0Zx3953573AA=
> Content-Transfer-Encoding: quoted-printable
> Content-Type: text/plain
> 
> I added a line to match against To: as well, and that's working.

OK, it's good that To: works for you; but it's still worrying to
me that List-Id didn't work...

If you have time to help diagnose this, can you try:

	listid = devel.edk2.groups.io

in the config file and omit all `watchheader' directives?

public-inbox-watch will auto-translate listid to the appropriate
watchheader directive, but be case-insensitive in accordance
with RFC 2919 section 6.
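
The difference between the two directives can be illustrated with a small
Python sketch. It assumes `listid' is compared case-insensitively against
the text between the angle brackets of the List-Id header; the exact
pattern public-inbox-watch generates is not shown in this thread, so the
regex and the name `matches_listid' are illustrative only. The mixed-case
header is invented for the example and is not taken from the edk2-devel
archive.

```python
import re
from email import message_from_string


def matches_listid(listid, msg):
    """Case-insensitive check for <listid> in any List-Id header.

    Assumption: optional whitespace inside the angle brackets is
    tolerated; this is a sketch, not necessarily what public-inbox does.
    """
    pattern = re.compile(r"<\s*" + re.escape(listid) + r"\s*>", re.IGNORECASE)
    return any(pattern.search(str(v)) for v in msg.get_all("List-Id", []))


msg = message_from_string(
    "List-Id: EDK2 development <Devel.EDK2.groups.io>\n"
    "\n"
    "body\n"
)
# A literal, case-sensitive substring test misses the mixed-case header...
print("<devel.edk2.groups.io>" in msg["List-Id"])   # False
# ...while the case-insensitive list-id comparison still matches.
print(matches_listid("devel.edk2.groups.io", msg))  # True
```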

Or are you able to share a dump of the messages for me to try?
(I'm getting a 502 error on <https://openfw.io/edk2-devel>.)
Thanks.
