unofficial mirror of meta@public-inbox.org
From: Rob Herring <robh@kernel.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: lei missing mails
Date: Wed, 29 Jun 2022 16:01:57 -0600
Message-ID: <CAL_Jsq+PU=Os_dOpTDMq5E5BCrqGRpxKbROsuCGctf-Mbr38hA@mail.gmail.com>
In-Reply-To: <20220629172742.M978900@dcvr>

On Wed, Jun 29, 2022 at 11:27 AM Eric Wong <e@80x24.org> wrote:
>
> Rob Herring <robh@kernel.org> wrote:
> > On Wed, Jun 29, 2022 at 10:30 AM Eric Wong <e@80x24.org> wrote:
> > >
> > > Rob Herring <robh@kernel.org> wrote:
> > > > Hi,
> > > >
> > > > I'm using lei with lore where I have 2 queries which overlap. Really,
> > > > one is a subset of the other. On those overlapping threads, I'm
> > > > finding that sometimes new messages are written to one mailbox and not
> > > > the other. (At least sometimes, the messages may be missing from all
> > > > mailboxes sometimes too. I'm not certain.) Using --remote-fudge-time
> > > > to force refetching seems to get the missing mails. I haven't found
> > > > anything strange in timestamps of the missing mails, but otherwise am
> > > > not sure how to debug this further. The queries are retrieving full
> > > > threads and the missing mails are in the threads, but not direct
> > > > matches to the queries. I realize that's not a lot of detail to go on.
> > > > Suggestions on debugging this further?
> > >
> > > Is this with 1.8 or 1.7?
> >
> > Commit 68b53c888911 actually. So post 1.8.
>
> OK, thanks for that info.
>
> > > I forgot to note in the release notes, but there were some
> > > SQLite usage-related fixes which could avoid missing messages.
> > >
> > > You'll need "lei daemon-kill" after upgrading to 1.8 to ensure
> > > the new code is running.
> >
> > It's possible I haven't done that since updating though I do vaguely
> > recall seeing something about needing to do that. Is there any way to
> > tell before I restart it?
>
> Not really, but it's pretty cheap to restart (assuming there's no
> long-running jobs).

I've restarted and just hit this again.


> > > What might be interesting is to use the URLs lei prints and
> > > comparing the results w/o lei.

$ lei up --all
# updating /home/rob/Mail/from-me
# updating /home/rob/Mail/missing-cc
# updating /home/rob/Mail/my-patches
# updating /home/rob/Mail/pci
# https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
# https://lore.kernel.org/all/ limiting to 2022-06-27  9:50 -0600 and newer
# https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
# /usr/bin/curl -Sf -s -d ''
https://lore.kernel.org/all/?x=m&t=1&q=(dt%3A20220529211430..+AND+(f%3Arobh%40kernel.org+OR+f%3Arobh%2Bdt%40kernel.org))+AND+dt%3A20220627184226..
# /home/rob/.local/share/lei/store 144/144
# /home/rob/.local/share/lei/store 3/3
# /usr/bin/curl -Sf -s -d ''
https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1640812470..)+AND+dt%3A20220627155025..
# /usr/bin/curl -Sf -s -d ''
https://lore.kernel.org/all/?x=m&t=1&q=(l%3Alinux-pci+dfn%3Adrivers%2Fpci%2Fcontroller+dt%3A20220529211430..)+AND+dt%3A20220627184226..
# /home/rob/.local/share/lei/store 0/0
# /home/rob/.local/share/lei/store 362/362
# 0 written to /home/rob/Mail/missing-cc/ (0 matches)
# https://lore.kernel.org/all/ 72/72
# https://lore.kernel.org/all/ 4/4
# https://lore.kernel.org/all/ 131/?
# https://lore.kernel.org/all/ 184/?
# https://lore.kernel.org/all/ 412/?
# https://lore.kernel.org/all/ 603/?
# https://lore.kernel.org/all/ 853/?
# https://lore.kernel.org/all/ 1069/?
# https://lore.kernel.org/all/ 1442/?
# https://lore.kernel.org/all/ 1443/1443
# 1 written to /home/rob/Mail/pci/ (75 matches)
# 2 written to /home/rob/Mail/my-patches/ (148 matches)
# 7 written to /home/rob/Mail/from-me/ (1805 matches)
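
To compare these results without lei, it can help to decode the
percent-encoded search URLs printed above back into a readable query.
A minimal Python sketch using only the standard library; the helper
name decode_query is made up for illustration, and the URL is the
second one from the log above:

```python
from urllib.parse import urlsplit, parse_qs


def decode_query(url: str) -> str:
    """Return the decoded `q` parameter of a search URL."""
    # parse_qs turns '+' into spaces and undoes %XX escapes
    return parse_qs(urlsplit(url).query)["q"][0]


# Second URL printed by `lei up --all` in the log above
url = (
    "https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch"
    "+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)"
    "+AND+f%3Arobh%40kernel.org+AND+rt%3A1640812470..)"
    "+AND+dt%3A20220627155025.."
)

print(decode_query(url))
```

The decoded string shows the saved query with the extra "dt:" lower
bound appended at the end.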


What I expected was 3 messages written to 'my-patches'.

I think the problem is simply that the missing message doesn't match
the query itself, but is a reply to a message that does. A query limited
to dates after the original match in the thread won't pick up the reply.
The 2nd URL above indeed only has 2 results. I guess I just have to
fetch a wider window, like a month, every time? What's needed is a way
to get any new messages in existing threads. I don't suppose there's an
efficient way to do that?
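
The suspected behavior can be modeled with a toy example. This is only
a sketch of the hypothesis above, not how lore or lei is actually
implemented; the Msg type, the search_threads function, and the sample
data are all invented for illustration. The assumption is that threads
are expanded only when they contain a direct match inside the date
window:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    mid: str
    thread: str
    date: int       # YYYYMMDD, for simplicity
    matches: bool   # True if the message itself matches the search terms


def search_threads(msgs, since):
    """Toy model of '<query> AND dt:<since>..' with thread expansion."""
    # Only direct matches dated on or after `since` select a thread
    hit_threads = {m.thread for m in msgs if m.matches and m.date >= since}
    # Every message of a selected thread is returned
    return [m for m in msgs if m.thread in hit_threads]


msgs = [
    Msg("patch@example", "t1", 20220620, True),    # older direct match
    Msg("review@example", "t1", 20220628, False),  # new reply, no direct match
    Msg("patch2@example", "t2", 20220628, True),   # new direct match
]

narrow = {m.mid for m in search_threads(msgs, since=20220627)}
wide = {m.mid for m in search_threads(msgs, since=20220601)}

print("narrow window:", sorted(narrow))
print("wide window:  ", sorted(wide))
```

In this model the new reply in thread t1 only shows up with the wider
window, because the only direct match in that thread predates the
narrow cutoff.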

> > >
> > > I'll have to double-check if overlapping affects things, but it
> > > shouldn't; since the dedupe logic is per-output.
> > >
> > > Is this exclusively with HTTPS endpoints and writing to Maildirs
> > > (or something else?)
> >
> > Yes. It's querying lore and writing to a maildir. Here's one of the queries:
> >
> > [lei]
> >         q = (dfn:drivers OR dfn:arch OR dfn:Documentation/* OR
> > dfn:include OR dfn:scripts) AND \
> >          f:robh@kernel.org AND rt:6.month.ago..
> > [lei "q"]
> >         include = https://lore.kernel.org/all/
> >         external = 1
> >         local = 1
> >         remote = 1
> >         threads = 1
> >         dedupe = mid
> >         output = maildir:/home/rob/Mail/my-patches
>
> Fwiw, dedupe based on mid could be vulnerable to spoofing, which
> is why `content' is the default.  But yes, in the past, I've
> noticed some messages to meta@public-inbox.org not showing up,
> though not recently (I guess lack of activity here is a culprit :x)

Does 'content' ignore trailers that mailman lists like to add? I think
I switched because of that.
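
To illustrate the concern, here is a simplified sketch of two dedupe
keys. It is not lei's actual `content' algorithm (whether that strips
list footers is exactly the open question); the dict layout and the
function names are assumptions made up for this example. It only shows
why a naive body hash would treat a copy with an appended list footer
as a different message, while a Message-ID key would not:

```python
import hashlib


def key_by_mid(msg: dict) -> str:
    """Dedupe key based on the Message-ID only."""
    return msg["message_id"]


def key_by_body(msg: dict) -> str:
    """Naive dedupe key: SHA-256 over the raw body text."""
    return hashlib.sha256(msg["body"].encode("utf-8")).hexdigest()


direct_copy = {
    "message_id": "<abc@example>",
    "body": "Looks good to me.\n",
}
# Same message as delivered by a list that appends a footer
list_copy = {
    "message_id": "<abc@example>",
    "body": "Looks good to me.\n"
            "_______________________________________________\n"
            "example mailing list\n",
}

same_by_mid = key_by_mid(direct_copy) == key_by_mid(list_copy)
same_by_body = key_by_body(direct_copy) == key_by_body(list_copy)
print("same by mid: ", same_by_mid)
print("same by body:", same_by_body)
```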

Rob

Thread overview: 11+ messages
2022-06-29 16:15 lei missing mails Rob Herring
2022-06-29 16:30 ` Eric Wong
2022-06-29 16:53   ` Rob Herring
2022-06-29 17:27     ` Eric Wong
2022-06-29 22:01       ` Rob Herring [this message]
2022-06-30  8:55         ` Eric Wong
2022-07-07  9:48           ` Eric Wong
2022-07-11 21:17             ` Rob Herring
2022-07-11 21:59           ` Rob Herring
2022-07-18 23:41             ` Eric Wong
2022-07-20 22:57               ` [PATCH] www: note "x=m" and "t=1" (mis)use for GET requests Eric Wong
