From: Eric Wong <e@80x24.org>
To: Rob Herring <robh@kernel.org>
Cc: meta@public-inbox.org
Subject: Re: lei missing mails
Date: Mon, 18 Jul 2022 23:41:24 +0000 [thread overview]
Message-ID: <20220718234124.M213084@dcvr> (raw)
In-Reply-To: <CAL_JsqLc8HSZxw1ucPJbo2vVhMa9P1z0bbEQ7pErTgyEUuZ+Dw@mail.gmail.com>
Rob Herring <robh@kernel.org> wrote:
> On Thu, Jun 30, 2022 at 2:55 AM Eric Wong <e@80x24.org> wrote:
> >
> > Rob Herring <robh@kernel.org> wrote:
> > > On Wed, Jun 29, 2022 at 11:27 AM Eric Wong <e@80x24.org> wrote:
> > > > Rob Herring <robh@kernel.org> wrote:
> > > > > On Wed, Jun 29, 2022 at 10:30 AM Eric Wong <e@80x24.org> wrote:
> > > > > > Rob Herring <robh@kernel.org> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm using lei with lore where I have 2 queries which overlap. Really,
> > > > > > > one is a subset of the other. On those overlapping threads, I'm
> > > > > > > finding that sometimes new messages are written to one mailbox and not
> > > > > > > the other. (At least sometimes, the messages may be missing from all
> > > > > > > mailboxes sometimes too. I'm not certain.) Using --remote-fudge-time
> > > > > > > to force refetching seems to get the missing mails. I haven't found
> > > > > > > anything strange in timestamps of the missing mails, but otherwise am
> > > > > > > not sure how to debug this further. The queries are retrieving full
> > > > > > > threads and the missing mails are in the threads, but not direct
> > > > > > > matches to the queries. I realize that's not a lot of detail to go on.
> > > > > > > Suggestions on debugging this further?
> > > > > >
> > > > > > Is this with 1.8 or 1.7?
> > > > >
> > > > > Commit 68b53c888911 actually. So post 1.8.
> > > >
> > > > OK, thanks for that info.
> > > >
> > > > > > I forgot to note in the release notes, but there were some
> > > > > > SQLite usage-related fixes which could avoid missing messages.
> > > > > >
> > > > > > You'll need "lei daemon-kill" after upgrading to 1.8 to ensure
> > > > > > the new code is running.
> > > > >
> > > > > It's possible I haven't done that since updating though I do vaguely
> > > > > recall seeing something about needing to do that. Is there any way to
> > > > > tell before I restart it?
> > > >
> > > > Not really, but it's pretty cheap to restart (assuming there's no
> > > > long-running jobs).
> > >
> > > I've restarted and just hit this again.
> >
> > Ugh, sorry to hear that :<
> >
> > > > > > What might be interesting is to use the URLs lei prints and
> > > > > > comparing the results w/o lei.
> > >
> > > $ lei up --all
> > > # updating /home/rob/Mail/from-me
> > > # updating /home/rob/Mail/missing-cc
> > > # updating /home/rob/Mail/my-patches
> > > # updating /home/rob/Mail/pci
> > > # https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
> > > # https://lore.kernel.org/all/ limiting to 2022-06-27 9:50 -0600 and newer
> > > # https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
> > > # /usr/bin/curl -Sf -s -d ''
> > > https://lore.kernel.org/all/?x=m&t=1&q=(dt%3A20220529211430..+AND+(f%3Arobh%40kernel.org+OR+f%3Arobh%2Bdt%40kernel.org))+AND+dt%3A20220627184226..
> > > # /home/rob/.local/share/lei/store 144/144
> > > # /home/rob/.local/share/lei/store 3/3
> > > # /usr/bin/curl -Sf -s -d ''
> > > https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1640812470..)+AND+dt%3A20220627155025..
> > > # /usr/bin/curl -Sf -s -d ''
> > > https://lore.kernel.org/all/?x=m&t=1&q=(l%3Alinux-pci+dfn%3Adrivers%2Fpci%2Fcontroller+dt%3A20220529211430..)+AND+dt%3A20220627184226..
> > > # /home/rob/.local/share/lei/store 0/0
> > > # /home/rob/.local/share/lei/store 362/362
> > > # 0 written to /home/rob/Mail/missing-cc/ (0 matches)
> > > # https://lore.kernel.org/all/ 72/72
> > > # https://lore.kernel.org/all/ 4/4
> > > # https://lore.kernel.org/all/ 131/?
> > > # https://lore.kernel.org/all/ 184/?
> > > # https://lore.kernel.org/all/ 412/?
> > > # https://lore.kernel.org/all/ 603/?
> > > # https://lore.kernel.org/all/ 853/?
> > > # https://lore.kernel.org/all/ 1069/?
> > > # https://lore.kernel.org/all/ 1442/?
> > > # https://lore.kernel.org/all/ 1443/1443
> > > # 1 written to /home/rob/Mail/pci/ (75 matches)
> > > # 2 written to /home/rob/Mail/my-patches/ (148 matches)
> > > # 7 written to /home/rob/Mail/from-me/ (1805 matches)
> > >
> > >
> > > What I expected was 3 messages written to 'my-patches'.
> > >
> > > I think the problem is just simply that the new message missing
> > > doesn't match the query, but is a reply to a match. So with a date
> > > after the original match in the thread won't pick up anything. The 2nd
> > > URL above indeed only has 2 results. I guess I just have to fetch a
> > > wider window like a month every time? What's needed is a get any new
> > > messages in existing threads. I don't suppose there's an efficient way
> > > to do that?
> >
> > No, I don't think so. I think this is a separate issue in lei...
> > "t=1" in the remote query expands threads in a time-agnostic
> > way, so I don't think that's the problem (though I may be wrong...).
>
> Based on what the web interface presents, it sure seems like 't=1' is
> independent of the query. The results listed are only those that match
> the query and date range on the match.
Actually, for the HTML results, t=1 is ignored right now...
it's only for mboxrd downloads (via POST) at the moment...
That should be clarified/changed.
> For example, this query returns 3 matches:
>
> https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1641934905..)+AND+dt%3A20220630203819..
>
> If I change 'dt' to 1 day earlier, I get 1 more match:
>
> https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1641934905..)+AND+dt%3A20220629203819..
>
> That 4th match has a reply after 6/30, but the 1st query will not get
> the reply. This is all reproducible without lei involved at all.
Right. t=1 only expands threads if they're linked via
References/In-Reply-To or (loosely) via matching Subjects.
> What seems to be needed is a 'thread date' which is the latest time
> for any message in a thread that matches. Or perhaps some way to
> separate the query from what's transferred. IOW, query for X, but only
> send results newer than some date.
Yeah, the thread ID info is stored independently of the search
index documents, though; so searching (Xapian) vs filtering
(SQLite) gets a bit tricky
I'll look into JMAP further before making more changes to the
data models to accomodate this.
Apologies for the delays; been stressed over other things :<
next prev parent reply other threads:[~2022-07-18 23:41 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-29 16:15 lei missing mails Rob Herring
2022-06-29 16:30 ` Eric Wong
2022-06-29 16:53 ` Rob Herring
2022-06-29 17:27 ` Eric Wong
2022-06-29 22:01 ` Rob Herring
2022-06-30 8:55 ` Eric Wong
2022-07-07 9:48 ` Eric Wong
2022-07-11 21:17 ` Rob Herring
2022-07-11 21:59 ` Rob Herring
2022-07-18 23:41 ` Eric Wong [this message]
2022-07-20 22:57 ` [PATCH] www: note "x=m" and "t=1" (mis)use for GET requests Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://public-inbox.org/README
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220718234124.M213084@dcvr \
--to=e@80x24.org \
--cc=meta@public-inbox.org \
--cc=robh@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).