Date: Thu, 30 Jun 2022 08:55:39 +0000
From: Eric Wong
To: Rob Herring
Cc: meta@public-inbox.org
Subject: Re: lei missing mails
Message-ID: <20220630085539.M324144@dcvr>
References: <20220629163033.GA14412@dcvr> <20220629172742.M978900@dcvr>

Rob Herring wrote:
> On Wed, Jun 29, 2022 at 11:27 AM Eric Wong wrote:
> > Rob Herring wrote:
> > > On Wed, Jun 29, 2022 at 10:30 AM Eric Wong wrote:
> > > > Rob Herring wrote:
> > > > > Hi,
> > > > >
> > > > > I'm using lei with lore where I have 2 queries which overlap. Really,
> > > > > one is a subset of the other. On those overlapping threads, I'm
> > > > > finding that sometimes new messages are written to one mailbox and not
> > > > > the other. (At least sometimes, the messages may be missing from all
> > > > > mailboxes sometimes too. I'm not certain.) Using --remote-fudge-time
> > > > > to force refetching seems to get the missing mails.
> > > > > I haven't found
> > > > > anything strange in timestamps of the missing mails, but otherwise am
> > > > > not sure how to debug this further. The queries are retrieving full
> > > > > threads and the missing mails are in the threads, but not direct
> > > > > matches to the queries. I realize that's not a lot of detail to go on.
> > > > > Suggestions on debugging this further?
> > > >
> > > > Is this with 1.8 or 1.7?
> > >
> > > Commit 68b53c888911 actually. So post 1.8.
> >
> > OK, thanks for that info.
> >
> > > > I forgot to note in the release notes, but there were some
> > > > SQLite usage-related fixes which could avoid missing messages.
> > > >
> > > > You'll need "lei daemon-kill" after upgrading to 1.8 to ensure
> > > > the new code is running.
> > >
> > > It's possible I haven't done that since updating though I do vaguely
> > > recall seeing something about needing to do that. Is there any way to
> > > tell before I restart it?
> >
> > Not really, but it's pretty cheap to restart (assuming there's no
> > long-running jobs).
>
> I've restarted and just hit this again.

Ugh, sorry to hear that :<

> > > > What might be interesting is to use the URLs lei prints and
> > > > comparing the results w/o lei.
>
> $ lei up --all
> # updating /home/rob/Mail/from-me
> # updating /home/rob/Mail/missing-cc
> # updating /home/rob/Mail/my-patches
> # updating /home/rob/Mail/pci
> # https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
> # https://lore.kernel.org/all/ limiting to 2022-06-27 9:50 -0600 and newer
> # https://lore.kernel.org/all/ limiting to 2022-06-27 12:42 -0600 and newer
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=(dt%3A20220529211430..+AND+(f%3Arobh%40kernel.org+OR+f%3Arobh%2Bdt%40kernel.org))+AND+dt%3A20220627184226..
> # /home/rob/.local/share/lei/store 144/144
> # /home/rob/.local/share/lei/store 3/3
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=((dfn%3Adrivers+OR+dfn%3Aarch+OR+dfn%3ADocumentation%2F*+OR+dfn%3Ainclude+OR+dfn%3Ascripts)+AND+f%3Arobh%40kernel.org+AND+rt%3A1640812470..)+AND+dt%3A20220627155025..
> # /usr/bin/curl -Sf -s -d ''
> https://lore.kernel.org/all/?x=m&t=1&q=(l%3Alinux-pci+dfn%3Adrivers%2Fpci%2Fcontroller+dt%3A20220529211430..)+AND+dt%3A20220627184226..
> # /home/rob/.local/share/lei/store 0/0
> # /home/rob/.local/share/lei/store 362/362
> # 0 written to /home/rob/Mail/missing-cc/ (0 matches)
> # https://lore.kernel.org/all/ 72/72
> # https://lore.kernel.org/all/ 4/4
> # https://lore.kernel.org/all/ 131/?
> # https://lore.kernel.org/all/ 184/?
> # https://lore.kernel.org/all/ 412/?
> # https://lore.kernel.org/all/ 603/?
> # https://lore.kernel.org/all/ 853/?
> # https://lore.kernel.org/all/ 1069/?
> # https://lore.kernel.org/all/ 1442/?
> # https://lore.kernel.org/all/ 1443/1443
> # 1 written to /home/rob/Mail/pci/ (75 matches)
> # 2 written to /home/rob/Mail/my-patches/ (148 matches)
> # 7 written to /home/rob/Mail/from-me/ (1805 matches)
>
>
> What I expected was 3 messages written to 'my-patches'.
>
> I think the problem is just simply that the new message missing
> doesn't match the query, but is a reply to a match. So with a date
> after the original match in the thread won't pick up anything. The 2nd
> URL above indeed only has 2 results. I guess I just have to fetch a
> wider window like a month every time? What's needed is a get any new
> messages in existing threads. I don't suppose there's an efficient way
> to do that?

No, I don't think so. I think this is a separate issue in lei...
"t=1" in the remote query expands threads in a time-agnostic
way, so I don't think that's the problem (though I may be wrong...).
I'll have to check more closely this week (still stuck with POP3
user account/storage issues :<)

> > > >
> > > > I'll have to double-check if overlapping affects things, but it
> > > > shouldn't; since the dedupe logic is per-output.
> > > >
> > > > Is this exclusively with HTTPS endpoints and writing to Maildirs
> > > > (or something else?)
> > >
> > > Yes. It's querying lore and writing to a maildir. Here's one of the queries:
> > >
> > > [lei]
> > > q = (dfn:drivers OR dfn:arch OR dfn:Documentation/* OR
> > > dfn:include OR dfn:scripts) AND \
> > > f:robh@kernel.org AND rt:6.month.ago..
> > > [lei "q"]
> > > include = https://lore.kernel.org/all/
> > > external = 1
> > > local = 1
> > > remote = 1
> > > threads = 1
> > > dedupe = mid
> > > output = maildir:/home/rob/Mail/my-patches
> >
> > Fwiw, dedupe based on mid could be vulnerable to spoofing, which
> > is why `content' is the default. But yes, in the past, I've
> > noticed some messages to meta@public-inbox.org not showing up,
> > though not recently (I guess lack of activity here is a culprit :x)
>
> Does 'content' ignore trailers that mailman lists like to add? I think
> I switched because of that.

No, unfortunately not. Hopefully the admins can be convinced to
get rid of trailers (I'm happy vger did so a few years back). But
I'd rather deal with duplicates than miss messages (there have
been legitimate messages in the past which reused msgids,
unfortunately).
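
To make the trade-off between the two dedupe modes concrete, here is a
minimal Python sketch. It is not lei's implementation: the helper names
(mid_key, content_key, dedupe, make), the choice of hashed headers, and
the sample messages are all assumptions made up for illustration. It only
shows why keying on Message-ID can drop a distinct message that reuses an
ID, while keying on content keeps a copy that differs only by a list
trailer.

```python
import hashlib
from email import message_from_string


def mid_key(msg):
    """Dedupe key built from the Message-ID header only."""
    return (msg.get("Message-ID") or "").strip()


def content_key(msg):
    """Dedupe key built from a few headers plus the body.

    The header subset is a hypothetical choice for this sketch.
    """
    h = hashlib.sha256()
    for name in ("From", "Subject", "Date"):
        h.update((msg.get(name) or "").encode("utf-8"))
        h.update(b"\0")
    h.update(msg.get_payload().encode("utf-8"))
    return h.hexdigest()


def dedupe(messages, key):
    """Keep the first message seen for each key, preserving order."""
    seen, kept = set(), []
    for msg in messages:
        k = key(msg)
        if k not in seen:
            seen.add(k)
            kept.append(msg)
    return kept


def make(mid, subject, body):
    """Build a small single-part message (sample data, not real mail)."""
    raw = (
        "From: a@example.com\n"
        f"Subject: {subject}\n"
        "Date: Thu, 30 Jun 2022 08:55:39 +0000\n"
        f"Message-ID: {mid}\n"
        "\n"
        f"{body}\n"
    )
    return message_from_string(raw)


msgs = [
    make("<1@example.com>", "patch", "hello"),
    # same mail as delivered via a list that appends a trailer
    make("<1@example.com>", "patch", "hello\n-- \nlist footer"),
    # a different mail that happens to reuse the same Message-ID
    make("<1@example.com>", "other", "unrelated mail reusing the msgid"),
]

by_mid = dedupe(msgs, mid_key)
by_content = dedupe(msgs, content_key)
print(len(by_mid), len(by_content))
```

With these sample messages, keying on Message-ID keeps one message (the
unrelated mail is lost), while keying on content keeps all three (the
trailer copy shows up as a duplicate).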