Date: Tue, 28 Mar 2023 19:45:49 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: Cheap way to check for new messages in a thread
Message-ID: <20230328194549.M808175@dcvr>
References: <20230327191049.M277377@dcvr> <20230327213849.M743623@dcvr>

Konstantin Ryabitsev wrote:
> On Mon, Mar 27, 2023 at 09:38:49PM +0000, Eric Wong wrote:
> > I thought about that, too; but I'm worried about having one-off
> > stuff that ends up needing to be supported indefinitely.
> >
> > JMAP for this would take more time, but I'd be more comfortable
> > carrying it long-term.
> >
> > I don't expect trimming after the first paragraph to be a huge
> > improvement.  Retrieving any part of the message from git and
> > dealing with MIME is expensive, anyways.  I wouldn't expect it
> > to be a big (if any) improvement compared to POST-ing for the
> > mbox.gz (&x=m&t=1) endpoint with rt:$SINCE..
>
> Hmm... This didn't seem to do the right thing for me.
> For example, this thread:
>
> https://lore.kernel.org/lkml/20230327080502.GA570847@ziqianlu-desk2
>
> If I ask for any new messages in that thread since 20230327120000, I get
> nothing:
>
> curl -Sf -d '' 'https://lore.kernel.org/all/?x=m&t=1&q=mid%3A20230327080502.GA570847@ziqianlu-desk2+AND+dt%3A20230328120000..'

Ugh, that's because the thread expansion (t=1) happens after
Xapian handles dt:/rt:/d:, so `mid:$MSGID AND dt:$START..' only
matches the one message with that Message-ID, and only if that
message is itself dated within the range.  Since it isn't, there's
nothing left to expand into a thread.

I don't know if there's a good way to do that entirely within
Xapian via high-level Perl bindings.

Some options:

A) grab MSGID first, look up THREADID for a given MSGID,
   then apply the remaining query

   The problem is figuring out which parts of the query to
   handle first.  Maybe a solution below...

B) add explicit before= and after= parameters which allow us to
   do filtering ourselves in the thread expansion phase

C) index References:/In-Reply-To: so searching `ref:$MSGID'
   can work.  This doesn't work for some MUAs and deep threads,
   though.

D) Support `thread:{subquery}' like notmuch.
   Thus `thread:{mid:$MSGID} AND dt:$START..' would communicate
   to Xapian what we want for A).

   I'm not sure this is doable unless using Xapian via C++,
   but I've been considering providing the option to use C++
   anyway to support less hacky approxidate query parsing.

   According to notmuch docs, it's expensive, though :<

I think it's possible to support /$INBOX/$MSGID/t.mbox.gz?q=...
for A) without too much difficulty.  I'll have to think about it
a bit...

D) is worth considering long-term if proper timeouts can be
implemented.

> > The mbox.gz endpoints should be a bit more efficient for the
> > server than Atom feeds; decoding MIME and HTML escaping takes up
> > considerable CPU time.
>
> Good to know. I'm really looking for a way to ask the remote system "hey, is
> there anything new in this thread?" so that I can quickly ignore threads
> without any updates.

All the mbox.gz endpoints will 404 if there are no results, and
the `-f' flag of curl will ensure nothing's emitted to stdout
in that case (curl also exits with status 22, so a script only
needs to check the exit code).
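To make the ordering problem concrete, here is a simplified, in-memory sketch in Python. It is not public-inbox's Perl/Xapian code: the Msg record, its tid field (a hypothetical stand-in for THREADID) and the example Message-IDs are made up, and only a date filter is modelled. filter_then_expand mimics the current t=1 behaviour described in this message; expand_then_filter is the order option A) asks for.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Msg:
    mid: str        # Message-ID (hypothetical values below)
    tid: int        # hypothetical stand-in for public-inbox's THREADID
    date: datetime  # what a dt: range would compare against


def filter_then_expand(msgs, mid, since):
    """Current order: date filter ANDed with mid: first, then t=1 expands the hits."""
    hits = [m for m in msgs if m.mid == mid and m.date >= since]
    tids = {m.tid for m in hits}
    return [m for m in msgs if m.tid in tids]


def expand_then_filter(msgs, mid, since):
    """Option A): resolve MSGID -> THREADID first, then apply the date filter."""
    tids = {m.tid for m in msgs if m.mid == mid}
    return [m for m in msgs if m.tid in tids and m.date >= since]


root = Msg("root@example", 1, datetime(2023, 3, 27, 8, 5))
reply = Msg("reply@example", 1, datetime(2023, 3, 28, 13, 0))
other = Msg("other@example", 2, datetime(2023, 3, 28, 14, 0))
msgs = [root, reply, other]
since = datetime(2023, 3, 28, 12, 0)

print(filter_then_expand(msgs, "root@example", since))  # -> []
print(expand_then_filter(msgs, "root@example", since) == [reply])  # -> True
```

With the thread root dated before the cutoff, the first order returns nothing (like the curl example), while the second returns only the newer reply.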
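For scripting the "is there anything new?" check from Python instead of curl, a minimal sketch follows. The function names are hypothetical and not part of public-inbox; the sketch relies only on the behaviour stated in this message, namely that the mbox.gz search endpoint returns 404 when nothing matches. Caveat: because of the t=1 ordering issue, a `mid: AND dt:' query like the one below won't report new replies to an older thread until something like A) or D) exists, so the query string is only illustrative.

```python
import urllib.error
import urllib.parse
import urllib.request


def search_mbox_url(base, query):
    """Build a search URL asking for mbox.gz output (x=m) of whole threads (t=1)."""
    params = urllib.parse.urlencode({"x": "m", "t": "1", "q": query})
    return base + "?" + params


def has_results(url, timeout=30):
    """Rough equivalent of `curl -Sf -d '' URL`: POST an empty body.

    Returns True on 200, False on 404 (no results); other HTTP errors
    are re-raised, much as curl -f would fail on them."""
    req = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


if __name__ == "__main__":
    url = search_mbox_url(
        "https://lore.kernel.org/all/",
        "mid:20230327080502.GA570847@ziqianlu-desk2 AND dt:20230328120000..",
    )
    # Only builds and prints the URL; has_results(url) would hit the network.
    print(url)
```

urlencode escapes `@' as %40, which the server should treat the same as the literal `@' in the curl example.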