unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* downloading t.mbox.gz messages are not sorted in expected order
@ 2024-04-11 22:32 Jacob Keller
  2024-04-11 22:42 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2024-04-11 22:32 UTC (permalink / raw)
  To: meta

Hi,

I sometimes download patch series off of public inbox hosted servers to
apply with git-am. Occasionally I have found that these do not apply
cleanly because the thread is not sorted in patch order.

For an example, see
https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/

This thread shows up and the links are all in order on the HTML view,
but if I download the t.mbox and split it using git mailsplit, I get the
following:

$ git mailsplit t.mbox -osplit
$ cd split
$ for f in *; do echo $f ; grep '^Subject:' $f; done
> 0001
> Subject: [PATCH v2 0/7] vfio: Interrupt eventfd hardening
> 0002
> Subject: [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
> 0003
> Subject: [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue
> 0004
> Subject: [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ
> 0005
> Subject: [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
> 0006
> Subject: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 0007
> Subject: [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
> 0008
> Subject: [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger
> 0009
> Subject: RE: [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
> 0010
> Subject: RE: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 0011
> Subject: RE: [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler
> 0012
> Subject: Re: [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx
> 0013
> Subject: Re: [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
> 0014
> Subject: Re: [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject
> 0015
> Subject: Re: [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
> 0016
> Subject: Re: [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
> 0017
> Subject: Re: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 0018
> Subject: Re: [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler
> 0019
> Subject: Re: [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx
> 0020
> Subject: Re: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 

I'm not really sure why messages appear in this order in the t.mbox
file. The messages appear sorted properly when viewing the thread via
the next and previous links and in the summary of links at the HTML
display..

Any ideas? It is rather frustrating to have to realize that an mbox is
not sorted and I have to split and sort it manually...

I tried looking through the the public-inbox to see if I could figure it
out but its not clear to me.

Maybe its because of how thread_html uses walk_thread to go in a
particular order where was the mbox code seems to just iterate over
messages in the thread and go in essentially a random order?

Thanks,
Jake

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: downloading t.mbox.gz messages are not sorted in expected order
  2024-04-11 22:32 downloading t.mbox.gz messages are not sorted in expected order Jacob Keller
@ 2024-04-11 22:42 ` Konstantin Ryabitsev
  2024-04-12 23:59   ` Jacob Keller
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin Ryabitsev @ 2024-04-11 22:42 UTC (permalink / raw)
  To: Jacob Keller; +Cc: meta

On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote:
> I sometimes download patch series off of public inbox hosted servers to
> apply with git-am. Occasionally I have found that these do not apply
> cleanly because the thread is not sorted in patch order.

It's more than just the order -- if there are replies in the thread, the mbox
file won't apply either.

This is the reason why the b4 tool exists:
https://b4.docs.kernel.org/

> For an example, see
> https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/

    $ b4 am -o/tmp https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/
    Grabbing thread from lore.kernel.org/all/20240308230557.805580-1-alex.williamson@redhat.com/t.mbox.gz
    Analyzing 20 messages in the thread
    Looking for additional code-review trailers on lore.kernel.org
    Checking attestation on all messages, may take a moment...
    ---
      ✓ [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ
      ✓ [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
        + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
      ✓ [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue
        + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
      ✓ [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
        + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
      ✓ [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
        + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com)
        + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
      ✓ [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
        + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com)
        + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
        + Tested-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
      ✓ [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger
        + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com)
        + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
      ---
      ✓ Signed: DKIM/redhat.com
    ---
    Total patches: 7
    ---
    Cover: /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.cover
     Link: https://lore.kernel.org/r/20240308230557.805580-1-alex.williamson@redhat.com
     Base: not specified
           git am /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.mbx

-K

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: downloading t.mbox.gz messages are not sorted in expected order
  2024-04-11 22:42 ` Konstantin Ryabitsev
@ 2024-04-12 23:59   ` Jacob Keller
  2024-04-13  7:58     ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2024-04-12 23:59 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta



On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote:
> On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote:
>> I sometimes download patch series off of public inbox hosted servers to
>> apply with git-am. Occasionally I have found that these do not apply
>> cleanly because the thread is not sorted in patch order.
> 
> It's more than just the order -- if there are replies in the thread, the mbox
> file won't apply either.
> 

If the order was correct, it is usually easy enough to just "git am
--skip" the patches which have no content. However...

> This is the reason why the b4 tool exists:
> https://b4.docs.kernel.org/
> 

This is extremely useful and I was unaware of its existence. Thanks!!

>> For an example, see
>> https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/
> 
>     $ b4 am -o/tmp https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/
>     Grabbing thread from lore.kernel.org/all/20240308230557.805580-1-alex.williamson@redhat.com/t.mbox.gz
>     Analyzing 20 messages in the thread
>     Looking for additional code-review trailers on lore.kernel.org
>     Checking attestation on all messages, may take a moment...
>     ---
>       ✓ [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ
>       ✓ [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
>         + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>       ✓ [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue
>         + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>       ✓ [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
>         + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>       ✓ [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
>         + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com)
>         + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>       ✓ [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
>         + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com)
>         + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>         + Tested-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>       ✓ [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger
>         + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com)
>         + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com)
>       ---
>       ✓ Signed: DKIM/redhat.com
>     ---
>     Total patches: 7
>     ---
>     Cover: /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.cover
>      Link: https://lore.kernel.org/r/20240308230557.805580-1-alex.williamson@redhat.com
>      Base: not specified
>            git am /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.mbx
> 
> -K

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: downloading t.mbox.gz messages are not sorted in expected order
  2024-04-12 23:59   ` Jacob Keller
@ 2024-04-13  7:58     ` Eric Wong
  2024-04-15 21:06       ` Jacob Keller
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2024-04-13  7:58 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Konstantin Ryabitsev, meta

Jacob Keller <jacob.e.keller@intel.com> wrote:
> 
> 
> On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote:
> > On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote:
> >> I sometimes download patch series off of public inbox hosted servers to
> >> apply with git-am. Occasionally I have found that these do not apply
> >> cleanly because the thread is not sorted in patch order.
> > 
> > It's more than just the order -- if there are replies in the thread, the mbox
> > file won't apply either.
> > 
> 
> If the order was correct, it is usually easy enough to just "git am
> --skip" the patches which have no content. However...
> 
> > This is the reason why the b4 tool exists:
> > https://b4.docs.kernel.org/
> > 
> 
> This is extremely useful and I was unaware of its existence. Thanks!!

Good to know b4 works for you.

FWIW, t.mbox.gz uses NNTP article number ordering to ensure
batched fetches work and duplicates can't get served.

IOW, it fetches a batch of 1000 header rows at a time from a
single thread to avoid using too much memory for a single
request.  The next batch (another 1K) only gets fetched once the
current batch is done.  So it must order by article number to
deal with that, especially since new messages may appear in the
thread while the current batch is being streamed.

Identical Date: headers can appear multiple times in the same
thread, so using a >= or > comparison for retrieval wouldn't
work.

Of course, most threads are <1000 messages, so I did think
about sorting by Date for small threads (as we do for the HTML
output)...

However with the current t.mbox.gz code, we expect (and can
handle) new messages appearing while a t.mbox.gz is being
served.  So if a thread has 10 messages, the first batch fetch
would only return those 10.  However, while a client is slowly
downloading the first 10 messages, more messages show up.  The
current retrieval scheme allows new messages in a thread to show
up without needing another request.

AFAIK, it's actually easier and fewer SQL statements to do the
current way.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: downloading t.mbox.gz messages are not sorted in expected order
  2024-04-13  7:58     ` Eric Wong
@ 2024-04-15 21:06       ` Jacob Keller
  0 siblings, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2024-04-15 21:06 UTC (permalink / raw)
  To: Eric Wong; +Cc: Konstantin Ryabitsev, meta



On 4/13/2024 12:58 AM, Eric Wong wrote:
> Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>>
>> On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote:
>>> On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote:
>>>> I sometimes download patch series off of public inbox hosted servers to
>>>> apply with git-am. Occasionally I have found that these do not apply
>>>> cleanly because the thread is not sorted in patch order.
>>>
>>> It's more than just the order -- if there are replies in the thread, the mbox
>>> file won't apply either.
>>>
>>
>> If the order was correct, it is usually easy enough to just "git am
>> --skip" the patches which have no content. However...
>>
>>> This is the reason why the b4 tool exists:
>>> https://b4.docs.kernel.org/
>>>
>>
>> This is extremely useful and I was unaware of its existence. Thanks!!
> 
> Good to know b4 works for you.
> 
> FWIW, t.mbox.gz uses NNTP article number ordering to ensure
> batched fetches work and duplicates can't get served.
> 
> IOW, it fetches a batch of 1000 header rows at a time from a
> single thread to avoid using too much memory for a single
> request.  The next batch (another 1K) only gets fetched once the
> current batch is done.  So it must order by article number to
> deal with that, especially since new messages may appear in the
> thread while the current batch is being streamed.
> 
> Identical Date: headers can appear multiple times in the same
> thread, so using a >= or > comparison for retrieval wouldn't
> work.
> 
> Of course, most threads are <1000 messages, so I did think
> about sorting by Date for small threads (as we do for the HTML
> output)...
> 
> However with the current t.mbox.gz code, we expect (and can
> handle) new messages appearing while a t.mbox.gz is being
> served.  So if a thread has 10 messages, the first batch fetch
> would only return those 10.  However, while a client is slowly
> downloading the first 10 messages, more messages show up.  The
> current retrieval scheme allows new messages in a thread to show
> up without needing another request.
> 
> AFAIK, it's actually easier and fewer SQL statements to do the
> current way.

And given that there are reasons to download a thread than just patches,
I think it makes sense. I can use b4 and get exactly what I want with
not much more or different effort on my part, and the public-inbox side
doesn't need to bloat to handle that, or make other cases fragile or slower.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-04-15 21:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-11 22:32 downloading t.mbox.gz messages are not sorted in expected order Jacob Keller
2024-04-11 22:42 ` Konstantin Ryabitsev
2024-04-12 23:59   ` Jacob Keller
2024-04-13  7:58     ` Eric Wong
2024-04-15 21:06       ` Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).