* downloading t.mbox.gz messages are not sorted in expected order
@ 2024-04-11 22:32 Jacob Keller
2024-04-11 22:42 ` Konstantin Ryabitsev
0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2024-04-11 22:32 UTC (permalink / raw)
To: meta
Hi,
I sometimes download patch series off of public inbox hosted servers to
apply with git-am. Occasionally I have found that these do not apply
cleanly because the thread is not sorted in patch order.
For an example, see
https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/
This thread shows up and the links are all in order on the HTML view,
but if I download the t.mbox and split it using git mailsplit, I get the
following:
$ git mailsplit t.mbox -osplit
$ cd split
$ for f in *; do echo $f ; grep '^Subject:' $f; done
> 0001
> Subject: [PATCH v2 0/7] vfio: Interrupt eventfd hardening
> 0002
> Subject: [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
> 0003
> Subject: [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue
> 0004
> Subject: [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ
> 0005
> Subject: [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
> 0006
> Subject: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 0007
> Subject: [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
> 0008
> Subject: [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger
> 0009
> Subject: RE: [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
> 0010
> Subject: RE: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 0011
> Subject: RE: [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler
> 0012
> Subject: Re: [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx
> 0013
> Subject: Re: [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops
> 0014
> Subject: Re: [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject
> 0015
> Subject: Re: [PATCH v2 4/7] vfio/pci: Create persistent INTx handler
> 0016
> Subject: Re: [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup
> 0017
> Subject: Re: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
> 0018
> Subject: Re: [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler
> 0019
> Subject: Re: [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx
> 0020
> Subject: Re: [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers
>
I'm not really sure why messages appear in this order in the t.mbox
file. The messages appear sorted properly when viewing the thread via
the next and previous links and in the summary of links at the HTML
display..
Any ideas? It is rather frustrating to have to realize that an mbox is
not sorted and I have to split and sort it manually...
I tried looking through the the public-inbox to see if I could figure it
out but its not clear to me.
Maybe its because of how thread_html uses walk_thread to go in a
particular order where was the mbox code seems to just iterate over
messages in the thread and go in essentially a random order?
Thanks,
Jake
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: downloading t.mbox.gz messages are not sorted in expected order 2024-04-11 22:32 downloading t.mbox.gz messages are not sorted in expected order Jacob Keller @ 2024-04-11 22:42 ` Konstantin Ryabitsev 2024-04-12 23:59 ` Jacob Keller 0 siblings, 1 reply; 5+ messages in thread From: Konstantin Ryabitsev @ 2024-04-11 22:42 UTC (permalink / raw) To: Jacob Keller; +Cc: meta On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote: > I sometimes download patch series off of public inbox hosted servers to > apply with git-am. Occasionally I have found that these do not apply > cleanly because the thread is not sorted in patch order. It's more than just the order -- if there are replies in the thread, the mbox file won't apply either. This is the reason why the b4 tool exists: https://b4.docs.kernel.org/ > For an example, see > https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/ $ b4 am -o/tmp https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/ Grabbing thread from lore.kernel.org/all/20240308230557.805580-1-alex.williamson@redhat.com/t.mbox.gz Analyzing 20 messages in the thread Looking for additional code-review trailers on lore.kernel.org Checking attestation on all messages, may take a moment... --- ✓ [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ ✓ [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) ✓ [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) ✓ [PATCH v2 4/7] vfio/pci: Create persistent INTx handler + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) ✓ [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com) + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) ✓ [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com) + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) + Tested-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) ✓ [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com) + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) --- ✓ Signed: DKIM/redhat.com --- Total patches: 7 --- Cover: /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.cover Link: https://lore.kernel.org/r/20240308230557.805580-1-alex.williamson@redhat.com Base: not specified git am /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.mbx -K ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: downloading t.mbox.gz messages are not sorted in expected order 2024-04-11 22:42 ` Konstantin Ryabitsev @ 2024-04-12 23:59 ` Jacob Keller 2024-04-13 7:58 ` Eric Wong 0 siblings, 1 reply; 5+ messages in thread From: Jacob Keller @ 2024-04-12 23:59 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: meta On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote: > On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote: >> I sometimes download patch series off of public inbox hosted servers to >> apply with git-am. Occasionally I have found that these do not apply >> cleanly because the thread is not sorted in patch order. > > It's more than just the order -- if there are replies in the thread, the mbox > file won't apply either. > If the order was correct, it is usually easy enough to just "git am --skip" the patches which have no content. However... > This is the reason why the b4 tool exists: > https://b4.docs.kernel.org/ > This is extremely useful and I was unaware of its existence. Thanks!! >> For an example, see >> https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/ > > $ b4 am -o/tmp https://lore.kernel.org/lkml/20240308230557.805580-1-alex.williamson@redhat.com/ > Grabbing thread from lore.kernel.org/all/20240308230557.805580-1-alex.williamson@redhat.com/t.mbox.gz > Analyzing 20 messages in the thread > Looking for additional code-review trailers on lore.kernel.org > Checking attestation on all messages, may take a moment... > --- > ✓ [PATCH v2 1/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ > ✓ [PATCH v2 2/7] vfio/pci: Lock external INTx masking ops > + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > ✓ [PATCH v2 3/7] vfio: Introduce interface to flush virqfd inject workqueue > + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > ✓ [PATCH v2 4/7] vfio/pci: Create persistent INTx handler > + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > ✓ [PATCH v2 5/7] vfio/platform: Disable virqfds on cleanup > + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com) > + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > ✓ [PATCH v2 6/7] vfio/platform: Create persistent IRQ handlers > + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com) > + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > + Tested-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > ✓ [PATCH v2 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger > + Reviewed-by: Kevin Tian <kevin.tian@intel.com> (✓ DKIM/intel.com) > + Reviewed-by: Eric Auger <eric.auger@redhat.com> (✓ DKIM/redhat.com) > --- > ✓ Signed: DKIM/redhat.com > --- > Total patches: 7 > --- > Cover: /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.cover > Link: https://lore.kernel.org/r/20240308230557.805580-1-alex.williamson@redhat.com > Base: not specified > git am /tmp/v2_20240308_alex_williamson_vfio_interrupt_eventfd_hardening.mbx > > -K ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: downloading t.mbox.gz messages are not sorted in expected order 2024-04-12 23:59 ` Jacob Keller @ 2024-04-13 7:58 ` Eric Wong 2024-04-15 21:06 ` Jacob Keller 0 siblings, 1 reply; 5+ messages in thread From: Eric Wong @ 2024-04-13 7:58 UTC (permalink / raw) To: Jacob Keller; +Cc: Konstantin Ryabitsev, meta Jacob Keller <jacob.e.keller@intel.com> wrote: > > > On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote: > > On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote: > >> I sometimes download patch series off of public inbox hosted servers to > >> apply with git-am. Occasionally I have found that these do not apply > >> cleanly because the thread is not sorted in patch order. > > > > It's more than just the order -- if there are replies in the thread, the mbox > > file won't apply either. > > > > If the order was correct, it is usually easy enough to just "git am > --skip" the patches which have no content. However... > > > This is the reason why the b4 tool exists: > > https://b4.docs.kernel.org/ > > > > This is extremely useful and I was unaware of its existence. Thanks!! Good to know b4 works for you. FWIW, t.mbox.gz uses NNTP article number ordering to ensure batched fetches work and duplicates can't get served. IOW, it fetches a batch of 1000 header rows at a time from a single thread to avoid using too much memory for a single request. The next batch (another 1K) only gets fetched once the current batch is done. So it must order by article number to deal with that, especially since new messages may appear in the thread while the current batch is being streamed. Identical Date: headers can appear multiple times in the same thread, so using a >= or > comparison for retrieval wouldn't work. Of course, most threads are <1000 messages, so I did think about sorting by Date for small threads (as we do for the HTML output)... However with the current t.mbox.gz code, we expect (and can handle) new messages appearing while a t.mbox.gz is being served. So if a thread has 10 messages, the first batch fetch would only return those 10. However, while a client is slowly downloading the first 10 messages, more messages show up. The current retrieval scheme allows new messages in a thread to show up without needing another request. AFAIK, it's actually easier and fewer SQL statements to do the current way. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: downloading t.mbox.gz messages are not sorted in expected order 2024-04-13 7:58 ` Eric Wong @ 2024-04-15 21:06 ` Jacob Keller 0 siblings, 0 replies; 5+ messages in thread From: Jacob Keller @ 2024-04-15 21:06 UTC (permalink / raw) To: Eric Wong; +Cc: Konstantin Ryabitsev, meta On 4/13/2024 12:58 AM, Eric Wong wrote: > Jacob Keller <jacob.e.keller@intel.com> wrote: >> >> >> On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote: >>> On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote: >>>> I sometimes download patch series off of public inbox hosted servers to >>>> apply with git-am. Occasionally I have found that these do not apply >>>> cleanly because the thread is not sorted in patch order. >>> >>> It's more than just the order -- if there are replies in the thread, the mbox >>> file won't apply either. >>> >> >> If the order was correct, it is usually easy enough to just "git am >> --skip" the patches which have no content. However... >> >>> This is the reason why the b4 tool exists: >>> https://b4.docs.kernel.org/ >>> >> >> This is extremely useful and I was unaware of its existence. Thanks!! > > Good to know b4 works for you. > > FWIW, t.mbox.gz uses NNTP article number ordering to ensure > batched fetches work and duplicates can't get served. > > IOW, it fetches a batch of 1000 header rows at a time from a > single thread to avoid using too much memory for a single > request. The next batch (another 1K) only gets fetched once the > current batch is done. So it must order by article number to > deal with that, especially since new messages may appear in the > thread while the current batch is being streamed. > > Identical Date: headers can appear multiple times in the same > thread, so using a >= or > comparison for retrieval wouldn't > work. > > Of course, most threads are <1000 messages, so I did think > about sorting by Date for small threads (as we do for the HTML > output)... > > However with the current t.mbox.gz code, we expect (and can > handle) new messages appearing while a t.mbox.gz is being > served. So if a thread has 10 messages, the first batch fetch > would only return those 10. However, while a client is slowly > downloading the first 10 messages, more messages show up. The > current retrieval scheme allows new messages in a thread to show > up without needing another request. > > AFAIK, it's actually easier and fewer SQL statements to do the > current way. And given that there are reasons to download a thread than just patches, I think it makes sense. I can use b4 and get exactly what I want with not much more or different effort on my part, and the public-inbox side doesn't need to bloat to handle that, or make other cases fragile or slower. Thanks, Jake ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-04-15 21:06 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-11 22:32 downloading t.mbox.gz messages are not sorted in expected order Jacob Keller 2024-04-11 22:42 ` Konstantin Ryabitsev 2024-04-12 23:59 ` Jacob Keller 2024-04-13 7:58 ` Eric Wong 2024-04-15 21:06 ` Jacob Keller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).