* [PATCH] doc/extindex: document --dedupe switch
@ 2023-11-24 4:18 Eric Wong
2023-11-24 12:50 ` Štěpán Němec
0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2023-11-24 4:18 UTC (permalink / raw)
To: meta
We've had it since v1.7.0 when -extindex was introduced,
but it was never documented outside of commit messages.
---
Documentation/public-inbox-extindex.pod | 26 +++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index be4ea4de..361eb43f 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -47,6 +47,20 @@ C<indexlevel> set to C<basic> and their respective Xapian
public-inboxes where cross-posting is common, this allows
significant space savings on Xapian indices.
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages of a Message-IDs or all messages
+if no Message-ID is specified. Deduplication rules may change
+and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings from WWW logs.
+
=item --gc
Perform garbage collection instead of indexing. Use this if
@@ -61,10 +75,6 @@ used for in-place upgrades and bugfixes while read-only server
processes are utilizing the index. Keep in mind this roughly
doubles the size of the already-large Xapian database.
-The extindex locks will be released roughly every 10s to
-allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
-processes to write to the extindex.
-
=item --fast
Used with C<--reindex>, it will only look for new and stale
@@ -131,6 +141,14 @@ Default: none, uses C<publicinbox.indexBatchSize>
Occasionally, public-inbox will update its schema version and
require a full index by running this command.
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
=head1 CONTACT
Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
* Re: [PATCH] doc/extindex: document --dedupe switch
2023-11-24 4:18 [PATCH] doc/extindex: document --dedupe switch Eric Wong
@ 2023-11-24 12:50 ` Štěpán Němec
2023-11-24 23:58 ` Eric Wong
0 siblings, 1 reply; 7+ messages in thread
From: Štěpán Němec @ 2023-11-24 12:50 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Fri, 24 Nov 2023 04:18:19 +0000
Eric Wong wrote:
> --- a/Documentation/public-inbox-extindex.pod
> +++ b/Documentation/public-inbox-extindex.pod
> @@ -47,6 +47,20 @@ C<indexlevel> set to C<basic> and their respective Xapian
> public-inboxes where cross-posting is common, this allows
> significant space savings on Xapian indices.
>
> +=item --dedupe=MSGID
> +
> +=item --dedupe
> +
> +Rerun deduplication on messages of a Message-IDs or all messages
                                   ^^^^^^^^^^^^^^^^
"with the given Message-ID"? (or just drop the trailing "s")
> +if no Message-ID is specified. Deduplication rules may change
> +and evolve over time, especially if filters are involved.
> +
> +C<--dedupe=MSGID> may be specified multiple times to deduplicate
> +multiple Message-IDs.
[...]
--
Štěpán
* Re: [PATCH] doc/extindex: document --dedupe switch
2023-11-24 12:50 ` Štěpán Němec
@ 2023-11-24 23:58 ` Eric Wong
2023-11-25 8:36 ` Štěpán Němec
0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2023-11-24 23:58 UTC (permalink / raw)
To: Štěpán Němec; +Cc: meta
Štěpán Němec <stepnem@smrk.net> wrote:
> Eric Wong wrote:
> > +++ b/Documentation/public-inbox-extindex.pod
> > @@ -47,6 +47,20 @@ C<indexlevel> set to C<basic> and their respective Xapian
> > public-inboxes where cross-posting is common, this allows
> > significant space savings on Xapian indices.
> >
> > +=item --dedupe=MSGID
> > +
> > +=item --dedupe
> > +
> > +Rerun deduplication on messages of a Message-IDs or all messages
>                                    ^^^^^^^^^^^^^^^^
> "with the given Message-ID"? (or just drop the trailing "s")
Yes, the former, thanks.
I'm also wondering if it's necessary to have a blurb about NOT
supporting comma-delimited Message-IDs on the CLI, since some
strange Message-IDs may have a comma in them.
Anyways, I'll squash something like this in:
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index 361eb43f..3a2911e2 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -51,9 +51,9 @@ significant space savings on Xapian indices.
=item --dedupe
-Rerun deduplication on messages of a Message-IDs or all messages
-if no Message-ID is specified. Deduplication rules may change
-and evolve over time, especially if filters are involved.
+Rerun deduplication on messages of with the given Message-ID or
+all messages if no Message-ID is specified. Deduplication rules may
+change and evolve over time, especially if filters are involved.
C<--dedupe=MSGID> may be specified multiple times to deduplicate
multiple Message-IDs.
* Re: [PATCH] doc/extindex: document --dedupe switch
2023-11-24 23:58 ` Eric Wong
@ 2023-11-25 8:36 ` Štěpán Němec
2023-11-25 11:49 ` Eric Wong
0 siblings, 1 reply; 7+ messages in thread
From: Štěpán Němec @ 2023-11-25 8:36 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Fri, 24 Nov 2023 23:58:29 +0000
Eric Wong wrote:
>> > +Rerun deduplication on messages of a Message-IDs or all messages
>>                                    ^^^^^^^^^^^^^^^^
>> "with the given Message-ID"? (or just drop the trailing "s")
>
> Yes, the former, thanks.
>
> I'm also wondering if it's necessary to have a blurb about NOT
> supporting comma-delimited Message-IDs on the CLI, since some
> strange Message-IDs may have a comma in them.
I think the description is already quite clear on that,
esp. given the subsequent "may be specified multiple times
...". But the "on the CLI" qualification intrigues me: does
that mean that comma-delimited MIDs _are_ supported
somewhere else?
> Anyways, I'll squash something like this in:
[...]
> +Rerun deduplication on messages of with the given Message-ID or
                                   ^^^^^^^
not so fast :-P
> +all messages if no Message-ID is specified. Deduplication rules may
> +change and evolve over time, especially if filters are involved.
>
> C<--dedupe=MSGID> may be specified multiple times to deduplicate
> multiple Message-IDs.
Thanks,
Štěpán
* Re: [PATCH] doc/extindex: document --dedupe switch
2023-11-25 8:36 ` Štěpán Němec
@ 2023-11-25 11:49 ` Eric Wong
2023-11-25 20:25 ` [PATCH v3] " Eric Wong
2023-11-25 21:35 ` [PATCH] " Štěpán Němec
0 siblings, 2 replies; 7+ messages in thread
From: Eric Wong @ 2023-11-25 11:49 UTC (permalink / raw)
To: Štěpán Němec; +Cc: meta
Štěpán Němec <stepnem@smrk.net> wrote:
> Eric Wong wrote:
> >
> > I'm also wondering if it's necessary to have a blurb about NOT
> > supporting comma-delimited Message-IDs on the CLI, since some
> > strange Message-IDs may have a comma in them.
>
> I think the description is already quite clear on that,
> esp. given the subsequent "may be specified multiple times
> ...". But the "on the CLI" qualification intrigues me: does
> that mean that comma-delimited MIDs _are_ supported
> somewhere else?
Not MIDs, but lei has --lock= for various mbox locking methods.
cindex will also support combinations of
--join=aggressive,reset,dt:...,window:$INTEGER
I like to support commas when there's unambiguous keywords/commands
that can be easily parsed. `dt:$approxidate' in --join could have
date-times with commas in them, but I don't think anybody would
use those characters in a command.
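
To illustrate the distinction above (a repeatable switch for free-form
values versus a comma-delimited list of known keywords), here is a minimal
Python sketch. It is not public-inbox code (public-inbox is written in
Perl); the `parse_dedupe` helper, the `extindex-sketch` program name and
the example Message-IDs are all hypothetical.

```python
import argparse


def parse_dedupe(argv):
    """Return one list entry per --dedupe occurrence (hypothetical helper).

    A bare --dedupe (no Message-ID) is recorded as None.
    """
    parser = argparse.ArgumentParser(prog="extindex-sketch")
    # nargs="?" allows the switch with or without a value;
    # action="append" keeps each occurrence as a separate entry,
    # so the value is never split on any delimiter.
    parser.add_argument("--dedupe", nargs="?", action="append", default=[])
    return parser.parse_args(argv).dedupe


# Repeating the switch keeps a Message-ID containing a comma intact.
mids = parse_dedupe(["--dedupe=a,b@example.com", "--dedupe=c@example.com"])

# Splitting on commas would cut that same Message-ID in two.
broken = "a,b@example.com".split(",")
```

In this sketch a bare `--dedupe` shows up as `None` in the returned list,
which a caller could treat as "all messages".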
> > +Rerun deduplication on messages of with the given Message-ID or
>                                    ^^^^^^^
> not so fast :-P
Thanks. Will s/of // when I commit when more awake.
Getting even more scatter-brained :x
* [PATCH v3] doc/extindex: document --dedupe switch
2023-11-25 11:49 ` Eric Wong
@ 2023-11-25 20:25 ` Eric Wong
2023-11-25 21:35 ` [PATCH] " Štěpán Němec
1 sibling, 0 replies; 7+ messages in thread
From: Eric Wong @ 2023-11-25 20:25 UTC (permalink / raw)
To: Štěpán Němec; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> Štěpán Němec <stepnem@smrk.net> wrote:
> > Eric Wong wrote:
> > > +Rerun deduplication on messages of with the given Message-ID or
> >                                    ^^^^^^^
> > not so fast :-P
>
> Thanks. Will s/of // when I commit when more awake.
> Getting even more scatter-brained :x
OK, will probably push this out:
--------8<--------
Subject: [PATCH] doc/extindex: document --dedupe switch
We've had it since v1.7.0 when -extindex was introduced,
but it was never documented outside of commit messages.
---
Documentation/public-inbox-extindex.pod | 26 +++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index be4ea4de..b53e45ed 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -47,6 +47,20 @@ C<indexlevel> set to C<basic> and their respective Xapian
public-inboxes where cross-posting is common, this allows
significant space savings on Xapian indices.
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID or
+all messages if no Message-ID is specified. Deduplication rules may
+change and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings from WWW logs.
+
=item --gc
Perform garbage collection instead of indexing. Use this if
@@ -61,10 +75,6 @@ used for in-place upgrades and bugfixes while read-only server
processes are utilizing the index. Keep in mind this roughly
doubles the size of the already-large Xapian database.
-The extindex locks will be released roughly every 10s to
-allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
-processes to write to the extindex.
-
=item --fast
Used with C<--reindex>, it will only look for new and stale
@@ -131,6 +141,14 @@ Default: none, uses C<publicinbox.indexBatchSize>
Occasionally, public-inbox will update its schema version and
require a full index by running this command.
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
=head1 CONTACT
Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
* Re: [PATCH] doc/extindex: document --dedupe switch
2023-11-25 11:49 ` Eric Wong
2023-11-25 20:25 ` [PATCH v3] " Eric Wong
@ 2023-11-25 21:35 ` Štěpán Němec
1 sibling, 0 replies; 7+ messages in thread
From: Štěpán Němec @ 2023-11-25 21:35 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
On Sat, 25 Nov 2023 11:49:39 +0000
Eric Wong wrote:
>> ...". But the "on the CLI" qualification intrigues me: does
>> that mean that comma-delimited MIDs _are_ supported
>> somewhere else?
>
> Not MIDs, but lei has --lock= for various mbox locking methods.
> cindex will also support combinations of
> --join=aggressive,reset,dt:...,window:$INTEGER
I see, thanks.
On Sat, 25 Nov 2023 20:25:20 +0000
Eric Wong wrote:
> Eric Wong <e@80x24.org> wrote:
>> Štěpán Němec <stepnem@smrk.net> wrote:
>> > Eric Wong wrote:
>> > > +Rerun deduplication on messages of with the given Message-ID or
>> >                                    ^^^^^^^
>> > not so fast :-P
>>
>> Thanks. Will s/of // when I commit when more awake.
>> Getting even more scatter-brained :x
>
> OK, will probably push this out:
>
> --------8<--------
> Subject: [PATCH] doc/extindex: document --dedupe switch
>
> We've had it since v1.7.0 when -extindex was introduced,
> but it was never documented outside of commit messages.
> ---
> Documentation/public-inbox-extindex.pod | 26 +++++++++++++++++++++----
> 1 file changed, 22 insertions(+), 4 deletions(-)
LGTM
--
Štěpán