Hello,

I set up notmuch for my dad, specifically for mass-extracting the PDF newspapers he receives. When a PDF has been stored successfully, the message gets the tag attachments-extracted.

Today I wanted to go further and use afew, specifically for moving mail to free up space on the IMAP server. I had initially ignored the List-Id feature but now wanted to make use of it. I configured notmuch appropriately, configured afew, and ran notmuch reindex.

Now all custom tags have been wiped, in particular attachments-extracted. I can no longer see whether the PDF of a given message was already saved. This is 3 years of 5 newspapers a week; I would have to delete the files and re-extract them.

I did not expect that, especially after reading the man page, which doesn't warn about resetting tags.

So here is my question: is this a bug or a feature?

Regards
Franz
Franz Fellner <alpine.art.de@gmail.com> writes:
> Ran notmuch reindex.
> And now all custom tags were wiped, especially attachments-extracted.
> I now can't see if the pdf of a certain message was already saved.
> This is 3 years of 5 newspapers a week. I must delete the files
> and re-extract them.
>
> I did not expect that, especially after reading the man page
> which doesn't warn about resetting tags.
>
> So here my question:
> Is this a bug or a feature?
It sounds like a bug. But it's a bug that the test suite specifically
tests for ("reindex preserves tags" in T700-reindex.sh) so I'm not sure
what is going on. To eliminate the obvious, does the test suite pass for
you?
d
On Mon May 4 07:30:38 2020, David Bremner <david@tethera.net> wrote:
> Franz Fellner <alpine.art.de@gmail.com> writes:
>
> > Ran notmuch reindex.
> > And now all custom tags were wiped, especially attachments-extracted.
> > I now can't see if the pdf of a certain message was already saved.
> > This is 3 years of 5 newspapers a week. I must delete the files
> > and re-extract them.
> >
> > I did not expect that, especially after reading the man page
> > which doesn't warn about resetting tags.
> >
> > So here my question:
> > Is this a bug or a feature?
>
> It sounds like a bug. But it's a bug that the test suite specifically
> tests for ("reindex preserves tags" in T700-reindex.sh) so I'm not sure
> what is going on. To eliminate the obvious, does the test suite pass for
> you?
>
> d

T700 passes. I wanted to check whether I was mistaken:

[15:19] $ notmuch search tag:adz date:1M
thread:00000000000099a2 Wed. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-30 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:000000000000995b Tue. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-29 als PDF (adz attachment attachments-extracted inbox news newspaper unread)
thread:000000000000993c April 27 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-28 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000098e2 April 24 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-25 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000098bd April 23 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-24 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009882 April 22 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-23 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009858 April 21 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-22 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000097ac April 15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-16 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009790 April 14 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-15 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009762 April 13 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-14 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009730 April 10 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-11 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009710 April 09 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-10 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000096e9 April 08 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-09 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000096db April 07 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-08 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000096b9 April 06 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-07 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:000000000000966e April 03 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-04 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009652 April 02 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-03 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009636 April 01 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-02 als PDF (adz attachment attachments-extracted inbox news newspaper)
[15:19] $ notmuch reindex tag:adz date:1M
[15:19] $ notmuch search tag:adz date:1M
thread:00000000000099a2 Wed. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-30 als PDF (adz attachment inbox news newspaper)
thread:000000000000995b Tue. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-29 als PDF (adz attachment inbox news newspaper unread)
thread:000000000000993c April 27 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-28 als PDF (adz attachment inbox news newspaper)
thread:00000000000098e2 April 24 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-25 als PDF (adz attachment inbox news newspaper)
thread:00000000000098bd April 23 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-24 als PDF (adz attachment inbox news newspaper)
thread:0000000000009882 April 22 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-23 als PDF (adz attachment inbox news newspaper)
thread:0000000000009858 April 21 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-22 als PDF (adz attachment inbox news newspaper)
thread:00000000000097ac April 15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-16 als PDF (adz attachment inbox news newspaper)
thread:0000000000009790 April 14 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-15 als PDF (adz attachment inbox news newspaper)
thread:0000000000009762 April 13 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-14 als PDF (adz attachment inbox news newspaper)
thread:0000000000009730 April 10 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-11 als PDF (adz attachment inbox news newspaper)
thread:0000000000009710 April 09 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-10 als PDF (adz attachment inbox news newspaper)
thread:00000000000096e9 April 08 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-09 als PDF (adz attachment inbox news newspaper)
thread:00000000000096db April 07 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-08 als PDF (adz attachment inbox news newspaper)
thread:00000000000096b9 April 06 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-07 als PDF (adz attachment inbox news newspaper)
thread:000000000000966e April 03 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-04 als PDF (adz attachment inbox news newspaper)
thread:0000000000009652 April 02 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-03 als PDF (adz attachment inbox news newspaper)
thread:0000000000009636 April 01 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-02 als PDF (adz attachment inbox news newspaper)

So apparently I hit a case T700 doesn't cover. I played with it a little:

===================
diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
index 9e795896..4e76ad3e 100755
--- a/test/T700-reindex.sh
+++ b/test/T700-reindex.sh
@@ -5,6 +5,7 @@ test_description='reindexing messages'
 add_email_corpus

 notmuch tag +usertag1 '*'
+notmuch tag +attachments-extracted '*'

 notmuch search '*' | notmuch_search_sanitize > initial-threads
 notmuch search --output=messages '*' > initial-message-ids
===================

And it fails. The tag "attachments-extracted" got removed.
Got curious, it seems as soon as the additional tag starts with "attachment"
"notmuch reindex" removes it.
With "extracted-attachments" everything is fine.
Thought it might clash with preserved tags.
But "unreadable" and "inboxable" works just fine.
So it has to do with special handling of "attachment".

Regards
Franz
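
The pattern Franz describes (tags beginning with "attachment" vanish, while "extracted-attachments", "unreadable" and "inboxable" survive) is what a prefix comparison against a short list of special tag names would produce. A minimal C sketch of that behaviour follows; it assumes the special names are "attachment", "encrypted" and "signed", as the patch later in this thread suggests, and the helper name matches_reserved_prefix is hypothetical, not notmuch API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tag names assumed to be treated specially on reindex
 * (taken from the patch discussed in this thread). */
static const char *const reserved[] = { "attachment", "encrypted", "signed" };

/* Hypothetical helper: true if `tag` merely *starts with* a reserved
 * name, which is what a strncmp limited to the length of the reserved
 * name tests. */
bool
matches_reserved_prefix (const char *tag)
{
    for (size_t i = 0; i < sizeof (reserved) / sizeof (reserved[0]); i++) {
        if (strncmp (tag, reserved[i], strlen (reserved[i])) == 0)
            return true;
    }
    return false;
}
```

Under this check "attachments-extracted" matches, whereas "unreadable" and "inboxable" do not, simply because "unread" and "inbox" are not in the assumed list.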
Franz Fellner <alpine.art.de@gmail.com> writes:
>
> And it fails. The tag "attachments-extracted" got removed.
> Got curious, it seems as soon as the additional tag starts with "attachment"
> "notmuch reindex" removes it.
> With "extracted-attachments" everything is fine.
> Thought it might clash with preserved tags.
> But "unreadable" and "inboxable" works just fine.
> So it has to do with special handling of "attachment".
Thanks for this. I suspected there might be something funny there (since
the tag 'attachment' is handled specially), but I didn't know how. Now I
know where to look.
David
In id:1588595993-ner-8.651@TPL520 Franz Fellner reported that tags
starting with 'attachment' are removed by 'notmuch reindex'. This is
probably related to the use of STRNCMP_LITERAL in
_notmuch_message_remove_indexed_terms.
---
 test/T700-reindex.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
index 9e795896..7b7e52de 100755
--- a/test/T700-reindex.sh
+++ b/test/T700-reindex.sh
@@ -33,6 +33,15 @@ notmuch reindex '*'
 notmuch dump > OUTPUT
 test_expect_equal_file initial-dump OUTPUT

+test_begin_subtest 'reindex preserves tags with special prefixes'
+test_subtest_known_broken
+notmuch tag +attachment2 +encrypted2 +signed2 '*'
+notmuch dump > EXPECTED
+notmuch reindex '*'
+notmuch dump > OUTPUT
+notmuch tag -attachment2 -encrypted2 -signed2 '*'
+test_expect_equal_file EXPECTED OUTPUT
+
 test_begin_subtest 'reindex moves a message between threads'
 notmuch search --output=threads id:87iqd9rn3l.fsf@vertex.dottedmag > EXPECTED
 # re-parent
--
2.26.2
strncmp looks for a prefix that matches, which is very much not what
we want here. This fixes the bug reported by Franz Fellner in
id:1588595993-ner-8.651@TPL520
---
 lib/message.cc       | 6 +++---
 test/T700-reindex.sh | 1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 5c9b58b2..0fa0eb3a 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -751,9 +751,9 @@ _notmuch_message_remove_indexed_terms (notmuch_message_t *message)

     const char *tag = notmuch_tags_get (tags);

-    if (STRNCMP_LITERAL (tag, "encrypted") != 0 &&
-        STRNCMP_LITERAL (tag, "signed") != 0 &&
-        STRNCMP_LITERAL (tag, "attachment") != 0) {
+    if (strcmp (tag, "encrypted") != 0 &&
+        strcmp (tag, "signed") != 0 &&
+        strcmp (tag, "attachment") != 0) {
         std::string term = tag_prefix + tag;
         message->doc.add_term (term);
     }
diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
index 7b7e52de..3d7c930d 100755
--- a/test/T700-reindex.sh
+++ b/test/T700-reindex.sh
@@ -34,7 +34,6 @@ notmuch dump > OUTPUT
 test_expect_equal_file initial-dump OUTPUT

 test_begin_subtest 'reindex preserves tags with special prefixes'
-test_subtest_known_broken
 notmuch tag +attachment2 +encrypted2 +signed2 '*'
 notmuch dump > EXPECTED
 notmuch reindex '*'
--
2.26.2
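
To see the effect of the change in isolation, here is a small C sketch of the corrected filter. It is only an illustration under simplifying assumptions: tags are passed as a plain array of strings rather than through the notmuch tag iterator, and is_index_derived and keep_user_tags are hypothetical names that do not appear in lib/message.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Exact comparison, as in the patch: only these three names are
 * treated as special; anything longer is an ordinary user tag. */
bool
is_index_derived (const char *tag)
{
    return strcmp (tag, "encrypted") == 0 ||
           strcmp (tag, "signed") == 0 ||
           strcmp (tag, "attachment") == 0;
}

/* Copy every tag that should survive into `out` (which must have room
 * for n pointers) and return how many were kept. */
size_t
keep_user_tags (const char *const *tags, size_t n, const char **out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (! is_index_derived (tags[i]))
            out[kept++] = tags[i];
    }
    return kept;
}
```

Given the tags from Franz's search output, this keeps "attachments-extracted" and filters out only the exact tag "attachment".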
On Mon, May 04 2020, David Bremner wrote:

> In id:1588595993-ner-8.651@TPL520 Franz Fellner reported that tags
> starting with 'attachment' are removed by 'notmuch reindex'. This is
> probably related to the use of STRNCMP_LITERAL in

Haa, I looked at this briefly but failed to see that it is STRNCMP_LITERAL,
not STRCMP_LITERAL (the latter could be an optimized strcmp using memcmp
with a constant length).

Series LGTM (I'm trying to look away from that 'we' passive ;)

Tomi

> _notmuch_message_remove_indexed_terms.
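
Tomi's aside about a STRCMP_LITERAL built on memcmp can be sketched as follows. Neither definition below is copied from the notmuch sources: STRNCMP_LITERAL is written the way this thread implies it behaves (a strncmp over the literal's length), and STRCMP_LITERAL is a purely hypothetical exact-match counterpart.

```c
#include <string.h>

/* Assumed shape of the prefix macro: compares only the first
 * sizeof (literal) - 1 bytes, so longer strings still "match". */
#define STRNCMP_LITERAL(var, literal) \
    strncmp ((var), (literal), sizeof (literal) - 1)

/* Hypothetical exact-match variant. The literal's length is a
 * compile-time constant, so equality is a length check plus one memcmp.
 * Unlike strcmp it yields only 0 (equal) or 1 (not equal), and it
 * evaluates `var` twice, so `var` should have no side effects. */
#define STRCMP_LITERAL(var, literal) \
    (! (strlen (var) == sizeof (literal) - 1 && \
        memcmp ((var), (literal), sizeof (literal) - 1) == 0))
```

With these definitions, STRNCMP_LITERAL (tag, "attachment") is 0 for "attachments-extracted", while STRCMP_LITERAL (tag, "attachment") is 0 only for the tag "attachment" itself. The length check comes first so that memcmp never reads past the end of a shorter string.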
Thank you very much.
I confirm that the patch fixes the issue.
Regards
Franz
On Mon May 4 11:00:24 2020, David Bremner <david@tethera.net> wrote:
> strncmp looks for a prefix that matches, which is very much not what
> we want here. This fixes the bug reported by Franz Fellner in
> id:1588595993-ner-8.651@TPL520
> ---
> lib/message.cc | 6 +++---
> test/T700-reindex.sh | 1 -
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/lib/message.cc b/lib/message.cc
> index 5c9b58b2..0fa0eb3a 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -751,9 +751,9 @@ _notmuch_message_remove_indexed_terms (notmuch_message_t *message)
>
> const char *tag = notmuch_tags_get (tags);
>
> - if (STRNCMP_LITERAL (tag, "encrypted") != 0 &&
> - STRNCMP_LITERAL (tag, "signed") != 0 &&
> - STRNCMP_LITERAL (tag, "attachment") != 0) {
> + if (strcmp (tag, "encrypted") != 0 &&
> + strcmp (tag, "signed") != 0 &&
> + strcmp (tag, "attachment") != 0) {
> std::string term = tag_prefix + tag;
> message->doc.add_term (term);
> }
> diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
> index 7b7e52de..3d7c930d 100755
> --- a/test/T700-reindex.sh
> +++ b/test/T700-reindex.sh
> @@ -34,7 +34,6 @@ notmuch dump > OUTPUT
> test_expect_equal_file initial-dump OUTPUT
>
> test_begin_subtest 'reindex preserves tags with special prefixes'
> -test_subtest_known_broken
> notmuch tag +attachment2 +encrypted2 +signed2 '*'
> notmuch dump > EXPECTED
> notmuch reindex '*'
> --
> 2.26.2
Tomi Ollila <tomi.ollila@iki.fi> writes:
> On Mon, May 04 2020, David Bremner wrote:
>
>> In id:1588595993-ner-8.651@TPL520 Franz Fellner reported that tags
>> starting with 'attachment' are removed by 'notmuch reindex'. This is
>> probably related to the use of STRNCMP_LITERAL in
>
> Haa, I looked this briefly but failed to see it is STRNCMP_LITERAL,
> not STRCMP_LITERAL (the latter could be optimized strcmp using memcmp
> w/ constant len)
>
> Series LGTM (I'm trying to look away that 'we' passive ;)
We have pushed.
d