unofficial mirror of notmuch@notmuchmail.org
* notmuch reindex wipes existing tags
@ 2020-05-04  7:25 Franz Fellner
  2020-05-04 10:30 ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: Franz Fellner @ 2020-05-04  7:25 UTC (permalink / raw)
  To: notmuch

Hello,

I set up notmuch for my dad, specifically for mass-extracting
the PDF newspapers he receives.
Once a PDF has been stored successfully, the message gets the
tag attachments-extracted added.
Today I wanted to go further and use afew, specifically
for moving mail to free up space on the IMAP server.
I initially ignored the List-Id feature but now wanted to make
use of it. I configured notmuch appropriately and configured afew.
Ran notmuch reindex.
And now all custom tags were wiped, especially attachments-extracted.
I now can't see if the pdf of a certain message was already saved.
This is 3 years of 5 newspapers a week. I must delete the files
and re-extract them.

I did not expect that, especially after reading the man page
which doesn't warn about resetting tags.

So here is my question:
Is this a bug or a feature?

Regards
Franz

* Re: notmuch reindex wipes existing tags
  2020-05-04  7:25 notmuch reindex wipes existing tags Franz Fellner
@ 2020-05-04 10:30 ` David Bremner
  2020-05-04 12:39   ` Franz Fellner
  0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2020-05-04 10:30 UTC (permalink / raw)
  To: Franz Fellner, notmuch

Franz Fellner <alpine.art.de@gmail.com> writes:

> Ran notmuch reindex.
> And now all custom tags were wiped, especially attachments-extracted.
> I now can't see if the pdf of a certain message was already saved.
> This is 3 years of 5 newspapers a week. I must delete the files
> and re-extract them.
>
> I did not expect that, especially after reading the man page
> which doesn't warn about resetting tags.
>
> So here is my question:
> Is this a bug or a feature?

It sounds like a bug. But it's a bug that the test suite specifically
tests for ("reindex preserves tags" in T700-reindex.sh) so I'm not sure
what is going on. To eliminate the obvious, does the test suite pass for
you?

d

* Re: notmuch reindex wipes existing tags
  2020-05-04 10:30 ` David Bremner
@ 2020-05-04 12:39   ` Franz Fellner
  2020-05-04 13:39     ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: Franz Fellner @ 2020-05-04 12:39 UTC (permalink / raw)
  To: David Bremner, notmuch

On Mon May  4 07:30:38 2020, David Bremner <david@tethera.net> wrote:
> Franz Fellner <alpine.art.de@gmail.com> writes:
> 
> > Ran notmuch reindex.
> > And now all custom tags were wiped, especially attachments-extracted.
> > I now can't see if the pdf of a certain message was already saved.
> > This is 3 years of 5 newspapers a week. I must delete the files
> > and re-extract them.
> >
> > I did not expect that, especially after reading the man page
> > which doesn't warn about resetting tags.
> >
> > So here is my question:
> > Is this a bug or a feature?
> 
> It sounds like a bug. But it's a bug that the test suite specifically
> tests for ("reindex preserves tags" in T700-reindex.sh) so I'm not sure
> what is going on. To eliminate the obvious, does the test suite pass for
> you?
> 
> d
> 
> 
-- 

T700 passes.

Wanted to know if I was wrong:

[15:19] $ notmuch search tag:adz date:1M
thread:00000000000099a2   Wed. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-30 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:000000000000995b   Tue. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-29 als PDF (adz attachment attachments-extracted inbox news newspaper unread)
thread:000000000000993c     April 27 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-28 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000098e2     April 24 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-25 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000098bd     April 23 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-24 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009882     April 22 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-23 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009858     April 21 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-22 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000097ac     April 15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-16 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009790     April 14 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-15 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009762     April 13 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-14 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009730     April 10 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-11 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009710     April 09 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-10 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000096e9     April 08 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-09 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000096db     April 07 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-08 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:00000000000096b9     April 06 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-07 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:000000000000966e     April 03 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-04 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009652     April 02 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-03 als PDF (adz attachment attachments-extracted inbox news newspaper)
thread:0000000000009636     April 01 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-02 als PDF (adz attachment attachments-extracted inbox news newspaper)
[15:19] $ notmuch reindex tag:adz date:1M
[15:19] $ notmuch search tag:adz date:1M
thread:00000000000099a2   Wed. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-30 als PDF (adz attachment inbox news newspaper)
thread:000000000000995b   Tue. 15:15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-29 als PDF (adz attachment inbox news newspaper unread)
thread:000000000000993c     April 27 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-28 als PDF (adz attachment inbox news newspaper)
thread:00000000000098e2     April 24 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-25 als PDF (adz attachment inbox news newspaper)
thread:00000000000098bd     April 23 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-24 als PDF (adz attachment inbox news newspaper)
thread:0000000000009882     April 22 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-23 als PDF (adz attachment inbox news newspaper)
thread:0000000000009858     April 21 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-22 als PDF (adz attachment inbox news newspaper)
thread:00000000000097ac     April 15 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-16 als PDF (adz attachment inbox news newspaper)
thread:0000000000009790     April 14 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-15 als PDF (adz attachment inbox news newspaper)
thread:0000000000009762     April 13 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-14 als PDF (adz attachment inbox news newspaper)
thread:0000000000009730     April 10 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-11 als PDF (adz attachment inbox news newspaper)
thread:0000000000009710     April 09 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-10 als PDF (adz attachment inbox news newspaper)
thread:00000000000096e9     April 08 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-09 als PDF (adz attachment inbox news newspaper)
thread:00000000000096db     April 07 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-08 als PDF (adz attachment inbox news newspaper)
thread:00000000000096b9     April 06 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-07 als PDF (adz attachment inbox news newspaper)
thread:000000000000966e     April 03 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-04 als PDF (adz attachment inbox news newspaper)
thread:0000000000009652     April 02 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-03 als PDF (adz attachment inbox news newspaper)
thread:0000000000009636     April 01 [1/1] ADZ PDF Versand; Ihre ADZ vom 2020-04-02 als PDF (adz attachment inbox news newspaper)

So apparently I hit a case T700 doesn't cover.

Played a little bit:

===================
diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
index 9e795896..4e76ad3e 100755
--- a/test/T700-reindex.sh
+++ b/test/T700-reindex.sh
@@ -5,6 +5,7 @@ test_description='reindexing messages'
 add_email_corpus
 
 notmuch tag +usertag1 '*'
+notmuch tag +attachments-extracted '*'
 
 notmuch search '*' | notmuch_search_sanitize > initial-threads
 notmuch search --output=messages '*' > initial-message-ids
===================

And it fails. The tag "attachments-extracted" got removed.
Got curious: it seems that as soon as the additional tag starts with "attachment",
"notmuch reindex" removes it.
With "extracted-attachments" everything is fine.
Thought it might clash with preserved tags.
But "unreadable" and "inboxable" work just fine.
So it has to do with special handling of "attachment".

Regards
Franz

* Re: notmuch reindex wipes existing tags
  2020-05-04 12:39   ` Franz Fellner
@ 2020-05-04 13:39     ` David Bremner
  2020-05-04 14:00       ` [PATCH 1/2] test: known broken test for reindex tag preservation David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2020-05-04 13:39 UTC (permalink / raw)
  To: Franz Fellner, notmuch

Franz Fellner <alpine.art.de@gmail.com> writes:

>
> And it fails. The tag "attachments-extracted" got removed.
> Got curious: it seems that as soon as the additional tag starts with "attachment",
> "notmuch reindex" removes it.
> With "extracted-attachments" everything is fine.
> Thought it might clash with preserved tags.
> But "unreadable" and "inboxable" work just fine.
> So it has to do with special handling of "attachment".

Thanks for this. I suspected there might be something funny there (since
the tag 'attachment' is handled specially), but I didn't know how. Now I
know where to look.

David

* [PATCH 1/2] test: known broken test for reindex tag preservation
  2020-05-04 13:39     ` David Bremner
@ 2020-05-04 14:00       ` David Bremner
  2020-05-04 14:00         ` [PATCH 2/2] lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms David Bremner
  2020-05-04 14:16         ` [PATCH 1/2] test: known broken test for reindex tag preservation Tomi Ollila
  0 siblings, 2 replies; 9+ messages in thread
From: David Bremner @ 2020-05-04 14:00 UTC (permalink / raw)
  To: David Bremner, Franz Fellner, notmuch

In id:1588595993-ner-8.651@TPL520 Franz Fellner reported that tags
starting with 'attachment' are removed by 'notmuch reindex'. This is
probably related to the use of STRNCMP_LITERAL in
_notmuch_message_remove_indexed_terms.
---
 test/T700-reindex.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
index 9e795896..7b7e52de 100755
--- a/test/T700-reindex.sh
+++ b/test/T700-reindex.sh
@@ -33,6 +33,15 @@ notmuch reindex '*'
 notmuch dump > OUTPUT
 test_expect_equal_file initial-dump OUTPUT
 
+test_begin_subtest 'reindex preserves tags with special prefixes'
+test_subtest_known_broken
+notmuch tag +attachment2 +encrypted2 +signed2  '*'
+notmuch dump > EXPECTED
+notmuch reindex '*'
+notmuch dump > OUTPUT
+notmuch tag -attachment2 -encrypted2 -signed2  '*'
+test_expect_equal_file EXPECTED OUTPUT
+
 test_begin_subtest 'reindex moves a message between threads'
 notmuch search --output=threads id:87iqd9rn3l.fsf@vertex.dottedmag > EXPECTED
 # re-parent
-- 
2.26.2

* [PATCH 2/2] lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms
  2020-05-04 14:00       ` [PATCH 1/2] test: known broken test for reindex tag preservation David Bremner
@ 2020-05-04 14:00         ` David Bremner
  2020-05-04 15:16           ` Franz Fellner
  2020-05-04 14:16         ` [PATCH 1/2] test: known broken test for reindex tag preservation Tomi Ollila
  1 sibling, 1 reply; 9+ messages in thread
From: David Bremner @ 2020-05-04 14:00 UTC (permalink / raw)
  To: David Bremner, Franz Fellner, notmuch

strncmp looks for a prefix that matches, which is very much not what
we want here. This fixes the bug reported by Franz Fellner in
id:1588595993-ner-8.651@TPL520
---
 lib/message.cc       | 6 +++---
 test/T700-reindex.sh | 1 -
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 5c9b58b2..0fa0eb3a 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -751,9 +751,9 @@ _notmuch_message_remove_indexed_terms (notmuch_message_t *message)
 
 	const char *tag = notmuch_tags_get (tags);
 
-	if (STRNCMP_LITERAL (tag, "encrypted") != 0 &&
-	    STRNCMP_LITERAL (tag, "signed") != 0 &&
-	    STRNCMP_LITERAL (tag, "attachment") != 0) {
+	if (strcmp (tag, "encrypted") != 0 &&
+	    strcmp (tag, "signed") != 0 &&
+	    strcmp (tag, "attachment") != 0) {
 	    std::string term = tag_prefix + tag;
 	    message->doc.add_term (term);
 	}
diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
index 7b7e52de..3d7c930d 100755
--- a/test/T700-reindex.sh
+++ b/test/T700-reindex.sh
@@ -34,7 +34,6 @@ notmuch dump > OUTPUT
 test_expect_equal_file initial-dump OUTPUT
 
 test_begin_subtest 'reindex preserves tags with special prefixes'
-test_subtest_known_broken
 notmuch tag +attachment2 +encrypted2 +signed2  '*'
 notmuch dump > EXPECTED
 notmuch reindex '*'
-- 
2.26.2
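
The difference between the two checks in the patch above can be reproduced outside notmuch. The following is a minimal sketch, not code from the notmuch tree. It assumes that STRNCMP_LITERAL expands to strncmp limited to the length of the literal, which is consistent with the prefix behaviour described in the commit message. The function names kept_by_prefix_check, kept_by_exact_check and check_thread_examples are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed definition: compare only the first strlen (literal) bytes. */
#define STRNCMP_LITERAL(var, literal) \
    strncmp ((var), (literal), sizeof (literal) - 1)

/* Pre-patch condition: true when the tag would be re-added (kept). */
bool
kept_by_prefix_check (const char *tag)
{
    return STRNCMP_LITERAL (tag, "encrypted") != 0 &&
           STRNCMP_LITERAL (tag, "signed") != 0 &&
           STRNCMP_LITERAL (tag, "attachment") != 0;
}

/* Post-patch condition: only the three exact tag names are skipped. */
bool
kept_by_exact_check (const char *tag)
{
    return strcmp (tag, "encrypted") != 0 &&
           strcmp (tag, "signed") != 0 &&
           strcmp (tag, "attachment") != 0;
}

/* Spot checks using tags from this thread; aborts if one fails. */
void
check_thread_examples (void)
{
    /* The reported bug: a tag that merely starts with "attachment". */
    assert (! kept_by_prefix_check ("attachments-extracted"));
    assert (kept_by_exact_check ("attachments-extracted"));

    /* Tags that survived even before the patch. */
    assert (kept_by_prefix_check ("extracted-attachments"));
    assert (kept_by_prefix_check ("unreadable"));
    assert (kept_by_prefix_check ("inboxable"));

    /* The tags used by the new subtest. */
    assert (! kept_by_prefix_check ("attachment2"));
    assert (kept_by_exact_check ("attachment2"));

    /* The exact special tag is skipped by both versions. */
    assert (! kept_by_prefix_check ("attachment"));
    assert (! kept_by_exact_check ("attachment"));
}
```

Calling check_thread_examples () from a small main is enough to exercise it. Under these assumptions the prefix check treats attachments-extracted, attachment2, encrypted2 and signed2 like the three special tags, while strcmp only skips the exact names.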

* Re: [PATCH 1/2] test: known broken test for reindex tag preservation
  2020-05-04 14:00       ` [PATCH 1/2] test: known broken test for reindex tag preservation David Bremner
  2020-05-04 14:00         ` [PATCH 2/2] lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms David Bremner
@ 2020-05-04 14:16         ` Tomi Ollila
  2020-05-04 22:47           ` David Bremner
  1 sibling, 1 reply; 9+ messages in thread
From: Tomi Ollila @ 2020-05-04 14:16 UTC (permalink / raw)
  To: David Bremner, David Bremner, Franz Fellner, notmuch

On Mon, May 04 2020, David Bremner wrote:

> In id:1588595993-ner-8.651@TPL520 Franz Fellner reported that tags
> starting with 'attachment' are removed by 'notmuch reindex'. This is
> probably related to the use of STRNCMP_LITERAL in

Haa, I looked at this briefly but failed to see it is STRNCMP_LITERAL,
not STRCMP_LITERAL (the latter could be an optimized strcmp using memcmp
w/ constant len)

Series LGTM (I'm trying to look away from that 'we' passive ;)

Tomi
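
The STRCMP_LITERAL idea mentioned above (an exact comparison that exploits the compile-time length of the literal) might look roughly like the sketch below. No such macro is part of the patch in this thread, which uses plain strcmp. The names STREQ_LITERAL, is_special_tag and check_streq_literal are hypothetical. The length is checked first so that memcmp never reads past the end of a shorter tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of notmuch: true when var equals the
 * string literal exactly. sizeof (literal) - 1 is a compile-time
 * constant, so only strlen (var) is computed at run time.
 * Note: var is evaluated twice, so it must not have side effects.
 */
#define STREQ_LITERAL(var, literal) \
    (strlen (var) == sizeof (literal) - 1 && \
     memcmp ((var), (literal), sizeof (literal) - 1) == 0)

/* True only for the three tag names the patch treats specially. */
bool
is_special_tag (const char *tag)
{
    return STREQ_LITERAL (tag, "encrypted") ||
           STREQ_LITERAL (tag, "signed") ||
           STREQ_LITERAL (tag, "attachment");
}

/* Spot checks; aborts if one fails. */
void
check_streq_literal (void)
{
    assert (is_special_tag ("attachment"));
    assert (! is_special_tag ("attachments-extracted")); /* longer */
    assert (! is_special_tag ("attach"));                /* shorter */
    assert (! is_special_tag ("signed2"));
}
```

Whether this is actually faster than strcmp depends on the compiler and C library, so it is best read as an illustration of the remark rather than a recommended change.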

> _notmuch_message_remove_indexed_terms.
> ---
>  test/T700-reindex.sh | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
> index 9e795896..7b7e52de 100755
> --- a/test/T700-reindex.sh
> +++ b/test/T700-reindex.sh
> @@ -33,6 +33,15 @@ notmuch reindex '*'
>  notmuch dump > OUTPUT
>  test_expect_equal_file initial-dump OUTPUT
>  
> +test_begin_subtest 'reindex preserves tags with special prefixes'
> +test_subtest_known_broken
> +notmuch tag +attachment2 +encrypted2 +signed2  '*'
> +notmuch dump > EXPECTED
> +notmuch reindex '*'
> +notmuch dump > OUTPUT
> +notmuch tag -attachment2 -encrypted2 -signed2  '*'
> +test_expect_equal_file EXPECTED OUTPUT
> +
>  test_begin_subtest 'reindex moves a message between threads'
>  notmuch search --output=threads id:87iqd9rn3l.fsf@vertex.dottedmag > EXPECTED
>  # re-parent
> -- 
> 2.26.2
>

* Re: [PATCH 2/2] lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms
  2020-05-04 14:00         ` [PATCH 2/2] lib: replace STRNCMP_LITERAL in __message_remove_indexed_terms David Bremner
@ 2020-05-04 15:16           ` Franz Fellner
  0 siblings, 0 replies; 9+ messages in thread
From: Franz Fellner @ 2020-05-04 15:16 UTC (permalink / raw)
  To: David Bremner, David Bremner, notmuch

Thank you very much.
I confirm that the patch fixes the issue.

Regards
Franz

On Mon May  4 11:00:24 2020, David Bremner <david@tethera.net> wrote:
> strncmp looks for a prefix that matches, which is very much not what
> we want here. This fixes the bug reported by Franz Fellner in
> id:1588595993-ner-8.651@TPL520
> ---
>  lib/message.cc       | 6 +++---
>  test/T700-reindex.sh | 1 -
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/message.cc b/lib/message.cc
> index 5c9b58b2..0fa0eb3a 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -751,9 +751,9 @@ _notmuch_message_remove_indexed_terms (notmuch_message_t *message)
>  
>         const char *tag = notmuch_tags_get (tags);
>  
> -       if (STRNCMP_LITERAL (tag, "encrypted") != 0 &&
> -           STRNCMP_LITERAL (tag, "signed") != 0 &&
> -           STRNCMP_LITERAL (tag, "attachment") != 0) {
> +       if (strcmp (tag, "encrypted") != 0 &&
> +           strcmp (tag, "signed") != 0 &&
> +           strcmp (tag, "attachment") != 0) {
>             std::string term = tag_prefix + tag;
>             message->doc.add_term (term);
>         }
> diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh
> index 7b7e52de..3d7c930d 100755
> --- a/test/T700-reindex.sh
> +++ b/test/T700-reindex.sh
> @@ -34,7 +34,6 @@ notmuch dump > OUTPUT
>  test_expect_equal_file initial-dump OUTPUT
>  
>  test_begin_subtest 'reindex preserves tags with special prefixes'
> -test_subtest_known_broken
>  notmuch tag +attachment2 +encrypted2 +signed2  '*'
>  notmuch dump > EXPECTED
>  notmuch reindex '*'
> -- 
> 2.26.2
> 
> 
> 
-- 

* Re: [PATCH 1/2] test: known broken test for reindex tag preservation
  2020-05-04 14:16         ` [PATCH 1/2] test: known broken test for reindex tag preservation Tomi Ollila
@ 2020-05-04 22:47           ` David Bremner
  0 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2020-05-04 22:47 UTC (permalink / raw)
  To: Tomi Ollila, Franz Fellner, notmuch

Tomi Ollila <tomi.ollila@iki.fi> writes:

> On Mon, May 04 2020, David Bremner wrote:
>
>> In id:1588595993-ner-8.651@TPL520 Franz Fellner reported that tags
>> starting with 'attachment' are removed by 'notmuch reindex'. This is
>> probably related to the use of STRNCMP_LITERAL in
>
> Haa, I looked at this briefly but failed to see it is STRNCMP_LITERAL,
> not STRCMP_LITERAL (the latter could be an optimized strcmp using memcmp
> w/ constant len)
>
> Series LGTM (I'm trying to look away from that 'we' passive ;)

We have pushed.

d
