* second round of indexing all files @ 2017-04-02 13:16 David Bremner 2017-04-02 13:16 ` [rfc patch v2 1/5] lib: add definitions for notmuch_param_t David Bremner ` (5 more replies) 0 siblings, 6 replies; 14+ messages in thread From: David Bremner @ 2017-04-02 13:16 UTC (permalink / raw) To: notmuch This adds in a "notmuch reindex" command so that deleting the terms from deleted files can be accomplished. There are still several UI issues to deal with (i.e. we return an arbitrary file, not necessarily the one matched). The reindex command is a simplified version of the one that dkg originally wrote for his series on indexing encrypted messages. I've ripped out all the encryption related stuff here. I've also postulated (but not yet written) a more generic way of handling index options, roughly modeled on our command-line-options code. I hope that this will allow fewer functions, and a more static API at the library level; at this point it's just a sketch of an idea. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [rfc patch v2 1/5] lib: add definitions for notmuch_param_t 2017-04-02 13:16 second round of indexing all files David Bremner @ 2017-04-02 13:16 ` David Bremner 2017-04-02 13:16 ` [rfc patch v2 2/5] added notmuch_message_reindex David Bremner ` (4 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-02 13:16 UTC (permalink / raw) To: notmuch This is not an opaque struct because we envision using static initialization much like the command-line-options.h structures. --- lib/notmuch.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/lib/notmuch.h b/lib/notmuch.h index d374dc96..fc00f96d 100644 --- a/lib/notmuch.h +++ b/lib/notmuch.h @@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t; typedef struct _notmuch_config_list notmuch_config_list_t; #endif /* __DOXYGEN__ */ +enum notmuch_param_type { + NOTMUCH_PARAM_END = 0, + NOTMUCH_PARAM_BOOLEAN, + NOTMUCH_PARAM_INT, + NOTMUCH_PARAM_STRING +}; + +typedef struct notmuch_param_desc { + enum notmuch_param_type param_type; + int key; + union { + notmuch_bool_t bool_val; + int int_val; + const char *string_val; + }; +} notmuch_param_t; + /** * Create a new, empty notmuch database located at 'path'. * -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
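The static-initialization idea behind notmuch_param_t can be illustrated with a minimal sketch. The type definitions are copied from the patch (with a stand-in typedef for notmuch_bool_t); the keys in `enum example_key`, the table `example_opts`, and the helper `example_find_param` are hypothetical names invented for this illustration and are not part of the patch or of libnotmuch. It assumes a C11 compiler, since the struct uses an anonymous union.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int notmuch_bool_t;	/* stand-in for the library typedef */

enum notmuch_param_type {
    NOTMUCH_PARAM_END = 0,
    NOTMUCH_PARAM_BOOLEAN,
    NOTMUCH_PARAM_INT,
    NOTMUCH_PARAM_STRING
};

typedef struct notmuch_param_desc {
    enum notmuch_param_type param_type;
    int key;
    union {
	notmuch_bool_t bool_val;
	int int_val;
	const char *string_val;
    };
} notmuch_param_t;

/* Hypothetical keys, invented for this sketch only. */
enum example_key {
    EXAMPLE_KEY_TRY_DECRYPT = 1,
    EXAMPLE_KEY_MAX_DEPTH,
    EXAMPLE_KEY_CHARSET
};

/* A statically initialized option list, terminated by NOTMUCH_PARAM_END
 * in the same spirit as the command-line option tables. */
static const notmuch_param_t example_opts[] = {
    { .param_type = NOTMUCH_PARAM_BOOLEAN, .key = EXAMPLE_KEY_TRY_DECRYPT, .bool_val = 1 },
    { .param_type = NOTMUCH_PARAM_INT, .key = EXAMPLE_KEY_MAX_DEPTH, .int_val = 8 },
    { .param_type = NOTMUCH_PARAM_STRING, .key = EXAMPLE_KEY_CHARSET, .string_val = "utf-8" },
    { .param_type = NOTMUCH_PARAM_END }
};

/* Walk the list until the END sentinel; return the entry for 'key',
 * or NULL if the caller did not supply it. */
static const notmuch_param_t *
example_find_param (const notmuch_param_t *params, int key)
{
    assert (params != NULL);

    for (; params->param_type != NOTMUCH_PARAM_END; params++)
	if (params->key == key)
	    return params;

    return NULL;
}
```

A library function taking a `notmuch_param_t *` could use a lookup like this to pick out the options it understands and ignore the rest, which is one way a non-opaque struct might keep the function count down.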
* [rfc patch v2 2/5] added notmuch_message_reindex 2017-04-02 13:16 second round of indexing all files David Bremner 2017-04-02 13:16 ` [rfc patch v2 1/5] lib: add definitions for notmuch_param_t David Bremner @ 2017-04-02 13:16 ` David Bremner 2017-04-02 13:16 ` [rfc patch v2 3/5] add "notmuch reindex" subcommand David Bremner ` (3 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-02 13:16 UTC (permalink / raw) To: notmuch; +Cc: Daniel Kahn Gillmor From: Daniel Kahn Gillmor <dkg@fifthhorseman.net> This new function asks the database to reindex a given message. The parameter `indexopts` is currently ignored, but is intended to provide an extensible API to support e.g. changing the encryption or filtering status (e.g. whether and how certain non-plaintext parts are indexed). Since we have no way of distinguishing terms added (without prefix) from the headers and terms added from the body, we just save the tags and properties, remove the message from the database entirely, and add it back into the database in full, re-adding tags and properties as needed. 
--- lib/message.cc | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++- lib/notmuch.h | 14 ++++++++ 2 files changed, 115 insertions(+), 1 deletion(-) diff --git a/lib/message.cc b/lib/message.cc index f8215a49..d68e4c66 100644 --- a/lib/message.cc +++ b/lib/message.cc @@ -579,7 +579,9 @@ void _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix) { Xapian::TermIterator i; - size_t prefix_len = strlen (prefix); + size_t prefix_len = 0; + + prefix_len = strlen (prefix); while (1) { i = message->doc.termlist_begin (); @@ -1872,3 +1874,101 @@ _notmuch_message_frozen (notmuch_message_t *message) { return message->frozen; } + +notmuch_status_t +notmuch_message_reindex (notmuch_message_t *message, + notmuch_param_t unused (*indexopts)) +{ + notmuch_database_t *notmuch = NULL; + notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status; + notmuch_tags_t *tags = NULL; + notmuch_message_properties_t *properties = NULL; + notmuch_filenames_t *filenames, *orig_filenames = NULL; + const char *filename = NULL, *tag = NULL, *propkey = NULL; + notmuch_message_t *newmsg = NULL; + notmuch_bool_t readded = FALSE, skip; + const char *autotags[] = { + "attachment", + "encrypted", + "signed" }; + + if (message == NULL) + return NOTMUCH_STATUS_NULL_POINTER; + + notmuch = _notmuch_message_database (message); + + /* cache tags, properties, and filenames */ + tags = notmuch_message_get_tags (message); + properties = notmuch_message_get_properties (message, "", FALSE); + filenames = notmuch_message_get_filenames (message); + orig_filenames = notmuch_message_get_filenames (message); + + /* walk through filenames, removing them until the message is gone */ + for ( ; notmuch_filenames_valid (filenames); + notmuch_filenames_move_to_next (filenames)) { + filename = notmuch_filenames_get (filenames); + + ret = notmuch_database_remove_message (notmuch, filename); + if (ret != NOTMUCH_STATUS_SUCCESS && + ret != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) + return ret; + } + 
if (ret != NOTMUCH_STATUS_SUCCESS) + return ret; + + /* re-add the filenames with the associated indexopts */ + for (; notmuch_filenames_valid (orig_filenames); + notmuch_filenames_move_to_next (orig_filenames)) { + filename = notmuch_filenames_get (orig_filenames); + + status = notmuch_database_add_message(notmuch, + filename, + readded ? NULL : &newmsg); + if (status == NOTMUCH_STATUS_SUCCESS || + status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) { + if (!readded) { + /* re-add tags */ + for (; notmuch_tags_valid (tags); + notmuch_tags_move_to_next (tags)) { + tag = notmuch_tags_get (tags); + skip = FALSE; + + for (size_t i = 0; i < ARRAY_SIZE (autotags); i++) + if (strcmp (tag, autotags[i]) == 0) + skip = TRUE; + + if (!skip) { + status = notmuch_message_add_tag (newmsg, tag); + if (status != NOTMUCH_STATUS_SUCCESS) + ret = status; + } + } + /* re-add properties */ + for (; notmuch_message_properties_valid (properties); + notmuch_message_properties_move_to_next (properties)) { + propkey = notmuch_message_properties_key (properties); + skip = FALSE; + + if (!skip) { + status = notmuch_message_add_property (newmsg, propkey, + notmuch_message_properties_value (properties)); + if (status != NOTMUCH_STATUS_SUCCESS) + ret = status; + } + } + readded = TRUE; + } + } else { + /* if we failed to add this filename, go ahead and try the + * next one as though it were first, but report the + * error... */ + ret = status; + } + } + if (newmsg) + notmuch_message_destroy (newmsg); + + /* should we also destroy the incoming message object? at the + * moment, we leave that to the caller */ + return ret; +} diff --git a/lib/notmuch.h b/lib/notmuch.h index fc00f96d..1f31efed 100644 --- a/lib/notmuch.h +++ b/lib/notmuch.h @@ -1389,6 +1389,20 @@ notmuch_filenames_t * notmuch_message_get_filenames (notmuch_message_t *message); /** + * Re-index the e-mail corresponding to 'message' using the supplied index options + * + * Returns the status of the re-index operation. 
(see the return + * codes documented in notmuch_database_add_message) + * + * After reindexing, the user should discard the message object passed + * in here by calling notmuch_message_destroy, since it refers to the + * original message, not to the reindexed message. + */ +notmuch_status_t +notmuch_message_reindex (notmuch_message_t *message, + notmuch_param_t *indexopts); + +/** * Message flags. */ typedef enum _notmuch_message_flag { -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
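The tag-restoring step of the v2 approach (save the tags, remove the message, add it back, then re-add every saved tag except the automatic ones) can be sketched in isolation. This is only an illustration working on plain C strings rather than on notmuch_tags_t iterators; `example_is_autotag` and `example_tags_to_restore` are hypothetical helpers, not functions from the patch.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Tags that indexing regenerates by itself, as listed in the patch. */
static int
example_is_autotag (const char *tag)
{
    static const char *autotags[] = { "attachment", "encrypted", "signed" };

    for (size_t i = 0; i < ARRAY_SIZE (autotags); i++)
	if (strcmp (tag, autotags[i]) == 0)
	    return 1;

    return 0;
}

/* Copy into 'out' the saved tags that should be re-added after the
 * message has been removed and added again. 'out' must have room for
 * 'n_saved' entries. Returns the number of tags copied. */
static size_t
example_tags_to_restore (const char *const *saved, size_t n_saved,
			 const char **out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_saved; i++)
	if (! example_is_autotag (saved[i]))
	    out[n_out++] = saved[i];

    return n_out;
}
```

Skipping the automatic tags matters because a reindex may legitimately change them: if the file that carried an attachment has been deleted, blindly restoring "attachment" would bring stale state back.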
* [rfc patch v2 3/5] add "notmuch reindex" subcommand 2017-04-02 13:16 second round of indexing all files David Bremner 2017-04-02 13:16 ` [rfc patch v2 1/5] lib: add definitions for notmuch_param_t David Bremner 2017-04-02 13:16 ` [rfc patch v2 2/5] added notmuch_message_reindex David Bremner @ 2017-04-02 13:16 ` David Bremner 2017-04-02 13:16 ` [rfc patch v2 4/5] test: add known broken test for duplicate message id David Bremner ` (2 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-02 13:16 UTC (permalink / raw) To: notmuch; +Cc: Daniel Kahn Gillmor From: Daniel Kahn Gillmor <dkg@fifthhorseman.net> This new subcommand takes a set of search terms, and re-indexes the list of matching messages. --- Makefile.local | 1 + doc/conf.py | 4 ++ doc/index.rst | 1 + doc/man1/notmuch-reindex.rst | 29 +++++++++ doc/man1/notmuch.rst | 4 +- doc/man7/notmuch-search-terms.rst | 7 +- notmuch-client.h | 3 + notmuch-reindex.c | 132 ++++++++++++++++++++++++++++++++++++++ notmuch.c | 2 + test/T700-reindex.sh | 21 ++++++ 10 files changed, 200 insertions(+), 4 deletions(-) create mode 100644 doc/man1/notmuch-reindex.rst create mode 100644 notmuch-reindex.c create mode 100755 test/T700-reindex.sh diff --git a/Makefile.local b/Makefile.local index 03eafaaa..c6e272bc 100644 --- a/Makefile.local +++ b/Makefile.local @@ -222,6 +222,7 @@ notmuch_client_srcs = \ notmuch-dump.c \ notmuch-insert.c \ notmuch-new.c \ + notmuch-reindex.c \ notmuch-reply.c \ notmuch-restore.c \ notmuch-search.c \ diff --git a/doc/conf.py b/doc/conf.py index a3d82696..aa864b3c 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -95,6 +95,10 @@ man_pages = [ u'incorporate new mail into the notmuch database', [notmuch_authors], 1), + ('man1/notmuch-reindex', 'notmuch-reindex', + u're-index matching messages', + [notmuch_authors], 1), + ('man1/notmuch-reply', 'notmuch-reply', u'constructs a reply template for a set of messages', [notmuch_authors], 1), diff --git a/doc/index.rst 
b/doc/index.rst index 344606d9..aa6c9f40 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -18,6 +18,7 @@ Contents: man5/notmuch-hooks man1/notmuch-insert man1/notmuch-new + man1/notmuch-reindex man1/notmuch-reply man1/notmuch-restore man1/notmuch-search diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst new file mode 100644 index 00000000..6c786b85 --- /dev/null +++ b/doc/man1/notmuch-reindex.rst @@ -0,0 +1,29 @@ +=========== +notmuch-reindex +=========== + +SYNOPSIS +======== + +**notmuch** **reindex** [*option* ...] <*search-term*> ... + +DESCRIPTION +=========== + +Re-index all messages matching the search terms. + +See **notmuch-search-terms(7)** for details of the supported syntax for +<*search-term*\ >. + +The **reindex** command searches for all messages matching the +supplied search terms, and re-creates the full-text index on these +messages using the supplied options. + +SEE ALSO +======== + +**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**, +**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**, +**notmuch-new(1)**, +**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**, +**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)** diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst index fbd7f381..b2a8376e 100644 --- a/doc/man1/notmuch.rst +++ b/doc/man1/notmuch.rst @@ -149,8 +149,8 @@ SEE ALSO **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**, **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**, -**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**, -**notmuch-restore(1)**, **notmuch-search(1)**, +**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**, +**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)** The notmuch website: **https://notmuchmail.org** diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst index 
47cab48d..dd76972e 100644 --- a/doc/man7/notmuch-search-terms.rst +++ b/doc/man7/notmuch-search-terms.rst @@ -9,6 +9,8 @@ SYNOPSIS **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...] +**notmuch** **reindex** [option ...] <*search-term*> ... + **notmuch** **search** [option ...] <*search-term*> ... **notmuch** **show** [option ...] <*search-term*> ... @@ -421,5 +423,6 @@ SEE ALSO **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**, -**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**, -**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)** +**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**, +**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**, +**notmuch-tag(1)** diff --git a/notmuch-client.h b/notmuch-client.h index a6f70eae..ab7138c6 100644 --- a/notmuch-client.h +++ b/notmuch-client.h @@ -196,6 +196,9 @@ int notmuch_insert_command (notmuch_config_t *config, int argc, char *argv[]); int +notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]); + +int notmuch_reply_command (notmuch_config_t *config, int argc, char *argv[]); int diff --git a/notmuch-reindex.c b/notmuch-reindex.c new file mode 100644 index 00000000..836a90a1 --- /dev/null +++ b/notmuch-reindex.c @@ -0,0 +1,132 @@ +/* notmuch - Not much of an email program, (just index and search) + * + * Copyright © 2016 Daniel Kahn Gillmor + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/ . + * + * Author: Daniel Kahn Gillmor <dkg@fifthhorseman.net> + */ + +#include "notmuch-client.h" +#include "string-util.h" + +static volatile sig_atomic_t interrupted; + +static void +handle_sigint (unused (int sig)) +{ + static char msg[] = "Stopping... \n"; + + /* This write is "opportunistic", so it's okay to ignore the + * result. It is not required for correctness, and if it does + * fail or produce a short write, we want to get out of the signal + * handler as quickly as possible, not retry it. */ + IGNORE_RESULT (write (2, msg, sizeof (msg) - 1)); + interrupted = 1; +} + +/* reindex all messages matching 'query_string' using the passed-in indexopts + */ +static int +reindex_query (notmuch_database_t *notmuch, const char *query_string, + notmuch_param_t *indexopts) +{ + notmuch_query_t *query; + notmuch_messages_t *messages; + notmuch_message_t *message; + notmuch_status_t status; + + int ret = NOTMUCH_STATUS_SUCCESS; + + query = notmuch_query_create (notmuch, query_string); + if (query == NULL) { + fprintf (stderr, "Out of memory.\n"); + return 1; + } + + /* reindexing is not interested in any special sort order */ + notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED); + + status = notmuch_query_search_messages (query, &messages); + if (print_status_query ("notmuch reindex", query, status)) + return status; + + for (; + notmuch_messages_valid (messages) && ! 
interrupted; + notmuch_messages_move_to_next (messages)) { + message = notmuch_messages_get (messages); + + notmuch_message_reindex(message, indexopts); + notmuch_message_destroy (message); + if (ret != NOTMUCH_STATUS_SUCCESS) + break; + } + + notmuch_query_destroy (query); + + return ret || interrupted; +} + +int +notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]) +{ + char *query_string = NULL; + notmuch_database_t *notmuch; + struct sigaction action; + notmuch_bool_t try_decrypt = FALSE; + int opt_index; + int ret; + notmuch_param_t *indexopts = NULL; + + /* Set up our handler for SIGINT */ + memset (&action, 0, sizeof (struct sigaction)); + action.sa_handler = handle_sigint; + sigemptyset (&action.sa_mask); + action.sa_flags = SA_RESTART; + sigaction (SIGINT, &action, NULL); + + notmuch_opt_desc_t options[] = { + { NOTMUCH_OPT_INHERIT, (void *) &notmuch_shared_options, NULL, 0, 0 }, + { 0, 0, 0, 0, 0 } + }; + + opt_index = parse_arguments (argc, argv, options, 1); + if (opt_index < 0) + return EXIT_FAILURE; + + notmuch_process_shared_options (argv[0]); + + if (notmuch_database_open (notmuch_config_get_database_path (config), + NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch)) + return EXIT_FAILURE; + + notmuch_exit_if_unmatched_db_uuid (notmuch); + + query_string = query_string_from_args (config, argc-opt_index, argv+opt_index); + if (query_string == NULL) { + fprintf (stderr, "Out of memory\n"); + return EXIT_FAILURE; + } + + if (*query_string == '\0') { + fprintf (stderr, "Error: notmuch reindex requires at least one search term.\n"); + return EXIT_FAILURE; + } + + ret = reindex_query (notmuch, query_string, indexopts); + + notmuch_database_destroy (notmuch); + + return ret || interrupted ? EXIT_FAILURE : EXIT_SUCCESS; +} diff --git a/notmuch.c b/notmuch.c index 8e332ce6..201c7454 100644 --- a/notmuch.c +++ b/notmuch.c @@ -123,6 +123,8 @@ static command_t commands[] = { "Restore the tags from the given dump file (see 'dump')." 
}, { "compact", notmuch_compact_command, NOTMUCH_CONFIG_OPEN, "Compact the notmuch database." }, + { "reindex", notmuch_reindex_command, NOTMUCH_CONFIG_OPEN, + "Re-index all messages matching the search terms." }, { "config", notmuch_config_command, NOTMUCH_CONFIG_OPEN, "Get or set settings in the notmuch configuration file." }, { "help", notmuch_help_command, NOTMUCH_CONFIG_CREATE, /* create but don't save config */ diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh new file mode 100755 index 00000000..32385a72 --- /dev/null +++ b/test/T700-reindex.sh @@ -0,0 +1,21 @@ +#!/usr/bin/env bash +test_description='reindexing messages' +. ./test-lib.sh || exit 1 + +add_email_corpus + +test_begin_subtest 'reindex preserves message-ids' +notmuch search --output=messages '*' > EXPECTED +# remove duplicate file +rm $MAIL_DIR/bar/18:2, +notmuch reindex '*' +notmuch search --output=messages '*' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest 'reindex preserves tags' +notmuch dump > EXPECTED +notmuch reindex '*' +notmuch dump > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_done -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [rfc patch v2 4/5] test: add known broken test for duplicate message id 2017-04-02 13:16 second round of indexing all files David Bremner ` (2 preceding siblings ...) 2017-04-02 13:16 ` [rfc patch v2 3/5] add "notmuch reindex" subcommand David Bremner @ 2017-04-02 13:16 ` David Bremner 2017-04-02 13:16 ` [rfc patch v2 5/5] lib: index message files with duplicate message-ids David Bremner 2017-04-04 1:47 ` third round of indexing all files David Bremner 5 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-02 13:16 UTC (permalink / raw) To: notmuch There are many other problems that could be tested, but this one we have some hope of fixing because it doesn't require UI changes, just indexing changes. --- test/T670-duplicate-mid.sh | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100755 test/T670-duplicate-mid.sh diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh new file mode 100755 index 00000000..88bd12cb --- /dev/null +++ b/test/T670-duplicate-mid.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash +test_description="duplicate message ids" +. ./test-lib.sh || exit 1 + +add_message '[id]="id:duplicate"' '[subject]="message 1"' +add_message '[id]="id:duplicate"' '[subject]="message 2"' + +test_begin_subtest 'Search for second subject' +test_subtest_known_broken +cat <<EOF >EXPECTED +MAIL_DIR/msg-001 +MAIL_DIR/msg-002 +EOF +notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_done -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [rfc patch v2 5/5] lib: index message files with duplicate message-ids 2017-04-02 13:16 second round of indexing all files David Bremner ` (3 preceding siblings ...) 2017-04-02 13:16 ` [rfc patch v2 4/5] test: add known broken test for duplicate message id David Bremner @ 2017-04-02 13:16 ` David Bremner 2017-04-04 1:47 ` third round of indexing all files David Bremner 5 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-02 13:16 UTC (permalink / raw) To: notmuch The corresponding xapian document just gets more terms added to it, but this doesn't seem to break anything. --- lib/database.cc | 3 +++ test/T670-duplicate-mid.sh | 22 +++++++++++++++++++--- 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 5bc131a3..3b9f7828 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch, if (ret) goto DONE; } else { + ret = _notmuch_message_index_file (message, message_file); + if (ret) + goto DONE; ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; } diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh index 88bd12cb..2c77e11e 100755 --- a/test/T670-duplicate-mid.sh +++ b/test/T670-duplicate-mid.sh @@ -2,11 +2,10 @@ test_description="duplicate message ids" . 
./test-lib.sh || exit 1 -add_message '[id]="id:duplicate"' '[subject]="message 1"' -add_message '[id]="id:duplicate"' '[subject]="message 2"' +add_message '[id]="duplicate"' '[subject]="message 1"' +add_message '[id]="duplicate"' '[subject]="message 2"' test_begin_subtest 'Search for second subject' -test_subtest_known_broken cat <<EOF >EXPECTED MAIL_DIR/msg-001 MAIL_DIR/msg-002 @@ -14,4 +13,21 @@ EOF notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT test_expect_equal_file EXPECTED OUTPUT +add_message '[id]="duplicate"' '[body]="sekrit"' +test_begin_subtest 'search for body in duplicate file' +cat <<EOF >EXPECTED +MAIL_DIR/msg-001 +MAIL_DIR/msg-002 +MAIL_DIR/msg-003 +EOF +notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest 'reindex removes terms from duplicate file' +rm $MAIL_DIR/msg-003 +notmuch reindex id:duplicate +cp /dev/null EXPECTED +notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + test_done -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* third round of indexing all files 2017-04-02 13:16 second round of indexing all files David Bremner ` (4 preceding siblings ...) 2017-04-02 13:16 ` [rfc patch v2 5/5] lib: index message files with duplicate message-ids David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 1:47 ` [rfc patch v3 1/6] lib: add definitions for notmuch_param_t David Bremner ` (6 more replies) 5 siblings, 7 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch It seems noticeably faster (on the order of 30-50% faster) and the code is quite a bit simpler to adapt the approach in [1] to only delete the terms we are going to re-add via indexing. This obsoletes the previous series at [2]. It still has all of the issues mentioned there UI-wise, and the question of the index options design probably needs more thought. This is new in this round [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms This has been pretty drastically rewritten compared to Daniel's version [3] [rfc patch v3 3/6] added notmuch_message_reindex This is the same, except I added simple performance tests [rfc patch v3 4/6] add "notmuch reindex" subcommand [1]: id:1471178598-9639-1-git-send-email-david@tethera.net [2]: id:20170402131646.29884-1-david@tethera.net [3]: id:20170402131646.29884-3-david@tethera.net ^ permalink raw reply [flat|nested] 14+ messages in thread
* [rfc patch v3 1/6] lib: add definitions for notmuch_param_t 2017-04-04 1:47 ` third round of indexing all files David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 1:47 ` [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms David Bremner ` (5 subsequent siblings) 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch This is not an opaque struct because we envision using static initialization much like the command-line-options.h structures. --- lib/notmuch.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/lib/notmuch.h b/lib/notmuch.h index d374dc96..fc00f96d 100644 --- a/lib/notmuch.h +++ b/lib/notmuch.h @@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t; typedef struct _notmuch_config_list notmuch_config_list_t; #endif /* __DOXYGEN__ */ +enum notmuch_param_type { + NOTMUCH_PARAM_END = 0, + NOTMUCH_PARAM_BOOLEAN, + NOTMUCH_PARAM_INT, + NOTMUCH_PARAM_STRING +}; + +typedef struct notmuch_param_desc { + enum notmuch_param_type param_type; + int key; + union { + notmuch_bool_t bool_val; + int int_val; + const char *string_val; + }; +} notmuch_param_t; + /** * Create a new, empty notmuch database located at 'path'. * -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms 2017-04-04 1:47 ` third round of indexing all files David Bremner 2017-04-04 1:47 ` [rfc patch v3 1/6] lib: add definitions for notmuch_param_t David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 1:47 ` [rfc patch v3 3/6] added notmuch_message_reindex David Bremner ` (4 subsequent siblings) 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch Testing will be provided via use in notmuch_message_reindex --- lib/message.cc | 44 ++++++++++++++++++++++++++++++++++++++++++++ lib/notmuch-private.h | 2 ++ lib/notmuch.h | 4 ++++ 3 files changed, 50 insertions(+) diff --git a/lib/message.cc b/lib/message.cc index f8215a49..a7bd38ac 100644 --- a/lib/message.cc +++ b/lib/message.cc @@ -599,6 +599,50 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix) } } + +/* Remove all terms generated by indexing, i.e. not tags or + * properties, along with any automatic tags*/ +notmuch_private_status_t +_notmuch_message_remove_indexed_terms (notmuch_message_t *message) +{ + Xapian::TermIterator i; + + const std::string tag_prefix = _find_prefix ("tag"); + const std::string property_prefix = _find_prefix ("property"); + + for (i = message->doc.termlist_begin (); + i != message->doc.termlist_end (); i++) { + + const std::string term = *i; + + if (term.compare (0, property_prefix.size (), property_prefix) == 0) + continue; + + if (term.compare (0, tag_prefix.size (), tag_prefix) == 0 && + term.compare (1, strlen("encrypted"), "encrypted") != 0 && + term.compare (1, strlen("signed"), "signed") != 0 && + term.compare (1, strlen("attachment"), "attachment") != 0) + continue; + + try { + message->doc.remove_term ((*i)); + message->modified = TRUE; + } catch (const Xapian::InvalidArgumentError) { + /* Ignore failure to remove non-existent term. 
*/ + } catch (const Xapian::Error &error) { + notmuch_database_t *notmuch = message->notmuch; + + if (!notmuch->exception_reported) { + _notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred creating message: %s\n", + error.get_msg().c_str()); + notmuch->exception_reported = TRUE; + } + return NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION; + } + } + return NOTMUCH_PRIVATE_STATUS_SUCCESS; +} + /* Return true if p points at "new" or "cur". */ static bool is_maildir (const char *p) { diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h index 8587e86c..1198d932 100644 --- a/lib/notmuch-private.h +++ b/lib/notmuch-private.h @@ -509,6 +509,8 @@ _notmuch_message_add_reply (notmuch_message_t *message, notmuch_database_t * _notmuch_message_database (notmuch_message_t *message); +void +_notmuch_message_remove_unprefixed_terms (notmuch_message_t *message); /* sha1.c */ char * diff --git a/lib/notmuch.h b/lib/notmuch.h index fc00f96d..33e9fd24 100644 --- a/lib/notmuch.h +++ b/lib/notmuch.h @@ -1685,6 +1685,10 @@ notmuch_message_thaw (notmuch_message_t *message); void notmuch_message_destroy (notmuch_message_t *message); +/* for testing */ + +void +notmuch_test_clear_terms(notmuch_message_t *message); /** * @name Message Properties * -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
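The keep-or-remove decision made for each term in _notmuch_message_remove_indexed_terms can be sketched as a standalone predicate. This is a rough illustration in plain C, not the patch's C++ code: it assumes the tag prefix is "K" and the property prefix is "XPROPERTY" (the patch obtains both from _find_prefix), and `example_has_prefix` and `example_should_remove_term` are hypothetical names. Unlike the patch, which compares only the leading characters of the tag name, this sketch matches the automatic tag names exactly.

```c
#include <stddef.h>
#include <string.h>

/* Assumed prefixes; the patch looks these up via _find_prefix (). */
#define EXAMPLE_TAG_PREFIX "K"
#define EXAMPLE_PROPERTY_PREFIX "XPROPERTY"

static int
example_has_prefix (const char *term, const char *prefix)
{
    return strncmp (term, prefix, strlen (prefix)) == 0;
}

/* Return 1 if a reindex should drop 'term' from the document, 0 if the
 * term must survive because indexing will not recreate it. */
static int
example_should_remove_term (const char *term)
{
    static const char *autotags[] = { "attachment", "encrypted", "signed" };

    /* Properties are user or tool data: always keep. */
    if (example_has_prefix (term, EXAMPLE_PROPERTY_PREFIX))
	return 0;

    /* Tags are kept, except the ones indexing generates itself. */
    if (example_has_prefix (term, EXAMPLE_TAG_PREFIX)) {
	const char *tag = term + strlen (EXAMPLE_TAG_PREFIX);

	for (size_t i = 0; i < sizeof (autotags) / sizeof (autotags[0]); i++)
	    if (strcmp (tag, autotags[i]) == 0)
		return 1;

	return 0;
    }

    /* Everything else came from indexing and will be re-added. */
    return 1;
}
```

Exact matching avoids treating a user tag such as "signedoff" as automatic; whether that difference matters for the real function is a question for review of the patch.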
* [rfc patch v3 3/6] added notmuch_message_reindex 2017-04-04 1:47 ` third round of indexing all files David Bremner 2017-04-04 1:47 ` [rfc patch v3 1/6] lib: add definitions for notmuch_param_t David Bremner 2017-04-04 1:47 ` [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 1:47 ` [rfc patch v3 4/6] add "notmuch reindex" subcommand David Bremner ` (3 subsequent siblings) 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch; +Cc: Daniel Kahn Gillmor From: Daniel Kahn Gillmor <dkg@fifthhorseman.net> This new function asks the database to reindex a given message. The parameter `indexopts` is currently ignored, but is intended to provide an extensible API to support e.g. changing the encryption or filtering status (e.g. whether and how certain non-plaintext parts are indexed). --- lib/message.cc | 46 +++++++++++++++++++++++++++++++++++++++++++++- lib/notmuch.h | 14 ++++++++++++++ 2 files changed, 59 insertions(+), 1 deletion(-) diff --git a/lib/message.cc b/lib/message.cc index a7bd38ac..193eedb2 100644 --- a/lib/message.cc +++ b/lib/message.cc @@ -579,7 +579,9 @@ void _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix) { Xapian::TermIterator i; - size_t prefix_len = strlen (prefix); + size_t prefix_len = 0; + + prefix_len = strlen (prefix); while (1) { i = message->doc.termlist_begin (); @@ -1916,3 +1918,45 @@ _notmuch_message_frozen (notmuch_message_t *message) { return message->frozen; } + +notmuch_status_t +notmuch_message_reindex (notmuch_message_t *message, + notmuch_param_t unused (*indexopts)) +{ + notmuch_database_t *notmuch = NULL; + notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status; + notmuch_private_status_t private_status; + notmuch_filenames_t *orig_filenames = NULL; + const char *filename = NULL; + + if (message == NULL) + return NOTMUCH_STATUS_NULL_POINTER; + + notmuch 
= _notmuch_message_database (message); + + orig_filenames = notmuch_message_get_filenames (message); + + private_status = _notmuch_message_remove_indexed_terms (message); + if (private_status) + return COERCE_STATUS(private_status, "error removing terms"); + + /* re-add the filenames with the associated indexopts */ + for (; notmuch_filenames_valid (orig_filenames); + notmuch_filenames_move_to_next (orig_filenames)) { + filename = notmuch_filenames_get (orig_filenames); + + status = notmuch_database_add_message(notmuch, + filename, + &message); + if (status != NOTMUCH_STATUS_SUCCESS && + status != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) { + /* if we failed to add this filename, go ahead and try the + * next one as though it were first, but report the + * error... */ + ret = status; + } + } + + /* XXX TODO destroy orig_filenames? */ + return ret; +} diff --git a/lib/notmuch.h b/lib/notmuch.h index 33e9fd24..11818018 100644 --- a/lib/notmuch.h +++ b/lib/notmuch.h @@ -1389,6 +1389,20 @@ notmuch_filenames_t * notmuch_message_get_filenames (notmuch_message_t *message); /** + * Re-index the e-mail corresponding to 'message' using the supplied index options + * + * Returns the status of the re-index operation. (see the return + * codes documented in notmuch_database_add_message) + * + * After reindexing, the user should discard the message object passed + * in here by calling notmuch_message_destroy, since it refers to the + * original message, not to the reindexed message. + */ +notmuch_status_t +notmuch_message_reindex (notmuch_message_t *message, + notmuch_param_t *indexopts); + +/** * Message flags. */ typedef enum _notmuch_message_flag { -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [rfc patch v3 4/6] add "notmuch reindex" subcommand 2017-04-04 1:47 ` third round of indexing all files David Bremner ` (2 preceding siblings ...) 2017-04-04 1:47 ` [rfc patch v3 3/6] added notmuch_message_reindex David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 1:47 ` [rfc patch v3 5/6] test: add known broken test for duplicate message id David Bremner ` (2 subsequent siblings) 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch; +Cc: Daniel Kahn Gillmor From: Daniel Kahn Gillmor <dkg@fifthhorseman.net> This new subcommand takes a set of search terms, and re-indexes the list of matching messages. --- Makefile.local | 1 + doc/conf.py | 4 ++ doc/index.rst | 1 + doc/man1/notmuch-reindex.rst | 29 +++++++++ doc/man1/notmuch.rst | 4 +- doc/man7/notmuch-search-terms.rst | 7 +- notmuch-client.h | 3 + notmuch-reindex.c | 131 ++++++++++++++++++++++++++++++++++++++ notmuch.c | 2 + performance-test/M04-reindex.sh | 11 ++++ performance-test/T03-reindex.sh | 13 ++++ test/T700-reindex.sh | 21 ++++++ 12 files changed, 223 insertions(+), 4 deletions(-) create mode 100644 doc/man1/notmuch-reindex.rst create mode 100644 notmuch-reindex.c create mode 100755 performance-test/M04-reindex.sh create mode 100755 performance-test/T03-reindex.sh create mode 100755 test/T700-reindex.sh diff --git a/Makefile.local b/Makefile.local index 03eafaaa..c6e272bc 100644 --- a/Makefile.local +++ b/Makefile.local @@ -222,6 +222,7 @@ notmuch_client_srcs = \ notmuch-dump.c \ notmuch-insert.c \ notmuch-new.c \ + notmuch-reindex.c \ notmuch-reply.c \ notmuch-restore.c \ notmuch-search.c \ diff --git a/doc/conf.py b/doc/conf.py index a3d82696..aa864b3c 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -95,6 +95,10 @@ man_pages = [ u'incorporate new mail into the notmuch database', [notmuch_authors], 1), + ('man1/notmuch-reindex', 'notmuch-reindex', + u're-index matching messages', + [notmuch_authors], 1), + 
('man1/notmuch-reply', 'notmuch-reply', u'constructs a reply template for a set of messages', [notmuch_authors], 1), diff --git a/doc/index.rst b/doc/index.rst index 344606d9..aa6c9f40 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -18,6 +18,7 @@ Contents: man5/notmuch-hooks man1/notmuch-insert man1/notmuch-new + man1/notmuch-reindex man1/notmuch-reply man1/notmuch-restore man1/notmuch-search diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst new file mode 100644 index 00000000..6c786b85 --- /dev/null +++ b/doc/man1/notmuch-reindex.rst @@ -0,0 +1,29 @@ +=============== +notmuch-reindex +=============== + +SYNOPSIS +======== + +**notmuch** **reindex** [*option* ...] <*search-term*> ... + +DESCRIPTION +=========== + +Re-index all messages matching the search terms. + +See **notmuch-search-terms(7)** for details of the supported syntax for +<*search-term*\ >. + +The **reindex** command searches for all messages matching the +supplied search terms, and re-creates the full-text index on these +messages using the supplied options. 
+ +SEE ALSO +======== + +**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**, +**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**, +**notmuch-new(1)**, +**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**, +**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)** diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst index fbd7f381..b2a8376e 100644 --- a/doc/man1/notmuch.rst +++ b/doc/man1/notmuch.rst @@ -149,8 +149,8 @@ SEE ALSO **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**, **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**, -**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**, -**notmuch-restore(1)**, **notmuch-search(1)**, +**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**, +**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)** The notmuch website: **https://notmuchmail.org** diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst index 47cab48d..dd76972e 100644 --- a/doc/man7/notmuch-search-terms.rst +++ b/doc/man7/notmuch-search-terms.rst @@ -9,6 +9,8 @@ SYNOPSIS **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...] +**notmuch** **reindex** [option ...] <*search-term*> ... + **notmuch** **search** [option ...] <*search-term*> ... **notmuch** **show** [option ...] <*search-term*> ... 
@@ -421,5 +423,6 @@ SEE ALSO **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**, -**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**, -**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)** +**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**, +**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**, +**notmuch-tag(1)** diff --git a/notmuch-client.h b/notmuch-client.h index a6f70eae..ab7138c6 100644 --- a/notmuch-client.h +++ b/notmuch-client.h @@ -196,6 +196,9 @@ int notmuch_insert_command (notmuch_config_t *config, int argc, char *argv[]); int +notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]); + +int notmuch_reply_command (notmuch_config_t *config, int argc, char *argv[]); int diff --git a/notmuch-reindex.c b/notmuch-reindex.c new file mode 100644 index 00000000..8b536375 --- /dev/null +++ b/notmuch-reindex.c @@ -0,0 +1,131 @@ +/* notmuch - Not much of an email program, (just index and search) + * + * Copyright © 2016 Daniel Kahn Gillmor + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/ . 
+ * + * Author: Daniel Kahn Gillmor <dkg@fifthhorseman.net> + */ + +#include "notmuch-client.h" +#include "string-util.h" + +static volatile sig_atomic_t interrupted; + +static void +handle_sigint (unused (int sig)) +{ + static char msg[] = "Stopping... \n"; + + /* This write is "opportunistic", so it's okay to ignore the + * result. It is not required for correctness, and if it does + * fail or produce a short write, we want to get out of the signal + * handler as quickly as possible, not retry it. */ + IGNORE_RESULT (write (2, msg, sizeof (msg) - 1)); + interrupted = 1; +} + +/* reindex all messages matching 'query_string' using the passed-in indexopts + */ +static int +reindex_query (notmuch_database_t *notmuch, const char *query_string, + notmuch_param_t *indexopts) +{ + notmuch_query_t *query; + notmuch_messages_t *messages; + notmuch_message_t *message; + notmuch_status_t status; + + int ret = NOTMUCH_STATUS_SUCCESS; + + query = notmuch_query_create (notmuch, query_string); + if (query == NULL) { + fprintf (stderr, "Out of memory.\n"); + return 1; + } + + /* reindexing is not interested in any special sort order */ + notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED); + + status = notmuch_query_search_messages (query, &messages); + if (print_status_query ("notmuch reindex", query, status)) + return status; + + for (; + notmuch_messages_valid (messages) && ! 
interrupted; + notmuch_messages_move_to_next (messages)) { + message = notmuch_messages_get (messages); + + ret = notmuch_message_reindex (message, indexopts); + notmuch_message_destroy (message); + if (ret != NOTMUCH_STATUS_SUCCESS) + break; + } + + notmuch_query_destroy (query); + + return ret || interrupted; +} + +int +notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]) +{ + char *query_string = NULL; + notmuch_database_t *notmuch; + struct sigaction action; + int opt_index; + int ret; + notmuch_param_t *indexopts = NULL; + + /* Set up our handler for SIGINT */ + memset (&action, 0, sizeof (struct sigaction)); + action.sa_handler = handle_sigint; + sigemptyset (&action.sa_mask); + action.sa_flags = SA_RESTART; + sigaction (SIGINT, &action, NULL); + + notmuch_opt_desc_t options[] = { + { NOTMUCH_OPT_INHERIT, (void *) &notmuch_shared_options, NULL, 0, 0 }, + { 0, 0, 0, 0, 0 } + }; + + opt_index = parse_arguments (argc, argv, options, 1); + if (opt_index < 0) + return EXIT_FAILURE; + + notmuch_process_shared_options (argv[0]); + + if (notmuch_database_open (notmuch_config_get_database_path (config), + NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch)) + return EXIT_FAILURE; + + notmuch_exit_if_unmatched_db_uuid (notmuch); + + query_string = query_string_from_args (config, argc-opt_index, argv+opt_index); + if (query_string == NULL) { + fprintf (stderr, "Out of memory\n"); + return EXIT_FAILURE; + } + + if (*query_string == '\0') { + fprintf (stderr, "Error: notmuch reindex requires at least one search term.\n"); + return EXIT_FAILURE; + } + + ret = reindex_query (notmuch, query_string, indexopts); + + notmuch_database_destroy (notmuch); + + return ret || interrupted ? EXIT_FAILURE : EXIT_SUCCESS; +} diff --git a/notmuch.c b/notmuch.c index 8e332ce6..201c7454 100644 --- a/notmuch.c +++ b/notmuch.c @@ -123,6 +123,8 @@ static command_t commands[] = { "Restore the tags from the given dump file (see 'dump')." 
}, { "compact", notmuch_compact_command, NOTMUCH_CONFIG_OPEN, "Compact the notmuch database." }, + { "reindex", notmuch_reindex_command, NOTMUCH_CONFIG_OPEN, + "Re-index all messages matching the search terms." }, { "config", notmuch_config_command, NOTMUCH_CONFIG_OPEN, "Get or set settings in the notmuch configuration file." }, { "help", notmuch_help_command, NOTMUCH_CONFIG_CREATE, /* create but don't save config */ diff --git a/performance-test/M04-reindex.sh b/performance-test/M04-reindex.sh new file mode 100755 index 00000000..d36e061b --- /dev/null +++ b/performance-test/M04-reindex.sh @@ -0,0 +1,11 @@ +#!/bin/bash + +test_description='reindex' + +. ./perf-test-lib.sh || exit 1 + +memory_start + +memory_run 'reindex *' "notmuch reindex '*'" + +memory_done diff --git a/performance-test/T03-reindex.sh b/performance-test/T03-reindex.sh new file mode 100755 index 00000000..7af2d22d --- /dev/null +++ b/performance-test/T03-reindex.sh @@ -0,0 +1,13 @@ +#!/bin/bash + +test_description='tagging' + +. ./perf-test-lib.sh || exit 1 + +time_start + +time_run 'reindex *' "notmuch reindex '*'" +time_run 'reindex *' "notmuch reindex '*'" +time_run 'reindex *' "notmuch reindex '*'" + +time_done diff --git a/test/T700-reindex.sh b/test/T700-reindex.sh new file mode 100755 index 00000000..32385a72 --- /dev/null +++ b/test/T700-reindex.sh @@ -0,0 +1,21 @@ +#!/usr/bin/env bash +test_description='reindexing messages' +. ./test-lib.sh || exit 1 + +add_email_corpus + +test_begin_subtest 'reindex preserves message-ids' +notmuch search --output=messages '*' > EXPECTED +# remove duplicate file +rm $MAIL_DIR/bar/18:2, +notmuch reindex '*' +notmuch search --output=messages '*' > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest 'reindex preserves tags' +notmuch dump > EXPECTED +notmuch reindex '*' +notmuch dump > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_done -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
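The subcommand above combines two things: a SIGINT handler that only sets a flag, and a loop that stops either on the first failed item or once the flag is set. The following is a minimal standalone sketch of that pattern, with no libnotmuch dependency. All `demo_*` names are hypothetical and exist only for this example; the per-item callbacks stand in for the call that reindexes one message.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction under strict -std modes */
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the loop. */
static volatile sig_atomic_t demo_interrupted;

static void
demo_handle_sigint (int sig)
{
    (void) sig;
    demo_interrupted = 1;
}

/* Install the handler; returns 0 on success, like sigaction itself. */
static int
demo_install_sigint_handler (void)
{
    struct sigaction action;

    memset (&action, 0, sizeof (action));
    action.sa_handler = demo_handle_sigint;
    sigemptyset (&action.sa_mask);
    action.sa_flags = SA_RESTART;
    return sigaction (SIGINT, &action, NULL);
}

/* Hypothetical per-item callback: returns 0 on success. */
typedef int (*demo_item_fn) (int item);

/* Process items 0..count-1.  Stop at the first failing item or once
 * an interrupt has been seen.  *done receives the number of items
 * that completed successfully; the result is nonzero on failure or
 * interruption. */
static int
demo_process_all (int count, demo_item_fn fn, int *done)
{
    int ret = 0;
    int i;

    *done = 0;
    for (i = 0; i < count && ! demo_interrupted; i++) {
        ret = fn (i);       /* keep the status so the check can see it */
        if (ret != 0)
            break;
        (*done)++;
    }
    return ret || demo_interrupted;
}

/* Example callbacks. */
static int
demo_item_ok (int item)
{
    (void) item;
    return 0;
}

static int
demo_item_fail_at_two (int item)
{
    return item == 2 ? 1 : 0;
}

static int
demo_item_interrupt_at_one (int item)
{
    if (item == 1)
        raise (SIGINT);     /* simulate the user pressing Ctrl-C */
    return 0;
}
```

The handler does nothing except set the flag, so an interrupted run finishes the item in progress before stopping. In the interrupt example above, items 0 and 1 complete and the loop then exits with a nonzero result.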
* [rfc patch v3 5/6] test: add known broken test for duplicate message id 2017-04-04 1:47 ` third round of indexing all files David Bremner ` (3 preceding siblings ...) 2017-04-04 1:47 ` [rfc patch v3 4/6] add "notmuch reindex" subcommand David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 1:47 ` [rfc patch v3 6/6] lib: index message files with duplicate message-ids David Bremner 2017-04-04 11:10 ` third round of indexing all files David Bremner 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch There are many other problems that could be tested, but this one we have some hope of fixing because it doesn't require UI changes, just indexing changes. --- test/T670-duplicate-mid.sh | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100755 test/T670-duplicate-mid.sh diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh new file mode 100755 index 00000000..88bd12cb --- /dev/null +++ b/test/T670-duplicate-mid.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash +test_description="duplicate message ids" +. ./test-lib.sh || exit 1 + +add_message '[id]="id:duplicate"' '[subject]="message 1"' +add_message '[id]="id:duplicate"' '[subject]="message 2"' + +test_begin_subtest 'Search for second subject' +test_subtest_known_broken +cat <<EOF >EXPECTED +MAIL_DIR/msg-001 +MAIL_DIR/msg-002 +EOF +notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_done -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [rfc patch v3 6/6] lib: index message files with duplicate message-ids 2017-04-04 1:47 ` third round of indexing all files David Bremner ` (4 preceding siblings ...) 2017-04-04 1:47 ` [rfc patch v3 5/6] test: add known broken test for duplicate message id David Bremner @ 2017-04-04 1:47 ` David Bremner 2017-04-04 11:10 ` third round of indexing all files David Bremner 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 1:47 UTC (permalink / raw) To: David Bremner, notmuch The corresponding xapian document just gets more terms added to it, but this doesn't seem to break anything. --- lib/database.cc | 3 +++ test/T670-duplicate-mid.sh | 22 +++++++++++++++++++--- 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 5bc131a3..3b9f7828 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch, if (ret) goto DONE; } else { + ret = _notmuch_message_index_file (message, message_file); + if (ret) + goto DONE; ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; } diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh index 88bd12cb..2c77e11e 100755 --- a/test/T670-duplicate-mid.sh +++ b/test/T670-duplicate-mid.sh @@ -2,11 +2,10 @@ test_description="duplicate message ids" . 
./test-lib.sh || exit 1 -add_message '[id]="id:duplicate"' '[subject]="message 1"' -add_message '[id]="id:duplicate"' '[subject]="message 2"' +add_message '[id]="duplicate"' '[subject]="message 1"' +add_message '[id]="duplicate"' '[subject]="message 2"' test_begin_subtest 'Search for second subject' -test_subtest_known_broken cat <<EOF >EXPECTED MAIL_DIR/msg-001 MAIL_DIR/msg-002 @@ -14,4 +13,21 @@ EOF notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > OUTPUT test_expect_equal_file EXPECTED OUTPUT +add_message '[id]="duplicate"' '[body]="sekrit"' +test_begin_subtest 'search for body in duplicate file' +cat <<EOF >EXPECTED +MAIL_DIR/msg-001 +MAIL_DIR/msg-002 +MAIL_DIR/msg-003 +EOF +notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + +test_begin_subtest 'reindex removes terms from duplicate file' +rm $MAIL_DIR/msg-003 +notmuch reindex id:duplicate +cp /dev/null EXPECTED +notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT +test_expect_equal_file EXPECTED OUTPUT + test_done -- 2.11.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
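The behaviour exercised by the last two subtests can be pictured with a toy model: every file sharing a message-id contributes its terms to a single document, and a reindex drops the indexed terms and re-adds them from the files that still exist. The sketch below is only that, a toy. It does not use Xapian or libnotmuch, all `demo_*` names are hypothetical, and files are modelled as NULL-terminated word lists. Unlike the real library, which is meant to leave tags in place (see the "reindex preserves tags" subtest), the toy document holds nothing but indexed terms.

```c
#include <string.h>

#define DEMO_MAX_TERMS 32

/* Toy stand-in for one document: a small set of terms. */
typedef struct {
    const char *terms[DEMO_MAX_TERMS];
    size_t count;
} demo_doc_t;

static int
demo_doc_has_term (const demo_doc_t *doc, const char *term)
{
    size_t i;

    for (i = 0; i < doc->count; i++) {
        if (strcmp (doc->terms[i], term) == 0)
            return 1;
    }
    return 0;
}

/* Add a term unless it is already present or the set is full. */
static void
demo_doc_add_term (demo_doc_t *doc, const char *term)
{
    if (doc->count < DEMO_MAX_TERMS && ! demo_doc_has_term (doc, term))
        doc->terms[doc->count++] = term;
}

/* "Index" one file: add each of its words to the shared document. */
static void
demo_index_file (demo_doc_t *doc, const char *const *words)
{
    for (; *words != NULL; words++)
        demo_doc_add_term (doc, *words);
}

/* "Reindex": drop all indexed terms, then re-add the terms of every
 * file that still exists. */
static void
demo_reindex (demo_doc_t *doc, const char *const *const *files,
              size_t n_files)
{
    size_t i;

    doc->count = 0;
    for (i = 0; i < n_files; i++)
        demo_index_file (doc, files[i]);
}

/* Three files with the same message-id, loosely mirroring the test. */
static const char *const demo_file1[] = { "message", "1", NULL };
static const char *const demo_file2[] = { "message", "2", NULL };
static const char *const demo_file3[] = { "sekrit", NULL };
```

Indexing all three files makes "sekrit" findable in the shared document. Reindexing with only `demo_file1` and `demo_file2` removes it again, which is the outcome the subtest 'reindex removes terms from duplicate file' expects.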
* Re: third round of indexing all files 2017-04-04 1:47 ` third round of indexing all files David Bremner ` (5 preceding siblings ...) 2017-04-04 1:47 ` [rfc patch v3 6/6] lib: index message files with duplicate message-ids David Bremner @ 2017-04-04 11:10 ` David Bremner 6 siblings, 0 replies; 14+ messages in thread From: David Bremner @ 2017-04-04 11:10 UTC (permalink / raw) To: notmuch David Bremner <david@tethera.net> writes: > It seems noticeably faster (on the order of 30-50% faster) and the > code is quite a bit simpler to adapt the approach in [1] to only > delete the terms we are going to re-add via indexing. > > This obsoletes the previous series at [2]. It still has all of the > issues mentioned there UI-wise, and the question of the index options > design probably needs more thought. > Some belated testing reveals this implementation is pretty broken. It probably won't eat your database, but that's only because I forgot to add a call to _notmuch_message_sync. So I'd recommend passing on this for now. The previous approach is probably OK, although I'm going to bash at this fancier approach a bit to see if I can make it work. ^ permalink raw reply [flat|nested] 14+ messages in thread