unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* Add new dump/restore format and batch tagging.
@ 2012-11-18 19:04 david
  2012-11-18 19:04 ` [PATCH 01/16] hex-escape: (en|de)code strings to/from restricted character set david
                   ` (16 more replies)
  0 siblings, 17 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch

This is a patch series with some history, if you will forgive the
semi-inadvertant pun. I list that at that end, for the curious.

If this series goes in, in the future we might consider whether
restore --accumulate provides important functionality over batch
tagging; at the moment I suggest leaving it as the extra code to
support it is minimal, and it does support peoples existing scripts
using the old dump/restore format.

History
-------

About 1 year ago, Petter Reinholdtsen observed a problem with dumping
and restoring message-id's with spaces in them. 

    id:2flfwhht87d.fsf@diskless.uio.no

There followed a proposed fix

      id:1323808075-7417-1-git-send-email-david@tethera.net

Which Dmitry had a few helpful things to say about the hex encoding
libs.

Jani took that foundation and proposed two versions of the batch
tagging, most recently at

id:cover.1334404979.git.jani@nikula.org. 

There was some discussion with Jamie about the file format for batch
tagging in the thread

     id:cover.1333231401.git.jani@nikula.org

id:1323808075-7417-1-git-send-email-david@tethera.net
The first 6 of these patches obsolete the series

    id:1345382314-5330-1-git-send-email-david@tethera.net

which was revied by Tomi and Ethan. I think I implemented their
suggestions.

Although I didn't re-read that whole thread, I believe this version of
the patches address's Jamie's concerns by using exactly the same
format for restore and tag --batch (renamed from Jani's choice of
--stdin).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/16] hex-escape: (en|de)code strings to/from restricted character set
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 02/16] test/hex-xcode: new test binary david
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

The character set is chosen to be suitable for pathnames, and the same
as that used by contrib/nmbug

[With additions by Jani Nikula]
---
 util/Makefile.local |    2 +-
 util/hex-escape.c   |  168 +++++++++++++++++++++++++++++++++++++++++++++++++++
 util/hex-escape.h   |   41 +++++++++++++
 3 files changed, 210 insertions(+), 1 deletion(-)
 create mode 100644 util/hex-escape.c
 create mode 100644 util/hex-escape.h

diff --git a/util/Makefile.local b/util/Makefile.local
index c7cae61..3ca623e 100644
--- a/util/Makefile.local
+++ b/util/Makefile.local
@@ -3,7 +3,7 @@
 dir := util
 extra_cflags += -I$(srcdir)/$(dir)
 
-libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c
+libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c
 
 libutil_modules := $(libutil_c_srcs:.c=.o)
 
diff --git a/util/hex-escape.c b/util/hex-escape.c
new file mode 100644
index 0000000..d8905d0
--- /dev/null
+++ b/util/hex-escape.c
@@ -0,0 +1,168 @@
+/* hex-escape.c -  Manage encoding and decoding of byte strings into path names
+ *
+ * Copyright (c) 2011 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#include <assert.h>
+#include <string.h>
+#include <talloc.h>
+#include <ctype.h>
+#include "error_util.h"
+#include "hex-escape.h"
+
+static const size_t default_buf_size = 1024;
+
+static const char *output_charset =
+    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-_@=.:,";
+
+static const char escape_char = '%';
+
+static int
+is_output (char c)
+{
+    return (strchr (output_charset, c) != NULL);
+}
+
+static int
+maybe_realloc (void *ctx, size_t needed, char **out, size_t *out_size)
+{
+    if (*out_size < needed) {
+
+	if (*out == NULL)
+	    *out = talloc_size (ctx, needed);
+	else
+	    *out = talloc_realloc (ctx, *out, char, needed);
+
+	if (*out == NULL)
+	    return 0;
+
+	*out_size = needed;
+    }
+    return 1;
+}
+
+hex_status_t
+hex_encode (void *ctx, const char *in, char **out, size_t *out_size)
+{
+
+    const unsigned char *p;
+    char *q;
+
+    size_t escape_count = 0;
+    size_t len = 0;
+    size_t needed;
+
+    assert (ctx); assert (in); assert (out); assert (out_size);
+
+    for (p = (unsigned char *) in; *p; p++) {
+	escape_count += (!is_output (*p));
+	len++;
+    }
+
+    needed = len + escape_count * 2 + 1;
+
+    if (*out == NULL)
+	*out_size = 0;
+
+    if (!maybe_realloc (ctx, needed, out, out_size))
+	return HEX_OUT_OF_MEMORY;
+
+    q = *out;
+    p = (unsigned char *) in;
+
+    while (*p) {
+	if (is_output (*p)) {
+	    *q++ = *p++;
+	} else {
+	    sprintf (q, "%%%02x", *p++);
+	    q += 3;
+	}
+    }
+
+    *q = '\0';
+    return HEX_SUCCESS;
+}
+
+/* Hex decode 'in' to 'out'.
+ *
+ * This must succeed for in == out to support hex_decode_inplace().
+ */
+static hex_status_t
+hex_decode_internal (const char *in, unsigned char *out)
+{
+    char buf[3];
+
+    while (*in) {
+	if (*in == escape_char) {
+	    char *endp;
+
+	    /* This also handles unexpected end-of-string. */
+	    if (!isxdigit ((unsigned char) in[1]) ||
+		!isxdigit ((unsigned char) in[2]))
+		return HEX_SYNTAX_ERROR;
+
+	    buf[0] = in[1];
+	    buf[1] = in[2];
+	    buf[2] = '\0';
+
+	    *out = strtoul (buf, &endp, 16);
+
+	    if (endp != buf + 2)
+		return HEX_SYNTAX_ERROR;
+
+	    in += 3;
+	    out++;
+	} else {
+	    *out++ = *in++;
+	}
+    }
+
+    *out = '\0';
+
+    return HEX_SUCCESS;
+}
+
+hex_status_t
+hex_decode_inplace (char *s)
+{
+    /* A decoded string is never longer than the encoded one, so it is
+     * safe to decode a string onto itself. */
+    return hex_decode_internal (s, (unsigned char *) s);
+}
+
+hex_status_t
+hex_decode (void *ctx, const char *in, char **out, size_t * out_size)
+{
+    const char *p;
+    size_t escape_count = 0;
+    size_t needed = 0;
+
+    assert (ctx); assert (in); assert (out); assert (out_size);
+
+    size_t len = strlen (in);
+
+    for (p = in; *p; p++)
+	escape_count += (*p == escape_char);
+
+    needed = len - escape_count * 2 + 1;
+
+    if (!maybe_realloc (ctx, needed, out, out_size))
+	return HEX_OUT_OF_MEMORY;
+
+    return hex_decode_internal (in, (unsigned char *) *out);
+}
diff --git a/util/hex-escape.h b/util/hex-escape.h
new file mode 100644
index 0000000..5182042
--- /dev/null
+++ b/util/hex-escape.h
@@ -0,0 +1,41 @@
+#ifndef _HEX_ESCAPE_H
+#define _HEX_ESCAPE_H
+
+typedef enum hex_status {
+    HEX_SUCCESS = 0,
+    HEX_SYNTAX_ERROR,
+    HEX_OUT_OF_MEMORY
+} hex_status_t;
+
+/*
+ * The API for hex_encode() and hex_decode() is modelled on that for
+ * getline.
+ *
+ * If 'out' points to a NULL pointer a char array of the appropriate
+ * size is allocated using talloc, and out_size is updated.
+ *
+ * If 'out' points to a non-NULL pointer, it assumed to describe an
+ * existing char array, with the size given in *out_size.  This array
+ * may be resized by talloc_realloc if needed; in this case *out_size
+ * will also be updated.
+ *
+ * Note that it is an error to pass a NULL pointer for any parameter
+ * of these routines.
+ */
+
+hex_status_t
+hex_encode (void *talloc_ctx, const char *in, char **out,
+            size_t *out_size);
+
+hex_status_t
+hex_decode (void *talloc_ctx, const char *in, char **out,
+            size_t *out_size);
+
+/*
+ * Non-allocating hex decode to decode 's' in-place. The length of the
+ * result is always equal to or shorter than the length of the
+ * original.
+ */
+hex_status_t
+hex_decode_inplace (char *s);
+#endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/16] test/hex-xcode: new test binary
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
  2012-11-18 19:04 ` [PATCH 01/16] hex-escape: (en|de)code strings to/from restricted character set david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 03/16] test/hex-escaping: new test for hex escaping routines david
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

This program is used both as a test-bed/unit-tester for
../util/hex-escape.c, and also as a utility in future tests of dump
and restore.
---
 test/.gitignore     |    1 +
 test/Makefile.local |   13 ++++++-
 test/basic          |    1 +
 test/hex-xcode.c    |  103 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 test/hex-xcode.c

diff --git a/test/.gitignore b/test/.gitignore
index e63c689..be7ab5e 100644
--- a/test/.gitignore
+++ b/test/.gitignore
@@ -3,4 +3,5 @@ corpus.mail
 smtp-dummy
 symbol-test
 arg-test
+hex-xcode
 tmp.*
diff --git a/test/Makefile.local b/test/Makefile.local
index 9ae130a..8da4c56 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -13,6 +13,9 @@ smtp_dummy_modules = $(smtp_dummy_srcs:.c=.o)
 $(dir)/arg-test: $(dir)/arg-test.o command-line-arguments.o util/libutil.a
 	$(call quiet,CC) -I. $^ -o $@
 
+$(dir)/hex-xcode: $(dir)/hex-xcode.o command-line-arguments.o util/libutil.a
+	$(call quiet,CC) -I. $^ -o $@ -ltalloc
+
 $(dir)/smtp-dummy: $(smtp_dummy_modules)
 	$(call quiet,CC) $^ -o $@
 
@@ -24,8 +27,13 @@ $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o
 
 .PHONY: test check
 
-test-binaries: $(dir)/arg-test $(dir)/smtp-dummy $(dir)/symbol-test \
-	$(dir)/parse-time
+TEST_BINARIES=$(dir)/arg-test \
+	      $(dir)/hex-xcode \
+	      $(dir)/parse-time \
+	      $(dir)/smtp-dummy \
+	      $(dir)/symbol-test
+
+test-binaries: $(TEST_BINARIES)
 
 test:	all test-binaries
 	@${dir}/notmuch-test $(OPTIONS)
@@ -36,5 +44,6 @@ SRCS := $(SRCS) $(smtp_dummy_srcs)
 CLEAN := $(CLEAN) $(dir)/smtp-dummy $(dir)/smtp-dummy.o \
 	 $(dir)/symbol-test $(dir)/symbol-test.o \
 	 $(dir)/arg-test $(dir)/arg-test.o \
+	 $(dir)/hex-xcode $(dir)/hex-xcode.o \
 	 $(dir)/parse-time $(dir)/parse-time.o \
 	 $(dir)/corpus.mail $(dir)/test-results $(dir)/tmp.*
diff --git a/test/basic b/test/basic
index 1b842d2..2a571ac 100755
--- a/test/basic
+++ b/test/basic
@@ -56,6 +56,7 @@ tests_in_suite=$(for i in $TESTS; do echo $i; done | sort)
 available=$(find "$TEST_DIRECTORY" -maxdepth 1 -type f -perm +111  \
     ! -name aggregate-results.sh	\
     ! -name arg-test			\
+    ! -name hex-xcode			\
     ! -name notmuch-test		\
     ! -name parse-time			\
     ! -name smtp-dummy			\
diff --git a/test/hex-xcode.c b/test/hex-xcode.c
new file mode 100644
index 0000000..eec6541
--- /dev/null
+++ b/test/hex-xcode.c
@@ -0,0 +1,103 @@
+/* No, nothing to to with IDE from Apple Inc.
+   testbed for ../util/hex-escape.c.
+
+   usage:
+   hex-xcode [--direction=(encode|decode)] [--omit-newline] < file
+   hex-xcode [--direction=(encode|decode)] [--omit-newline] arg1 arg2 arg3 ...
+
+ */
+
+#include "notmuch-client.h"
+#include "hex-escape.h"
+#include <assert.h>
+
+
+enum direction {
+    ENCODE,
+    DECODE
+};
+
+static int
+xcode (void *ctx, enum direction dir, char *in, char **buf_p, size_t *size_p)
+{
+    hex_status_t status;
+
+    if (dir == ENCODE)
+	status = hex_encode (ctx, in, buf_p, size_p);
+    else
+	status = hex_decode (ctx, in, buf_p, size_p);
+
+    if (status == HEX_SUCCESS)
+	fputs (*buf_p, stdout);
+
+    return status;
+}
+
+
+int
+main (int argc, char **argv)
+{
+
+
+    enum direction dir = DECODE;
+    int omit_newline = FALSE;
+
+    notmuch_opt_desc_t options[] = {
+	{ NOTMUCH_OPT_KEYWORD, &dir, "direction", 'd',
+	  (notmuch_keyword_t []){ { "encode", ENCODE },
+				  { "decode", DECODE },
+				  { 0, 0 } } },
+	{ NOTMUCH_OPT_BOOLEAN, &omit_newline, "omit-newline", 'n', 0 },
+	{ 0, 0, 0, 0, 0 }
+    };
+
+    int opt_index = parse_arguments (argc, argv, options, 1);
+
+    if (opt_index < 0)
+	exit (1);
+
+    void *ctx = talloc_new (NULL);
+
+    char *line = NULL;
+    size_t line_size;
+    ssize_t line_len;
+
+    char *buffer = NULL;
+    size_t buf_size = 0;
+
+    notmuch_bool_t read_stdin = TRUE;
+
+    for (; opt_index < argc; opt_index++) {
+
+	if (xcode (ctx, dir, argv[opt_index],
+		   &buffer, &buf_size) != HEX_SUCCESS)
+	    return 1;
+
+	if (!omit_newline)
+	    putchar ('\n');
+
+	read_stdin = FALSE;
+    }
+
+    if (!read_stdin)
+	return 0;
+
+    while ((line_len = getline (&line, &line_size, stdin)) != -1) {
+
+	chomp_newline (line);
+
+	if (xcode (ctx, dir, line, &buffer, &buf_size) != HEX_SUCCESS)
+	    return 1;
+
+	if (!omit_newline)
+	    putchar ('\n');
+
+    }
+
+    if (line)
+	free (line);
+
+    talloc_free (ctx);
+
+    return 0;
+}
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/16] test/hex-escaping: new test for hex escaping routines
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
  2012-11-18 19:04 ` [PATCH 01/16] hex-escape: (en|de)code strings to/from restricted character set david
  2012-11-18 19:04 ` [PATCH 02/16] test/hex-xcode: new test binary david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 04/16] test: add database routines for testing david
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

These are more like unit tests, to (try to) make sure the library
functionality is working before building more complicated things on
top of it.
---
 test/hex-escaping |   26 ++++++++++++++++++++++++++
 test/notmuch-test |    1 +
 2 files changed, 27 insertions(+)
 create mode 100755 test/hex-escaping

diff --git a/test/hex-escaping b/test/hex-escaping
new file mode 100755
index 0000000..f34cc8c
--- /dev/null
+++ b/test/hex-escaping
@@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+test_description="hex encoding and decoding"
+. ./test-lib.sh
+
+test_begin_subtest "round trip"
+find $TEST_DIRECTORY/corpus -type f -print | sort | xargs cat > EXPECTED
+$TEST_DIRECTORY/hex-xcode --direction=encode < EXPECTED | $TEST_DIRECTORY/hex-xcode --direction=decode > OUTPUT
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "punctuation"
+tag1='comic_swear=$&^%$^%\\//-+$^%$'
+tag_enc1=$($TEST_DIRECTORY/hex-xcode --direction=encode "$tag1")
+test_expect_equal "$tag_enc1" "comic_swear=%24%26%5e%25%24%5e%25%5c%5c%2f%2f-+%24%5e%25%24"
+
+test_begin_subtest "round trip newlines"
+printf 'this\n tag\t has\n spaces\n' > EXPECTED.$test_count
+$TEST_DIRECTORY/hex-xcode --direction=encode  < EXPECTED.$test_count |\
+	$TEST_DIRECTORY/hex-xcode --direction=decode > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+test_begin_subtest "round trip 8bit chars"
+echo '%c3%91%c3%a5%c3%b0%c3%a3%c3%a5%c3%a9-%c3%8f%c3%8a' > EXPECTED.$test_count
+$TEST_DIRECTORY/hex-xcode --direction=decode  < EXPECTED.$test_count |\
+	$TEST_DIRECTORY/hex-xcode --direction=encode > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+test_done
diff --git a/test/notmuch-test b/test/notmuch-test
index 9a1b375..d2e90e2 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -60,6 +60,7 @@ TESTS="
   emacs-hello
   emacs-show
   missing-headers
+  hex-escaping
   parse-time-string
   search-date
 "
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/16] test: add database routines for testing.
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (2 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 03/16] test/hex-escaping: new test for hex escaping routines david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 05/16] test: add generator for random "stub" messages david
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Initially, provide a way to create "stub" messages in the notmuch
database without corresponding files.  This is essentially cut and
paste from lib/database.cc. This is a seperate file since we don't
want to export these symbols from libnotmuch or bloat the library with
non-exported code.
---
 test/Makefile.local  |    1 +
 test/database-test.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++
 test/database-test.h |   21 +++++++++++++++
 3 files changed, 94 insertions(+)
 create mode 100644 test/database-test.c
 create mode 100644 test/database-test.h

diff --git a/test/Makefile.local b/test/Makefile.local
index 8da4c56..8479f91 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -45,5 +45,6 @@ CLEAN := $(CLEAN) $(dir)/smtp-dummy $(dir)/smtp-dummy.o \
 	 $(dir)/symbol-test $(dir)/symbol-test.o \
 	 $(dir)/arg-test $(dir)/arg-test.o \
 	 $(dir)/hex-xcode $(dir)/hex-xcode.o \
+	 $(dir)/database-test.o \
 	 $(dir)/parse-time $(dir)/parse-time.o \
 	 $(dir)/corpus.mail $(dir)/test-results $(dir)/tmp.*
diff --git a/test/database-test.c b/test/database-test.c
new file mode 100644
index 0000000..f0f1c8e
--- /dev/null
+++ b/test/database-test.c
@@ -0,0 +1,72 @@
+/*
+ * Database routines intended only for testing, not exported from
+ * library.
+ *
+ * Copyright (c) 2012 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#include "notmuch-private.h"
+#include "database-test.h"
+
+notmuch_status_t
+notmuch_database_add_stub_message (notmuch_database_t *notmuch,
+				   const char *message_id,
+				   const char **tags)
+{
+    const char **tag;
+    notmuch_status_t ret;
+    notmuch_private_status_t private_status;
+    notmuch_message_t *message;
+
+    ret = _notmuch_database_ensure_writable (notmuch);
+    if (ret)
+	return ret;
+
+    message = _notmuch_message_create_for_message_id (notmuch,
+						      message_id,
+						      &private_status);
+    if (message == NULL) {
+	return COERCE_STATUS (private_status,
+			      "Unexpected status value from _notmuch_message_create_for_message_id");
+
+    }
+
+    if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	_notmuch_message_add_term (message, "type", "mail");
+    } else {
+	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+    }
+
+    if (tags) {
+	ret = notmuch_message_freeze (message);
+	if (ret)
+	    return ret;
+
+	for (tag = tags; *tag; tag++) {
+	    ret = notmuch_message_add_tag (message, *tag);
+	    if (ret)
+		return ret;
+	}
+    }
+
+    ret = notmuch_message_thaw (message);
+    if (ret)
+	return ret;
+
+    return NOTMUCH_STATUS_SUCCESS;
+}
diff --git a/test/database-test.h b/test/database-test.h
new file mode 100644
index 0000000..84f7988
--- /dev/null
+++ b/test/database-test.h
@@ -0,0 +1,21 @@
+#ifndef _DATABASE_TEST_H
+#define _DATABASE_TEST_H
+/* Add a new stub message to the given notmuch database.
+ *
+ * At least the following return values are possible:
+ *
+ * NOTMUCH_STATUS_SUCCESS: Message successfully added to database.
+ *
+ * NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID: Message has the same message
+ *	ID as another message already in the database.
+ *
+ * NOTMUCH_STATUS_READ_ONLY_DATABASE: Database was opened in read-only
+ *	mode so no message can be added.
+ */
+
+notmuch_status_t
+notmuch_database_add_stub_message (notmuch_database_t *database,
+				   const char *message_id,
+				   const char **tag_list);
+
+#endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/16] test: add generator for random "stub" messages
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (3 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 04/16] test: add database routines for testing david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 06/16] test: add broken roundtrip test david
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Initial use case is testing dump and restore, so we only have
message-ids and tags.

The message ID's are nothing like RFC compliant, but it doesn't seem
any harder to roundtrip random UTF-8 strings than RFC-compliant ones.

Tags are UTF-8, even though notmuch is in principle more generous than
that.

updated for id:m2wr04ocro.fsf@guru.guru-group.fi

- talk about Unicode value rather some specific encoding
- call talloc_realloc less times
---
 test/.gitignore      |    1 +
 test/Makefile.local  |   10 +++
 test/basic           |    1 +
 test/random-corpus.c |  197 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 209 insertions(+)
 create mode 100644 test/random-corpus.c

diff --git a/test/.gitignore b/test/.gitignore
index be7ab5e..1eff7ce 100644
--- a/test/.gitignore
+++ b/test/.gitignore
@@ -4,4 +4,5 @@ smtp-dummy
 symbol-test
 arg-test
 hex-xcode
+random-corpus
 tmp.*
diff --git a/test/Makefile.local b/test/Makefile.local
index 8479f91..6a9f15e 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -16,6 +16,14 @@ $(dir)/arg-test: $(dir)/arg-test.o command-line-arguments.o util/libutil.a
 $(dir)/hex-xcode: $(dir)/hex-xcode.o command-line-arguments.o util/libutil.a
 	$(call quiet,CC) -I. $^ -o $@ -ltalloc
 
+random_corpus_deps =  $(dir)/random-corpus.o  $(dir)/database-test.o \
+			notmuch-config.o command-line-arguments.o \
+			lib/libnotmuch.a util/libutil.a \
+			parse-time-string/libparse-time-string.a
+
+$(dir)/random-corpus: $(random_corpus_deps)
+	$(call quiet,CC) $(CFLAGS_FINAL) $^ -o $@ $(CONFIGURE_LDFLAGS)
+
 $(dir)/smtp-dummy: $(smtp_dummy_modules)
 	$(call quiet,CC) $^ -o $@
 
@@ -29,6 +37,7 @@ $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o
 
 TEST_BINARIES=$(dir)/arg-test \
 	      $(dir)/hex-xcode \
+	      $(dir)/random-corpus \
 	      $(dir)/parse-time \
 	      $(dir)/smtp-dummy \
 	      $(dir)/symbol-test
@@ -46,5 +55,6 @@ CLEAN := $(CLEAN) $(dir)/smtp-dummy $(dir)/smtp-dummy.o \
 	 $(dir)/arg-test $(dir)/arg-test.o \
 	 $(dir)/hex-xcode $(dir)/hex-xcode.o \
 	 $(dir)/database-test.o \
+	 $(dir)/random-corpus $(dir)/random-corpus.o \
 	 $(dir)/parse-time $(dir)/parse-time.o \
 	 $(dir)/corpus.mail $(dir)/test-results $(dir)/tmp.*
diff --git a/test/basic b/test/basic
index 2a571ac..f93469f 100755
--- a/test/basic
+++ b/test/basic
@@ -59,6 +59,7 @@ available=$(find "$TEST_DIRECTORY" -maxdepth 1 -type f -perm +111  \
     ! -name hex-xcode			\
     ! -name notmuch-test		\
     ! -name parse-time			\
+    ! -name random-corpus		\
     ! -name smtp-dummy			\
     ! -name symbol-test			\
     ! -name test-verbose		\
diff --git a/test/random-corpus.c b/test/random-corpus.c
new file mode 100644
index 0000000..91e98b1
--- /dev/null
+++ b/test/random-corpus.c
@@ -0,0 +1,197 @@
+/*
+ * Generate a random corpus of stub messages.
+ *
+ * Initial use case is testing dump and restore, so we only have
+ * message-ids and tags.
+ *
+ * Generated message-id's and tags are intentionally nasty.
+ *
+ * Copyright (c) 2012 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#include <stdlib.h>
+#include <assert.h>
+#include <talloc.h>
+#include <string.h>
+#include <glib.h>
+#include <math.h>
+
+#include "notmuch-client.h"
+#include "command-line-arguments.h"
+#include "database-test.h"
+
+/* Current largest Unicode value defined. Note that most of these will
+ * be printed as boxes in most fonts.
+ */
+
+#define GLYPH_MAX 0x10FFFE
+
+static gunichar
+random_unichar ()
+{
+    int start = 1, stop = GLYPH_MAX;
+    int class = random() % 2;
+
+    /*
+     *  Choose about half ascii as test characters, as ascii
+     *  punctation and whitespace is the main cause of problems for
+     *  the (old) restore parser
+    */
+    switch (class) {
+    case 0:
+	/* ascii */
+	start = 0x01;
+	stop = 0x7f;
+	break;
+    case 1:
+	/* the rest of unicode */
+	start = 0x80;
+	stop = GLYPH_MAX;
+    }
+
+    if (start == stop)
+	return start;
+    else
+	return start + (random() % (stop - start + 1));
+}
+
+static char *
+random_utf8_string (void *ctx, size_t char_count)
+{
+    size_t offset = 0;
+    size_t i;
+
+    gchar *buf = NULL;
+    size_t buf_size = 0;
+
+    for (i = 0; i < char_count; i++) {
+	gunichar randomchar;
+	size_t written;
+
+	/* 6 for one glyph, one for null, one for luck */
+	while (buf_size - offset < 8) {
+	    buf_size = 2 * buf_size + 8;
+	    buf = talloc_realloc (ctx, buf, gchar, buf_size);
+	}
+
+	randomchar = random_unichar();
+
+	written = g_unichar_to_utf8 (randomchar, buf + offset);
+
+	if (written <= 0) {
+	    fprintf (stderr, "error converting to utf8\n");
+	    exit (1);
+	}
+
+	offset += written;
+
+    }
+    buf[offset] = 0;
+    return buf;
+}
+
+
+int
+main (int argc, char **argv)
+{
+
+    void *ctx = talloc_new (NULL);
+
+    char *config_path  = NULL;
+    notmuch_config_t *config;
+    notmuch_database_t *notmuch;
+
+    int num_messages = 500;
+    int max_tags = 10;
+    // leave room for UTF-8 encoding.
+    int tag_len = NOTMUCH_TAG_MAX / 6;
+    // NOTMUCH_MESSAGE_ID_MAX is not exported, so we make a
+    // conservative guess.
+    int message_id_len = (NOTMUCH_TAG_MAX - 20) / 6;
+
+    int seed = 734569;
+
+    notmuch_opt_desc_t options[] = {
+	{ NOTMUCH_OPT_STRING, &config_path, "config-path", 'c', 0 },
+	{ NOTMUCH_OPT_INT, &num_messages, "num-messages", 'n', 0 },
+	{ NOTMUCH_OPT_INT, &max_tags, "max-tags", 'm', 0 },
+	{ NOTMUCH_OPT_INT, &message_id_len, "message-id-len", 'M', 0 },
+	{ NOTMUCH_OPT_INT, &tag_len, "tag-len", 't', 0 },
+	{ NOTMUCH_OPT_INT, &seed, "seed", 's', 0 },
+	{ 0, 0, 0, 0, 0 }
+    };
+
+    int opt_index = parse_arguments (argc, argv, options, 1);
+
+    if (opt_index < 0)
+	exit (1);
+
+    if (config_path == NULL) {
+	fprintf (stderr, "configuration path must be specified");
+	exit (1);
+    }
+
+    config = notmuch_config_open (ctx, config_path, NULL);
+    if (config == NULL)
+	return 1;
+
+    if (notmuch_database_open (notmuch_config_get_database_path (config),
+			       NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch))
+	return 1;
+
+    srandom (seed);
+
+    int count;
+    for (count = 0; count < num_messages; count++) {
+	int j;
+	int num_tags = random () % (max_tags + 1);
+	int this_mid_len = random () % message_id_len + 1;
+	const char **tag_list;
+	char *mid;
+	notmuch_status_t status;
+
+	do {
+	    mid = random_utf8_string (ctx, this_mid_len);
+
+	    tag_list = talloc_realloc (ctx, NULL, const char *, num_tags + 2);
+
+	    tag_list[0] = "random-corpus";
+
+	    for (j = 0; j < num_tags; j++) {
+		int this_tag_len = random () % tag_len + 1;
+
+		tag_list[j + 1] = random_utf8_string (ctx, this_tag_len);
+	    }
+
+	    tag_list[j + 1] = NULL;
+
+	    status = notmuch_database_add_stub_message (notmuch, mid, tag_list);
+	} while (status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID);
+
+	if (status != NOTMUCH_STATUS_SUCCESS) {
+	    fprintf (stderr, "error %d adding message", status);
+	    exit (status);
+	}
+    }
+
+    notmuch_database_destroy (notmuch);
+
+    talloc_free (ctx);
+
+    return 0;
+}
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/16] test: add broken roundtrip test
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (4 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 05/16] test: add generator for random "stub" messages david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 07/16] notmuch-dump: add --format=(batch-tag|sup) david
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

We demonstrate the current notmuch restore parser being confused by
message-id's and tags containing non alpha numeric characters
(particularly space and parentheses are problematic because they are
not escaped by notmuch dump).

We save the files as hex escaped on disk so that the output from the
failing test will not confuse the terminal emulator of people running
the test.
---
 test/dump-restore |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/test/dump-restore b/test/dump-restore
index b05399c..a2204fb 100755
--- a/test/dump-restore
+++ b/test/dump-restore
@@ -85,4 +85,13 @@ test_begin_subtest "dump --output=outfile -- from:cworth"
 notmuch dump --output=dump-outfile-dash-inbox.actual -- from:cworth
 test_expect_equal_file dump-cworth.expected dump-outfile-dash-inbox.actual
 
+test_expect_success 'roundtripping random message-ids and tags' \
+    'test_subtest_known_broken &&
+     ${TEST_DIRECTORY}/random-corpus --num-messages=10 --config-path=${NOTMUCH_CONFIG} &&
+     notmuch dump | ${TEST_DIRECTORY}/hex-xcode --direction=encode > EXPECTED.$test_count &&
+     notmuch tag -random-corpus tag:random-corpus &&
+     ${TEST_DIRECTORY}/hex-xcode --direction=decode < EXPECTED.$test_count | notmuch restore 2>/dev/null &&
+     notmuch dump | ${TEST_DIRECTORY}/hex-xcode --direction=encode > OUTPUT.$test_count &&
+     test_cmp EXPECTED.$test_count OUTPUT.$test_count'
+
 test_done
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/16] notmuch-dump: add --format=(batch-tag|sup)
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (5 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 06/16] test: add broken roundtrip test david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 08/16] tag-util.[ch]: New files for common tagging routines david
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

sup is the old format, and remains the default, at least until
restore is converted to parse this format.

Each line of the batch- format is valid for notmuch tag, (with the
"notmuch tag" omitted from the front of the line). The dump format
only uses query strings of a single message-id.

Each space seperated tag/message-id is 'hex-encoded' to remove
troubling characters.  In particular this format won't have the same
problem with e.g. spaces in message-ids or tags; they will be
round-trip-able.
---
 dump-restore-private.h |   13 +++++++++++++
 notmuch-dump.c         |   42 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 49 insertions(+), 6 deletions(-)
 create mode 100644 dump-restore-private.h

diff --git a/dump-restore-private.h b/dump-restore-private.h
new file mode 100644
index 0000000..896a004
--- /dev/null
+++ b/dump-restore-private.h
@@ -0,0 +1,13 @@
+#ifndef DUMP_RESTORE_PRIVATE_H
+#define DUMP_RESTORE_PRIVATE_H
+
+#include "hex-escape.h"
+#include "command-line-arguments.h"
+
+typedef enum dump_formats {
+    DUMP_FORMAT_AUTO,
+    DUMP_FORMAT_BATCH_TAG,
+    DUMP_FORMAT_SUP
+} dump_format_t;
+
+#endif
diff --git a/notmuch-dump.c b/notmuch-dump.c
index 88f598a..045ca9e 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -19,6 +19,7 @@
  */
 
 #include "notmuch-client.h"
+#include "dump-restore-private.h"
 
 int
 notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
@@ -43,7 +44,13 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
     char *output_file_name = NULL;
     int opt_index;
 
+    int output_format = DUMP_FORMAT_SUP;
+
     notmuch_opt_desc_t options[] = {
+	{ NOTMUCH_OPT_KEYWORD, &output_format, "format", 'f',
+	  (notmuch_keyword_t []){ { "sup", DUMP_FORMAT_SUP },
+				  { "batch-tag", DUMP_FORMAT_BATCH_TAG },
+				  { 0, 0 } } },
 	{ NOTMUCH_OPT_STRING, &output_file_name, "output", 'o', 0  },
 	{ 0, 0, 0, 0, 0 }
     };
@@ -83,27 +90,50 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
      */
     notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
 
+    char *buffer = NULL;
+    size_t buffer_size = 0;
+
     for (messages = notmuch_query_search_messages (query);
 	 notmuch_messages_valid (messages);
 	 notmuch_messages_move_to_next (messages)) {
 	int first = 1;
+	const char *message_id;
+
 	message = notmuch_messages_get (messages);
+	message_id = notmuch_message_get_message_id (message);
 
-	fprintf (output,
-		 "%s (", notmuch_message_get_message_id (message));
+	if (output_format == DUMP_FORMAT_SUP) {
+	    fprintf (output, "%s (", message_id);
+	}
 
 	for (tags = notmuch_message_get_tags (message);
 	     notmuch_tags_valid (tags);
 	     notmuch_tags_move_to_next (tags)) {
-	    if (! first)
-		fprintf (output, " ");
+	    const char *tag_str = notmuch_tags_get (tags);
 
-	    fprintf (output, "%s", notmuch_tags_get (tags));
+	    if (! first)
+		fputs (" ", output);
 
 	    first = 0;
+
+	    if (output_format == DUMP_FORMAT_SUP) {
+		fputs (tag_str, output);
+	    } else {
+		if (hex_encode (notmuch, tag_str,
+				&buffer, &buffer_size) != HEX_SUCCESS)
+		    return 1;
+		fprintf (output, "+%s", buffer);
+	    }
 	}
 
-	fprintf (output, ")\n");
+	if (output_format == DUMP_FORMAT_SUP) {
+	    fputs (")\n", output);
+	} else {
+	    if (hex_encode (notmuch, message_id,
+			    &buffer, &buffer_size) != HEX_SUCCESS)
+		return 1;
+	    fprintf (output, " -- id:%s\n", buffer);
+	}
 
 	notmuch_message_destroy (message);
     }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/16] tag-util.[ch]: New files for common tagging routines
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (6 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 07/16] notmuch-dump: add --format=(batch-tag|sup) david
@ 2012-11-18 19:04 ` david
  2012-11-22 12:22   ` Mark Walters
  2012-11-18 19:04 ` [PATCH 09/16] cli: add support for batch tagging operations to "notmuch tag" david
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

These are meant to be shared between notmuch-tag and notmuch-restore.

The bulk of the routines implement a "tag operation list" abstract
data type act as a structured representation of a set of tag
operations (typically coming from a single tag command or line of
input).
---
 Makefile.local |    1 +
 tag-util.c     |  273 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tag-util.h     |   71 +++++++++++++++
 3 files changed, 345 insertions(+)
 create mode 100644 tag-util.c
 create mode 100644 tag-util.h

diff --git a/Makefile.local b/Makefile.local
index 2b91946..854867d 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -274,6 +274,7 @@ notmuch_client_srcs =		\
 	query-string.c		\
 	mime-node.c		\
 	crypto.c		\
+	tag-util.c
 
 notmuch_client_modules = $(notmuch_client_srcs:.c=.o)
 
diff --git a/tag-util.c b/tag-util.c
new file mode 100644
index 0000000..5329b1f
--- /dev/null
+++ b/tag-util.c
@@ -0,0 +1,273 @@
+#include "tag-util.h"
+#include "hex-escape.h"
+
+static char *
+strtok_len (char *s, const char *delim, size_t *len)
+{
+    /* skip initial delims */
+    s += strspn (s, delim);
+
+    /* length of token */
+    *len = strcspn (s, delim);
+
+    return *len ? s : NULL;
+}
+
+/* Tag messages according to 'input', which must consist of lines of
+ * the format:
+ *
+ * +<tag>|-<tag> [...] [--] <search-terms>
+ *
+ * Each line is interpreted similarly to "notmuch tag" command line
+ * arguments. The delimiter is one or more spaces ' '. Any characters
+ * in <tag> and <search-terms> MAY be hex encoded with %NN where NN is
+ * the hexadecimal value of the character. Any ' ' and '%' characters
+ * in <tag> and <search-terms> MUST be hex encoded (using %20 and %25,
+ * respectively). Any characters that are not part of <tag> or
+ * <search-terms> MUST NOT be hex encoded.
+ *
+ * Leading and trailing space ' ' is ignored. Empty lines and lines
+ * beginning with '#' are ignored.
+ */
+int
+parse_tag_line (void *ctx, char *line,
+		char **query_string,
+		tag_op_list_t *tag_ops)
+{
+    char *tok = line;
+    size_t tok_len = 0;
+
+    chomp_newline (line);
+    tag_op_list_reset (tag_ops);
+
+    /* Parse tags. */
+    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
+	notmuch_bool_t remove;
+	char *tag;
+
+	/* Optional explicit end of tags marker. */
+	if (strncmp (tok, "--", tok_len) == 0) {
+	    tok = strtok_len (tok + tok_len, " ", &tok_len);
+	    break;
+	}
+
+	/* Implicit end of tags. */
+	if (*tok != '-' && *tok != '+')
+	    break;
+
+	/* If tag is terminated by NUL, there's no query string. */
+	if (*(tok + tok_len) == '\0') {
+	    tok = NULL;
+	    break;
+	}
+
+	/* Terminate, and start next token after terminator. */
+	*(tok + tok_len++) = '\0';
+
+	remove = (*tok == '-');
+	tag = tok + 1;
+
+	/* Refuse empty tags. */
+	if (*tag == '\0') {
+	    tok = NULL;
+	    break;
+	}
+
+	/* Decode tag. */
+	if (hex_decode_inplace (tag) != HEX_SUCCESS) {
+	    tok = NULL;
+	    break;
+	}
+
+	if (tag_op_list_append (ctx, tag_ops, tag, remove))
+	    return -1;
+    }
+
+    if (tok == NULL || tag_ops->count == 0) {
+	/* FIXME: line has been modified! */
+	fprintf (stderr, "Warning: Ignoring invalid input line: %s\n",
+		 line);
+	return 1;
+    }
+
+    /* tok now points to the query string */
+    if (hex_decode_inplace (tok) != HEX_SUCCESS) {
+	/* FIXME: line has been modified! */
+	fprintf (stderr, "Warning: Ignoring invalid input line: %s\n",
+		 line);
+	return 1;
+    }
+
+    *query_string = tok;
+
+    return 0;
+}
+
+static inline void
+message_error (notmuch_message_t *message,
+	       notmuch_status_t status,
+	       const char *format, ...)
+{
+    va_list va_args;
+
+    va_start (va_args, format);
+
+    vfprintf (stderr, format, va_args);
+    fprintf (stderr, "Message-ID: %s\n", notmuch_message_get_message_id (message));
+    fprintf (stderr, "Status: %s\n", notmuch_status_to_string (status));
+}
+
+notmuch_status_t
+tag_op_list_apply (notmuch_message_t *message,
+		   tag_op_list_t *list,
+		   tag_op_flag_t flags)
+{
+    int i;
+
+    notmuch_status_t status = 0;
+    tag_operation_t *tag_ops = list->ops;
+
+    status = notmuch_message_freeze (message);
+    if (status) {
+	message_error (message, status, "freezing message");
+	return status;
+    }
+
+    if (flags & TAG_FLAG_REMOVE_ALL) {
+	status = notmuch_message_remove_all_tags (message);
+	if (status) {
+	    message_error (message, status, "removing all tags" );
+	    return status;
+	}
+    }
+
+    for (i = 0; i < list->count; i++) {
+	if (tag_ops[i].remove) {
+	    status = notmuch_message_remove_tag (message, tag_ops[i].tag);
+	    if (status) {
+		message_error (message, status, "removing tag %s", tag_ops[i].tag);
+		return status;
+	    }
+	} else {
+	    status = notmuch_message_add_tag (message, tag_ops[i].tag);
+	    if (status) {
+		message_error (message, status, "adding tag %s", tag_ops[i].tag);
+		return status;
+	    }
+
+	}
+    }
+
+    status = notmuch_message_thaw (message);
+    if (status) {
+	message_error (message, status, "thawing message");
+	return status;
+    }
+
+
+    if (flags & TAG_FLAG_MAILDIR_SYNC) {
+	status = notmuch_message_tags_to_maildir_flags (message);
+	if (status) {
+	    message_error (message, status, "synching tags to maildir");
+	    return status;
+	}
+    }
+
+    return NOTMUCH_STATUS_SUCCESS;
+
+}
+
+
+/* Array of tagging operations (add or remove), terminated with an
+ * empty element. Size will be increased as necessary. */
+
+tag_op_list_t *
+tag_op_list_create (void *ctx)
+{
+    tag_op_list_t *list;
+
+    list = talloc (ctx, tag_op_list_t);
+    if (list == NULL)
+	return NULL;
+
+    list->size = TAG_OP_LIST_INITIAL_SIZE;
+    list->count = 0;
+
+    list->ops = talloc_array (ctx, tag_operation_t, list->size);
+    if (list->ops == NULL)
+	return NULL;
+
+    return list;
+}
+
+
+int
+tag_op_list_append (void *ctx,
+		    tag_op_list_t *list,
+		    const char *tag,
+		    notmuch_bool_t remove)
+{
+    list->ops[list->count].tag = tag;
+    list->ops[list->count].remove = remove;
+    list->count++;
+
+    /* Make room for terminating empty element and potential
+     * new tags, if necessary. This should be a fairly rare
+     * case, considering the initial array size. */
+    if (list->count == list->size) {
+	list->size *= 2;
+	list->ops = talloc_realloc (ctx, list->ops, tag_operation_t,
+				    list->size);
+	if (list->ops == NULL) {
+	    fprintf (stderr, "Out of memory.\n");
+	    return 1;
+	}
+    }
+
+    list->ops[list->count].tag = NULL;
+
+    return 0;
+}
+
+
+/* WARNING: modifies it's string argument */
+
+int
+tag_op_list_from_string (void *ctx,
+			 char *tag_string,
+			 notmuch_bool_t remove,
+			 tag_op_list_t *tag_ops)
+{
+    char *tok = tag_string;
+    size_t tok_len = 0;
+
+    tag_op_list_reset (tag_ops);
+
+/* Parse tags. */
+    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
+
+	if (*(tok + tok_len) != '\0') {
+	    *(tok + tok_len) = '\0';
+	    tok_len++;
+	}
+
+	if (tag_op_list_append (ctx, tag_ops, tok, remove))
+	    return -1;
+    }
+
+    return 0;
+
+}
+
+notmuch_bool_t
+tag_op_list_empty (tag_op_list_t *list)
+{
+    return (list->count == 0);
+}
+
+
+void
+tag_op_list_reset (tag_op_list_t *list)
+{
+    list->count = 0;
+}
diff --git a/tag-util.h b/tag-util.h
new file mode 100644
index 0000000..b381b8e
--- /dev/null
+++ b/tag-util.h
@@ -0,0 +1,71 @@
+#ifndef _TAG_UTIL_H
+#define _TAG_UTIL_H
+
+#include "notmuch-client.h"
+
+typedef struct {
+    const char *tag;
+    notmuch_bool_t remove;
+} tag_operation_t;
+
+#define TAG_OP_LIST_INITIAL_SIZE 10
+
+typedef struct {
+    tag_operation_t *ops;
+    int count;
+    int size;
+} tag_op_list_t;
+
+/* Use powers of 2 */
+typedef enum  { TAG_FLAG_NONE = 0,
+		TAG_FLAG_MAILDIR_SYNC = 1,
+		TAG_FLAG_REMOVE_ALL = 2 } tag_op_flag_t;
+
+
+typedef int (*tag_callback_t)(void *ctx,
+			      notmuch_database_t *notmuch,
+			      const char *query_string,
+			      tag_op_list_t *tag_ops,
+			      tag_op_flag_t flags);
+
+int
+parse_tag_stream (void *ctx,
+		  notmuch_database_t *notmuch,
+		  FILE *input,
+		  tag_callback_t callback,
+		  tag_op_flag_t flags,
+		  volatile sig_atomic_t *interrupted);
+
+/*
+ * Returns: 0 for OK, 1 for skipped line, -1 for fatal(ish) error.
+ */
+int
+parse_tag_line (void *ctx, char *line, char **query_str, tag_op_list_t *ops);
+
+tag_op_list_t
+*tag_op_list_create (void *ctx);
+
+int
+tag_op_list_append (void *ctx,
+		    tag_op_list_t *list,
+		    const char *tag,
+		    notmuch_bool_t remove);
+
+notmuch_status_t
+tag_op_list_apply (notmuch_message_t *message,
+		   tag_op_list_t *tag_ops,
+		   tag_op_flag_t flags);
+
+int
+tag_op_list_from_string (void *ctx,
+			 char *tag_string,
+			 notmuch_bool_t remove,
+			 tag_op_list_t *tag_ops);
+
+notmuch_bool_t
+tag_op_list_empty (tag_op_list_t *list);
+
+void
+tag_op_list_reset (tag_op_list_t *list);
+
+#endif
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/16] cli: add support for batch tagging operations to "notmuch tag"
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (7 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 08/16] tag-util.[ch]: New files for common tagging routines david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 10/16] test: add test for notmuch tag --batch option david
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch

From: Jani Nikula <jani@nikula.org>

Add support for batch tagging operations through stdin to "notmuch
tag". This can be enabled with the new --stdin command line option to
"notmuch new". The input must consist of lines of the format:

+<tag>|-<tag> [...] [--] <search-terms>

Each line is interpreted similarly to "notmuch tag" command line
arguments. The delimiter is one or more spaces ' '. Any characters in
<tag> and <search-terms> MAY be hex encoded with %NN where NN is the
hexadecimal value of the character. Any ' ' and '%' characters in
<tag> and <search-terms> MUST be hex encoded (using %20 and %25,
respectively). Any characters that are not part of <tag> or
<search-terms> MUST NOT be hex encoded.

Leading and trailing space ' ' is ignored. Empty lines and lines
beginning with '#' are ignored.

Signed-off-by: Jani Nikula <jani@nikula.org>

Hacked-like-crazy-by: David Bremner <david@tethera.net>
---
 notmuch-tag.c |  177 +++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 109 insertions(+), 68 deletions(-)

diff --git a/notmuch-tag.c b/notmuch-tag.c
index 88d559b..ca120d5 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -19,6 +19,7 @@
  */
 
 #include "notmuch-client.h"
+#include "tag-util.h"
 
 static volatile sig_atomic_t interrupted;
 
@@ -54,14 +55,9 @@ _escape_tag (char *buf, const char *tag)
     return buf;
 }
 
-typedef struct {
-    const char *tag;
-    notmuch_bool_t remove;
-} tag_operation_t;
-
 static char *
 _optimize_tag_query (void *ctx, const char *orig_query_string,
-		     const tag_operation_t *tag_ops)
+		     const tag_op_list_t *list)
 {
     /* This is subtler than it looks.  Xapian ignores the '-' operator
      * at the beginning both queries and parenthesized groups and,
@@ -75,6 +71,7 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
     const char *join = "";
     int i;
     unsigned int max_tag_len = 0;
+    const tag_operation_t *tag_ops = list->ops;
 
     /* Don't optimize if there are no tag changes. */
     if (tag_ops[0].tag == NULL)
@@ -116,12 +113,11 @@ _optimize_tag_query (void *ctx, const char *orig_query_string,
  * element. */
 static int
 tag_query (void *ctx, notmuch_database_t *notmuch, const char *query_string,
-	   tag_operation_t *tag_ops, notmuch_bool_t synchronize_flags)
+	   tag_op_list_t *tag_ops, tag_op_flag_t flags)
 {
     notmuch_query_t *query;
     notmuch_messages_t *messages;
     notmuch_message_t *message;
-    int i;
 
     /* Optimize the query so it excludes messages that already have
      * the specified set of tags. */
@@ -144,21 +140,7 @@ tag_query (void *ctx, notmuch_database_t *notmuch, const char *query_string,
 	 notmuch_messages_valid (messages) && ! interrupted;
 	 notmuch_messages_move_to_next (messages)) {
 	message = notmuch_messages_get (messages);
-
-	notmuch_message_freeze (message);
-
-	for (i = 0; tag_ops[i].tag; i++) {
-	    if (tag_ops[i].remove)
-		notmuch_message_remove_tag (message, tag_ops[i].tag);
-	    else
-		notmuch_message_add_tag (message, tag_ops[i].tag);
-	}
-
-	notmuch_message_thaw (message);
-
-	if (synchronize_flags)
-	    notmuch_message_tags_to_maildir_flags (message);
-
+	tag_op_list_apply (message, tag_ops, flags);
 	notmuch_message_destroy (message);
     }
 
@@ -170,15 +152,17 @@ tag_query (void *ctx, notmuch_database_t *notmuch, const char *query_string,
 int
 notmuch_tag_command (void *ctx, int argc, char *argv[])
 {
-    tag_operation_t *tag_ops;
-    int tag_ops_count = 0;
-    char *query_string;
+    tag_op_list_t *tag_ops = NULL;
+    char *query_string = NULL;
     notmuch_config_t *config;
     notmuch_database_t *notmuch;
     struct sigaction action;
-    notmuch_bool_t synchronize_flags;
-    int i;
-    int ret;
+    tag_op_flag_t synchronize_flags = TAG_FLAG_NONE;
+    notmuch_bool_t batch = FALSE;
+    FILE *input = stdin;
+    char *input_file_name = NULL;
+    int i, opt_index;
+    int ret = 0;
 
     /* Setup our handler for SIGINT */
     memset (&action, 0, sizeof (struct sigaction));
@@ -187,53 +171,76 @@ notmuch_tag_command (void *ctx, int argc, char *argv[])
     action.sa_flags = SA_RESTART;
     sigaction (SIGINT, &action, NULL);
 
-    argc--; argv++; /* skip subcommand argument */
+    notmuch_opt_desc_t options[] = {
+	{ NOTMUCH_OPT_BOOLEAN, &batch, "batch", 0, 0 },
+	{ NOTMUCH_OPT_STRING, &input_file_name, "input", 'i', 0 },
+	{ 0, 0, 0, 0, 0 }
+    };
 
-    /* Array of tagging operations (add or remove), terminated with an
-     * empty element. */
-    tag_ops = talloc_array (ctx, tag_operation_t, argc + 1);
-    if (tag_ops == NULL) {
-	fprintf (stderr, "Out of memory.\n");
+    opt_index = parse_arguments (argc, argv, options, 1);
+    if (opt_index < 0)
 	return 1;
+
+    if (input_file_name) {
+	batch = TRUE;
+	input = fopen (input_file_name, "r");
+	if (input == NULL) {
+	    fprintf (stderr, "Error opening %s for reading: %s\n",
+		     input_file_name, strerror (errno));
+	    return 1;
+	}
     }
 
-    for (i = 0; i < argc; i++) {
-	if (strcmp (argv[i], "--") == 0) {
-	    i++;
-	    break;
+    if (batch) {
+	if (opt_index != argc) {
+	    fprintf (stderr, "Can't specify both cmdline and stdin!\n");
+	    return 1;
 	}
-	if (argv[i][0] == '+' || argv[i][0] == '-') {
-	    if (argv[i][0] == '+' && argv[i][1] == '\0') {
-		fprintf (stderr, "Error: tag names cannot be empty.\n");
-		return 1;
+    } else {
+	/* Array of tagging operations (add or remove), terminated with an
+	 * empty element. */
+	tag_ops = tag_op_list_create (ctx);
+	if (tag_ops == NULL) {
+	    fprintf (stderr, "Out of memory.\n");
+	    return 1;
+	}
+
+	for (i = opt_index; i < argc; i++) {
+	    if (strcmp (argv[i], "--") == 0) {
+		i++;
+		break;
 	    }
-	    if (argv[i][0] == '+' && argv[i][1] == '-') {
-		/* This disallows adding the non-removable tag "-" and
-		 * enables notmuch tag to take long options in the
-		 * future. */
-		fprintf (stderr, "Error: tag names must not start with '-'.\n");
-		return 1;
+	    if (argv[i][0] == '+' || argv[i][0] == '-') {
+		/* FIXME: checks should be consistent with those in batch tagging */
+		if (argv[i][0] == '+' && argv[i][1] == '\0') {
+		    fprintf (stderr, "Error: tag names cannot be empty.\n");
+		    return 1;
+		}
+		if (argv[i][0] == '+' && argv[i][1] == '-') {
+		    /* This disallows adding the non-removable tag "-" and
+		     * enables notmuch tag to take long options in the
+		     * future. */
+		    fprintf (stderr, "Error: tag names must not start with '-'.\n");
+		    return 1;
+		}
+		tag_op_list_append (ctx, tag_ops,
+				    argv[i] + 1, (argv[i][0] == '-'));
+	    } else {
+		break;
 	    }
-	    tag_ops[tag_ops_count].tag = argv[i] + 1;
-	    tag_ops[tag_ops_count].remove = (argv[i][0] == '-');
-	    tag_ops_count++;
-	} else {
-	    break;
 	}
-    }
-
-    tag_ops[tag_ops_count].tag = NULL;
 
-    if (tag_ops_count == 0) {
-	fprintf (stderr, "Error: 'notmuch tag' requires at least one tag to add or remove.\n");
-	return 1;
-    }
+	if (tag_op_list_empty (tag_ops)) {
+	    fprintf (stderr, "Error: 'notmuch tag' requires at least one tag to add or remove.\n");
+	    return 1;
+	}
 
-    query_string = query_string_from_args (ctx, argc - i, &argv[i]);
+	query_string = query_string_from_args (ctx, argc - i, &argv[i]);
 
-    if (*query_string == '\0') {
-	fprintf (stderr, "Error: notmuch tag requires at least one search term.\n");
-	return 1;
+	if (*query_string == '\0') {
+	    fprintf (stderr, "Error: notmuch tag requires at least one search term.\n");
+	    return 1;
+	}
     }
 
     config = notmuch_config_open (ctx, NULL, NULL);
@@ -244,11 +251,45 @@ notmuch_tag_command (void *ctx, int argc, char *argv[])
 			       NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch))
 	return 1;
 
-    synchronize_flags = notmuch_config_get_maildir_synchronize_flags (config);
+    if (notmuch_config_get_maildir_synchronize_flags (config))
+	synchronize_flags = TAG_FLAG_MAILDIR_SYNC;
+
+    if (batch) {
+
+	char *line = NULL;
+	size_t line_size = 0;
+	ssize_t line_len;
+	int ret = 0;
+	tag_op_list_t *tag_ops;
+
+	tag_ops = tag_op_list_create (ctx);
+	if (tag_ops == NULL) {
+	    fprintf (stderr, "Out of memory.\n");
+	    return 1;
+	}
+
+	while ((line_len = getline (&line, &line_size, input)) != -1 &&
+	       ! interrupted) {
+
+	    char *query_string;
+
+	    ret =  parse_tag_line (ctx, line, &query_string, tag_ops);
 
-    ret = tag_query (ctx, notmuch, query_string, tag_ops, synchronize_flags);
+	    if (ret > 0)
+		continue;
+
+	    if (ret < 0 || tag_query (ctx, notmuch, query_string,
+				      tag_ops, synchronize_flags))
+		break;
+	}
+
+	if (line)
+	    free (line);
+    } else
+	ret = tag_query (ctx, notmuch, query_string, tag_ops, synchronize_flags);
 
     notmuch_database_destroy (notmuch);
 
-    return ret;
+    return ret || interrupted;
+
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/16] test: add test for notmuch tag --batch option
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (8 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 09/16] cli: add support for batch tagging operations to "notmuch tag" david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 11/16] notmuch-restore: add support for input format 'batch-tag' david
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch

From: Jani Nikula <jani@nikula.org>

Basic test of functionality, along with all combinations of options.
---
 test/tagging |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/test/tagging b/test/tagging
index 980ff92..574bc10 100755
--- a/test/tagging
+++ b/test/tagging
@@ -46,6 +46,41 @@ test_expect_equal "$output" "\
 thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; One (:\"  inbox tag1 unread)
 thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Two (inbox tag1 tag4 unread)"
 
+test_begin_subtest "--batch"
+notmuch tag --batch <<EOF
+# %20 is a space in tag
+-:"%20 -tag1 +tag5 +tag6 -- One
++tag1 -tag1 -tag4 +tag4 -- Two
+-tag6 One
++tag5 Two
+EOF
+output=$(notmuch search \* | notmuch_search_sanitize)
+test_expect_equal "$output" "\
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; One (inbox tag5 unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Two (inbox tag4 tag5 unread)"
+
+> batch.in  <<EOF
+# %20 is a space in tag
+-:"%20 -tag1 +tag5 +tag6 -- One
++tag1 -tag1 -tag4 +tag4 -- Two
+-tag6 One
++tag5 Two
+EOF
+
+test_begin_subtest "--input"
+notmuch tag --input=batch.in
+output=$(notmuch search \* | notmuch_search_sanitize)
+test_expect_equal "$output" "\
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; One (inbox tag5 unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Two (inbox tag4 tag5 unread)"
+
+test_begin_subtest "--batch --input"
+notmuch tag --batch --input=batch.in
+output=$(notmuch search \* | notmuch_search_sanitize)
+test_expect_equal "$output" "\
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; One (inbox tag5 unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Two (inbox tag4 tag5 unread)"
+
 test_expect_code 1 "Empty tag names" 'notmuch tag + One'
 
 test_expect_code 1 "Tag name beginning with -" 'notmuch tag +- One'
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/16] notmuch-restore: add support for input format 'batch-tag'
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (9 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 10/16] test: add test for notmuch tag --batch option david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 12/16] test: update dump-restore roundtripping test for batch-tag format david
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

This is the same as the batch input for notmuch tag, except by default
it removes all tags before modifying a given message id and only "id:"
is supported.
---
 notmuch-restore.c |  180 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 105 insertions(+), 75 deletions(-)

diff --git a/notmuch-restore.c b/notmuch-restore.c
index f03dcac..1776e99 100644
--- a/notmuch-restore.c
+++ b/notmuch-restore.c
@@ -19,18 +19,21 @@
  */
 
 #include "notmuch-client.h"
+#include "dump-restore-private.h"
+#include "tag-util.h"
+
+static volatile sig_atomic_t interrupted;
+static regex_t regex;
 
 static int
-tag_message (notmuch_database_t *notmuch, const char *message_id,
-	     char *file_tags, notmuch_bool_t remove_all,
-	     notmuch_bool_t synchronize_flags)
+tag_message (unused (void *ctx),
+	     notmuch_database_t *notmuch,
+	     const char *message_id,
+	     tag_op_list_t *tag_ops,
+	     tag_op_flag_t flags)
 {
     notmuch_status_t status;
-    notmuch_tags_t *db_tags;
-    char *db_tags_str;
     notmuch_message_t *message = NULL;
-    const char *tag;
-    char *next;
     int ret = 0;
 
     status = notmuch_database_find_message (notmuch, message_id, &message);
@@ -44,55 +47,46 @@ tag_message (notmuch_database_t *notmuch, const char *message_id,
 
     /* In order to detect missing messages, this check/optimization is
      * intentionally done *after* first finding the message. */
-    if (! remove_all && (file_tags == NULL || *file_tags == '\0'))
-	goto DONE;
-
-    db_tags_str = NULL;
-    for (db_tags = notmuch_message_get_tags (message);
-	 notmuch_tags_valid (db_tags);
-	 notmuch_tags_move_to_next (db_tags)) {
-	tag = notmuch_tags_get (db_tags);
-
-	if (db_tags_str)
-	    db_tags_str = talloc_asprintf_append (db_tags_str, " %s", tag);
-	else
-	    db_tags_str = talloc_strdup (message, tag);
-    }
+    if ( (flags & TAG_FLAG_REMOVE_ALL) || ! (tag_op_list_empty (tag_ops)))
+	tag_op_list_apply (message, tag_ops, flags);
 
-    if (((file_tags == NULL || *file_tags == '\0') &&
-	 (db_tags_str == NULL || *db_tags_str == '\0')) ||
-	(file_tags && db_tags_str && strcmp (file_tags, db_tags_str) == 0))
-	goto DONE;
+    if (message)
+	notmuch_message_destroy (message);
 
-    notmuch_message_freeze (message);
+    return ret;
+}
 
-    if (remove_all)
-	notmuch_message_remove_all_tags (message);
+static int
+parse_sup_line (void *ctx, char *line,
+		char **query_str, tag_op_list_t *tag_ops)
+{
 
-    next = file_tags;
-    while (next) {
-	tag = strsep (&next, " ");
-	if (*tag == '\0')
-	    continue;
-	status = notmuch_message_add_tag (message, tag);
-	if (status) {
-	    fprintf (stderr, "Error applying tag %s to message %s:\n",
-		     tag, message_id);
-	    fprintf (stderr, "%s\n", notmuch_status_to_string (status));
-	    ret = 1;
-	}
-    }
+    regmatch_t match[3];
+    char *file_tags;
+    int rerr;
 
-    notmuch_message_thaw (message);
+    tag_op_list_reset (tag_ops);
 
-    if (synchronize_flags)
-	notmuch_message_tags_to_maildir_flags (message);
+    chomp_newline (line);
 
-  DONE:
-    if (message)
-	notmuch_message_destroy (message);
+    /* Silently ignore blank lines */
+    if (line[0] == '\0') {
+	return 1;
+    }
 
-    return ret;
+    rerr = xregexec (&regex, line, 3, match, 0);
+    if (rerr == REG_NOMATCH) {
+	fprintf (stderr, "Warning: Ignoring invalid input line: %s\n",
+		 line);
+	return 1;
+    }
+
+    *query_str = talloc_strndup (ctx, line + match[1].rm_so,
+				 match[1].rm_eo - match[1].rm_so);
+    file_tags = talloc_strndup (ctx, line + match[2].rm_so,
+				match[2].rm_eo - match[2].rm_so);
+
+    return tag_op_list_from_string (ctx, file_tags, FALSE, tag_ops);
 }
 
 int
@@ -100,16 +94,19 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
 {
     notmuch_config_t *config;
     notmuch_database_t *notmuch;
-    notmuch_bool_t synchronize_flags;
     notmuch_bool_t accumulate = FALSE;
+    tag_op_flag_t flags = 0;
+    tag_op_list_t *tag_ops;
+
     char *input_file_name = NULL;
     FILE *input = stdin;
     char *line = NULL;
     size_t line_size;
     ssize_t line_len;
-    regex_t regex;
-    int rerr;
+
+    int ret = 0;
     int opt_index;
+    int input_format = DUMP_FORMAT_AUTO;
 
     config = notmuch_config_open (ctx, NULL, NULL);
     if (config == NULL)
@@ -119,9 +116,15 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
 			       NOTMUCH_DATABASE_MODE_READ_WRITE, &notmuch))
 	return 1;
 
-    synchronize_flags = notmuch_config_get_maildir_synchronize_flags (config);
+    if (notmuch_config_get_maildir_synchronize_flags (config))
+	flags |= TAG_FLAG_MAILDIR_SYNC;
 
     notmuch_opt_desc_t options[] = {
+	{ NOTMUCH_OPT_KEYWORD, &input_format, "format", 'f',
+	  (notmuch_keyword_t []){ { "auto", DUMP_FORMAT_AUTO },
+				  { "batch-tag", DUMP_FORMAT_BATCH_TAG },
+				  { "sup", DUMP_FORMAT_SUP },
+				  { 0, 0 } } },
 	{ NOTMUCH_OPT_STRING, &input_file_name, "input", 'i', 0 },
 	{ NOTMUCH_OPT_BOOLEAN,  &accumulate, "accumulate", 'a', 0 },
 	{ 0, 0, 0, 0, 0 }
@@ -134,6 +137,9 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
 	return 1;
     }
 
+    if (! accumulate)
+	flags |= TAG_FLAG_REMOVE_ALL;
+
     if (input_file_name) {
 	input = fopen (input_file_name, "r");
 	if (input == NULL) {
@@ -154,35 +160,58 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
      * non-space characters for the message-id, then one or more
      * spaces, then a list of space-separated tags as a sequence of
      * characters within literal '(' and ')'. */
-    if ( xregcomp (&regex,
-		   "^([^ ]+) \\(([^)]*)\\)$",
-		   REG_EXTENDED) )
-	INTERNAL_ERROR ("compile time constant regex failed.");
+    char *p;
 
-    while ((line_len = getline (&line, &line_size, input)) != -1) {
-	regmatch_t match[3];
-	char *message_id, *file_tags;
+    line_len = getline (&line, &line_size, input);
+    if (line_len == 0)
+	return 0;
 
-	chomp_newline (line);
+    for (p = line; *p; p++) {
+	if (*p == '(')
+	    input_format = DUMP_FORMAT_SUP;
+    }
 
-	rerr = xregexec (&regex, line, 3, match, 0);
-	if (rerr == REG_NOMATCH) {
-	    fprintf (stderr, "Warning: Ignoring invalid input line: %s\n",
-		     line);
-	    continue;
+    if (input_format == DUMP_FORMAT_AUTO)
+	input_format = DUMP_FORMAT_BATCH_TAG;
+
+    if (input_format == DUMP_FORMAT_SUP)
+	if ( xregcomp (&regex,
+		       "^([^ ]+) \\(([^)]*)\\)$",
+		       REG_EXTENDED) )
+	    INTERNAL_ERROR ("compile time constant regex failed.");
+
+    tag_ops = tag_op_list_create (ctx);
+    if (tag_ops == NULL) {
+	fprintf (stderr, "Out of memory.\n");
+	return 1;
+    }
+
+    do {
+	char *query_string;
+
+	if (input_format == DUMP_FORMAT_SUP) {
+	    ret =  parse_sup_line (ctx, line, &query_string, tag_ops);
+	} else {
+	    ret =  parse_tag_line (ctx, line, &query_string, tag_ops);
+
+	    if (ret == 0) {
+		if ( strncmp ("id:", query_string, 3) != 0) {
+		    fprintf (stderr, "Unsupported query: %s\n", query_string);
+		    continue;
+		}
+		query_string = query_string + 3;
+	    }
 	}
 
-	message_id = xstrndup (line + match[1].rm_so,
-			       match[1].rm_eo - match[1].rm_so);
-	file_tags = xstrndup (line + match[2].rm_so,
-			      match[2].rm_eo - match[2].rm_so);
+	if (ret > 0)
+	    continue;
 
-	tag_message (notmuch, message_id, file_tags, ! accumulate,
-		     synchronize_flags);
+	if (ret < 0 || tag_message (ctx, notmuch, query_string,
+				    tag_ops, flags))
+	    break;
+
+    }  while ((line_len = getline (&line, &line_size, input)) != -1);
 
-	free (message_id);
-	free (file_tags);
-    }
 
     regfree (&regex);
 
@@ -190,8 +219,9 @@ notmuch_restore_command (unused (void *ctx), int argc, char *argv[])
 	free (line);
 
     notmuch_database_destroy (notmuch);
+
     if (input != stdin)
 	fclose (input);
 
-    return 0;
+    return ret;
 }
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/16] test: update dump-restore roundtripping test for batch-tag format.
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (10 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 11/16] notmuch-restore: add support for input format 'batch-tag' david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 13/16] test: second set of dump/restore --format=batch-tag tests david
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

Now we can actually round trip these crazy tags and and message ids.
hex-xcode is no longer needed as it's built in.
---
 test/dump-restore |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/test/dump-restore b/test/dump-restore
index a2204fb..e08b656 100755
--- a/test/dump-restore
+++ b/test/dump-restore
@@ -85,13 +85,14 @@ test_begin_subtest "dump --output=outfile -- from:cworth"
 notmuch dump --output=dump-outfile-dash-inbox.actual -- from:cworth
 test_expect_equal_file dump-cworth.expected dump-outfile-dash-inbox.actual
 
+
 test_expect_success 'roundtripping random message-ids and tags' \
-    'test_subtest_known_broken &&
-     ${TEST_DIRECTORY}/random-corpus --num-messages=10 --config-path=${NOTMUCH_CONFIG} &&
-     notmuch dump | ${TEST_DIRECTORY}/hex-xcode --direction=encode > EXPECTED.$test_count &&
+    '${TEST_DIRECTORY}/random-corpus --num-messages=100 --config-path=${NOTMUCH_CONFIG} &&
+     notmuch dump --format=batch-tag | sort > EXPECTED.$test_count &&
      notmuch tag -random-corpus tag:random-corpus &&
-     ${TEST_DIRECTORY}/hex-xcode --direction=decode < EXPECTED.$test_count | notmuch restore 2>/dev/null &&
-     notmuch dump | ${TEST_DIRECTORY}/hex-xcode --direction=encode > OUTPUT.$test_count &&
+     notmuch restore --format=batch-tag --input=EXPECTED.$test_count &&
+     notmuch dump --format=batch-tag | sort > OUTPUT.$test_count &&
      test_cmp EXPECTED.$test_count OUTPUT.$test_count'
-
 test_done
+
+# Note the database is "poisoned" for sup format at this point.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 13/16] test: second set of dump/restore --format=batch-tag tests
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (11 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 12/16] test: update dump-restore roundtripping test for batch-tag format david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 14/16] tag-util: optimization of tag application david
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

These one need the completed functionality in notmuch-restore. Fairly
exotic tags are tested, but no weird message id's.

We test each possible input to autodetection, both explicit (with
--format=auto) and implicit (without --format).
---
 test/dump-restore |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/test/dump-restore b/test/dump-restore
index e08b656..7b67d03 100755
--- a/test/dump-restore
+++ b/test/dump-restore
@@ -85,6 +85,79 @@ test_begin_subtest "dump --output=outfile -- from:cworth"
 notmuch dump --output=dump-outfile-dash-inbox.actual -- from:cworth
 test_expect_equal_file dump-cworth.expected dump-outfile-dash-inbox.actual
 
+test_begin_subtest "Check for a safe set of message-ids"
+notmuch search --output=messages from:cworth > EXPECTED.$test_count
+notmuch search --output=messages from:cworth |\
+	$TEST_DIRECTORY/hex-xcode --direction=encode > OUTPUT.$test_count
+test_expect_equal_file OUTPUT.$test_count EXPECTED.$test_count
+
+test_begin_subtest "format=batch-tag, # round-trip"
+notmuch dump --format=sup | sort > EXPECTED.$test_count
+notmuch dump --format=batch-tag | notmuch restore --format=batch-tag
+notmuch dump --format=sup | sort > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+tag1='comic_swear=$&^%$^%\\//-+$^%$'
+enc1=$($TEST_DIRECTORY/hex-xcode --direction=encode "$tag1")
+
+tag2=$(printf 'this\n tag\t has\n spaces')
+enc2=$($TEST_DIRECTORY/hex-xcode --direction=encode "$tag2")
+
+enc3='%c3%91%c3%a5%c3%b0%c3%a3%c3%a5%c3%a9-%c3%8f%c3%8a'
+tag3=$($TEST_DIRECTORY/hex-xcode --direction=decode $enc3)
+
+notmuch dump --format=batch-tag > BACKUP
+
+notmuch tag +"$tag1" +"$tag2" +"$tag3" -inbox -unread "*"
+
+test_begin_subtest 'format=batch-tag, round trip with strange tags'
+    notmuch dump --format=batch-tag > EXPECTED.$test_count
+    notmuch dump --format=batch-tag | notmuch restore --format=batch-tag
+    notmuch dump --format=batch-tag > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+test_begin_subtest 'format=batch-tag, checking encoded output'
+    cp /dev/null EXPECTED.$test_count
+    notmuch dump --format=batch-tag -- from:cworth |\
+	 awk "{ print \"+$enc1 +$enc2 +$enc3 -- \" \$5 }" > EXPECTED.$test_count
+
+    notmuch dump --format=batch-tag -- from:cworth  > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+test_begin_subtest 'restoring sane tags'
+    notmuch restore --format=batch-tag < BACKUP
+    notmuch dump --format=batch-tag > OUTPUT.$test_count
+test_expect_equal_file BACKUP OUTPUT.$test_count
+
+test_begin_subtest 'format=batch-tag, restore=auto'
+    notmuch dump --format=batch-tag > EXPECTED.$test_count
+    notmuch tag -inbox -unread "*"
+    notmuch restore --format=auto < EXPECTED.$test_count
+    notmuch dump --format=batch-tag > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+test_begin_subtest 'format=sup, restore=auto'
+    notmuch dump --format=sup > EXPECTED.$test_count
+    notmuch tag -inbox -unread "*"
+    notmuch restore --format=auto < EXPECTED.$test_count
+    notmuch dump --format=sup > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+test_begin_subtest 'format=batch-tag, restore=default'
+    notmuch dump --format=batch-tag > EXPECTED.$test_count
+    notmuch tag -inbox -unread "*"
+    notmuch restore < EXPECTED.$test_count
+    notmuch dump --format=batch-tag > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+test_begin_subtest 'format=sup, restore=default'
+    notmuch dump --format=sup > EXPECTED.$test_count
+    notmuch tag -inbox -unread "*"
+    notmuch restore < EXPECTED.$test_count
+    notmuch dump --format=sup > OUTPUT.$test_count
+test_expect_equal_file EXPECTED.$test_count OUTPUT.$test_count
+
+notmuch dump --format=batch-tag --output=save.tags
 
 test_expect_success 'roundtripping random message-ids and tags' \
     '${TEST_DIRECTORY}/random-corpus --num-messages=100 --config-path=${NOTMUCH_CONFIG} &&
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 14/16] tag-util: optimization of tag application
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (12 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 13/16] test: second set of dump/restore --format=batch-tag tests david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 15/16] man: document notmuch tag --batch, --input options david
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

The idea is not to bother with restore operations if they don't change
the set of tags. This is actually a relatively common case.

In order to avoid fancy datastructures, this method is quadratic in
the number of tags; at least on my mail database this doesn't seem to
be a big problem.
---
 notmuch-tag.c |    2 +-
 tag-util.c    |   59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tag-util.h    |    3 ++-
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/notmuch-tag.c b/notmuch-tag.c
index ca120d5..33cf78d 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -140,7 +140,7 @@ tag_query (void *ctx, notmuch_database_t *notmuch, const char *query_string,
 	 notmuch_messages_valid (messages) && ! interrupted;
 	 notmuch_messages_move_to_next (messages)) {
 	message = notmuch_messages_get (messages);
-	tag_op_list_apply (message, tag_ops, flags);
+	tag_op_list_apply (message, tag_ops, flags | TAG_FLAG_PRE_OPTIMIZED);
 	notmuch_message_destroy (message);
     }
 
diff --git a/tag-util.c b/tag-util.c
index 5329b1f..de0928b 100644
--- a/tag-util.c
+++ b/tag-util.c
@@ -117,6 +117,62 @@ message_error (notmuch_message_t *message,
     fprintf (stderr, "Status: %s\n", notmuch_status_to_string (status));
 }
 
+static int
+makes_changes (notmuch_message_t *message,
+	       tag_op_list_t *list,
+	       tag_op_flag_t flags)
+{
+
+    int i;
+
+    notmuch_tags_t *tags;
+    notmuch_bool_t changes = FALSE;
+
+    /* First, do we delete an existing tag? */
+    changes = FALSE;
+    for (tags = notmuch_message_get_tags (message);
+	 ! changes && notmuch_tags_valid (tags);
+	 notmuch_tags_move_to_next (tags)) {
+	const char *cur_tag = notmuch_tags_get (tags);
+	int last_op =  (flags & TAG_FLAG_REMOVE_ALL) ? -1 : 0;
+
+	for (i = 0; i < list->count; i++) {
+	    if (strcmp (cur_tag, list->ops[i].tag) == 0) {
+		last_op = list->ops[i].remove ? -1 : 1;
+	    }
+	}
+
+	changes = (last_op == -1);
+    }
+    notmuch_tags_destroy (tags);
+
+    if (changes)
+	return TRUE;
+
+    /* Now check for adding new tags */
+    for (i = 0; i < list->count; i++) {
+	notmuch_bool_t exists = FALSE;
+
+	for (tags = notmuch_message_get_tags (message);
+	     notmuch_tags_valid (tags);
+	     notmuch_tags_move_to_next (tags)) {
+	    const char *cur_tag = notmuch_tags_get (tags);
+	    if (strcmp (cur_tag, list->ops[i].tag) == 0) {
+		exists = TRUE;
+		break;
+	    }
+	}
+	notmuch_tags_destroy (tags);
+
+	/* the following test is conservative, it's ok to think we
+	 * make changes when we don't */
+	if ( ! exists && ! list->ops[i].remove )
+	    return TRUE;
+    }
+    return FALSE;
+
+}
+
 notmuch_status_t
 tag_op_list_apply (notmuch_message_t *message,
 		   tag_op_list_t *list,
@@ -127,6 +183,9 @@ tag_op_list_apply (notmuch_message_t *message,
     notmuch_status_t status = 0;
     tag_operation_t *tag_ops = list->ops;
 
+    if (! (flags & TAG_FLAG_PRE_OPTIMIZED) && ! makes_changes (message, list, flags))
+	return NOTMUCH_STATUS_SUCCESS;
+
     status = notmuch_message_freeze (message);
     if (status) {
 	message_error (message, status, "freezing message");
diff --git a/tag-util.h b/tag-util.h
index b381b8e..998c1b9 100644
--- a/tag-util.h
+++ b/tag-util.h
@@ -19,7 +19,8 @@ typedef struct {
 /* Use powers of 2 */
 typedef enum  { TAG_FLAG_NONE = 0,
 		TAG_FLAG_MAILDIR_SYNC = 1,
-		TAG_FLAG_REMOVE_ALL = 2 } tag_op_flag_t;
+		TAG_FLAG_REMOVE_ALL = 2,
+		TAG_FLAG_PRE_OPTIMIZED = 4 } tag_op_flag_t;
 
 
 typedef int (*tag_callback_t)(void *ctx,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 15/16] man: document notmuch tag --batch, --input options
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (13 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 14/16] tag-util: optimization of tag application david
@ 2012-11-18 19:04 ` david
  2012-11-18 19:04 ` [PATCH 16/16] notmuch-{dump,restore}.1: document new format options david
  2012-11-18 21:55 ` Add new dump/restore format and batch tagging Ethan Glasser-Camp
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch

From: Jani Nikula <jani@nikula.org>

---
 man/man1/notmuch-tag.1 |   52 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/man/man1/notmuch-tag.1 b/man/man1/notmuch-tag.1
index 0f86582..751db7b 100644
--- a/man/man1/notmuch-tag.1
+++ b/man/man1/notmuch-tag.1
@@ -4,7 +4,12 @@ notmuch-tag \- add/remove tags for all messages matching the search terms
 
 .SH SYNOPSIS
 .B notmuch tag
-.RI  "+<" tag> "|\-<" tag "> [...] [\-\-] <" search-term ">..."
+.RI "+<" tag ">|\-<" tag "> [...] [\-\-] <" search-terms ">"
+
+.B notmuch tag
+.RI "--batch"
+.RI "[ --input=<" filename "> ]"
+
 
 .SH DESCRIPTION
 
@@ -29,6 +34,51 @@ updates the maildir flags according to tag changes if the
 configuration option is enabled. See \fBnotmuch-config\fR(1) for
 details.
 
+Supported options for
+.B tag
+include
+.RS 4
+.TP 4
+.BR \-\-batch
+
+Read batch tagging operations from standard input. This is more
+efficient than repeated
+.B notmuch tag
+invocations. See
+.B TAG FILE FORMAT
+below for the input format. This option is not compatible with
+specifying tagging on the command line.
+.RE
+
+.RS 4
+.TP 4
+.BR "\-\-input=" <filename>
+
+Read input from given file, instead of from stdin. Implies
+.BR --batch .
+
+.SH TAG FILE FORMAT
+
+The input must consist of lines of the format:
+
+.RI "T +<" tag ">|\-<" tag "> [...] [\-\-] <" search-terms ">"
+
+Each line is interpreted similarly to
+.B notmuch tag
+command line arguments. The delimiter is one or more spaces ' '. Any
+characters in <tag> and <search-terms>
+.B may
+be hex encoded with %NN where NN is the hexadecimal value of the
+character. Any ' ' and '%' characters in <tag> and <search-terms>
+.B must
+be hex encoded (using %20 and %25, respectively). Any characters that
+are not part of <tag> or <search-terms>
+.B must not
+be hex encoded.
+
+Leading and trailing space ' ' is ignored. Empty lines and lines
+beginning with '#' are ignored.
+
 .SH SEE ALSO
 
 \fBnotmuch\fR(1), \fBnotmuch-config\fR(1), \fBnotmuch-count\fR(1),
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 16/16] notmuch-{dump,restore}.1: document new format options
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (14 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 15/16] man: document notmuch tag --batch, --input options david
@ 2012-11-18 19:04 ` david
  2012-11-18 21:55 ` Add new dump/restore format and batch tagging Ethan Glasser-Camp
  16 siblings, 0 replies; 20+ messages in thread
From: david @ 2012-11-18 19:04 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

From: David Bremner <bremner@debian.org>

More or less arbitrarily, notmuch-dump.1 gets the more detailed
description of the format.
---
 man/man1/notmuch-dump.1    |   58 +++++++++++++++++++++++++++++++++++++++++++
 man/man1/notmuch-restore.1 |   59 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 111 insertions(+), 6 deletions(-)

diff --git a/man/man1/notmuch-dump.1 b/man/man1/notmuch-dump.1
index 230deec..b57c2c1 100644
--- a/man/man1/notmuch-dump.1
+++ b/man/man1/notmuch-dump.1
@@ -5,6 +5,7 @@ notmuch-dump \- creates a plain-text dump of the tags of each message
 .SH SYNOPSIS
 
 .B "notmuch dump"
+.RB  [ "\-\-format=(sup|batch-tag)"  "] [--]"
 .RI "[ --output=<" filename "> ] [--]"
 .RI "[ <" search-term ">...]"
 
@@ -19,6 +20,63 @@ recreated from the messages themselves.  The output of notmuch dump is
 therefore the only critical thing to backup (and much more friendly to
 incremental backup than the native database files.)
 
+.TP 4
+.B \-\-format=(sup|batch-tag)
+
+Notmuch restore supports two plain text dump formats, both with one message-id
+per line, followed by a list of tags.
+
+.RS 4
+.TP 4
+.B sup
+
+The
+.B sup
+dump file format is specifically chosen to be
+compatible with the format of files produced by sup-dump.
+So if you've previously been using sup for mail, then the
+.B "notmuch restore"
+command provides you a way to import all of your tags (or labels as
+sup calls them).
+Each line has the following form
+
+.RS 4
+.RI < message-id >
+.B (
+.RI < tag "> ..."
+.B )
+
+with zero or more tags are separated by spaces. Note that (malformed)
+message-ids may contained arbitrary non-null characters. Note also
+that tags with spaces will not be correctly restored with this format.
+
+.RE
+
+.RE
+.RS 4
+.TP 4
+.B batch-tag
+
+The
+.B batch-tag
+dump format is intended to more robust against malformed message-ids
+and tags containing whitespace or non-\fBascii\fR(7) characters.
+Each line has the form
+
+.RS 4
+.RI "+<" "encoded-tag" "> " "" "+<" "encoded-tag" "> ... -- " "" " <" encoded-message-id >
+
+where encoded means that every byte not matching the regex
+.B [A-Za-z0-9+-_@=.:,]
+is replace by
+.B %nn
+where nn is the two digit hex encoding.
+The astute reader will notice this is a special case of the batch input
+format for \fBnotmuch-tag\fR(1).
+
+.RE
+
+
 With no search terms, a dump of all messages in the database will be
 generated.  A "--" argument instructs notmuch that the
 remaining arguments are search terms.
diff --git a/man/man1/notmuch-restore.1 b/man/man1/notmuch-restore.1
index 2fa8733..3860829 100644
--- a/man/man1/notmuch-restore.1
+++ b/man/man1/notmuch-restore.1
@@ -6,6 +6,7 @@ notmuch-restore \- restores the tags from the given file (see notmuch dump)
 
 .B "notmuch restore"
 .RB [ "--accumulate" ]
+.RB [ "--format=(auto|batch-tag|sup)" ]
 .RI "[ --input=<" filename "> ]"
 
 .SH DESCRIPTION
@@ -15,19 +16,51 @@ Restores the tags from the given file (see
 
 The input is read from the given filename, if any, or from stdin.
 
-Note: The dump file format is specifically chosen to be
+
+Supported options for
+.B restore
+include
+.RS 4
+.TP 4
+.B \-\-accumulate
+
+The union of the existing and new tags is applied, instead of
+replacing each message's tags as they are read in from the dump file.
+
+.RE
+.RS 4
+.TP 4
+.B \-\-format=(sup|batch-tag|auto)
+
+Notmuch restore supports two plain text dump formats, with one message-id
+per line, and a list of tags.
+For details of the actual formats, see \fBnotmuch-dump\fR(1).
+
+.RS 4
+.TP 4
+.B sup
+
+The
+.B sup
+dump file format is specifically chosen to be
 compatible with the format of files produced by sup-dump.
 So if you've previously been using sup for mail, then the
 .B "notmuch restore"
 command provides you a way to import all of your tags (or labels as
 sup calls them).
 
-The --accumulate switch causes the union of the existing and new tags to be
-applied, instead of replacing each message's tags as they are read in from the
-dump file.
+.RE
+.RS 4
+.TP 4
+.B batch-tag
 
-See \fBnotmuch-search-terms\fR(7)
-for details of the supported syntax for <search-terms>.
+The
+.B batch-tag
+dump format is intended to more robust against malformed message-ids
+and tags containing whitespace or non-\fBascii\fR(7) characters.  This
+format hex-escapes all characters those outside of a small character
+set, intended to be suitable for e.g. pathnames in most UNIX-like
+systems.
 
 .B "notmuch restore"
 updates the maildir flags according to tag changes if the
@@ -36,6 +69,20 @@ configuration option is enabled. See \fBnotmuch-config\fR(1) for
 details.
 
 .RE
+
+.RS 4
+.TP 4
+.B auto
+
+This option (the default) tries to guess the format from the
+input. For correctly formed input in either supported format, this
+heuristic, based the fact that batch-tag format contains no parentheses,
+should be accurate.
+
+.RE
+
+.RE
+
 .SH SEE ALSO
 
 \fBnotmuch\fR(1), \fBnotmuch-config\fR(1), \fBnotmuch-count\fR(1),
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Add new dump/restore format and batch tagging.
  2012-11-18 19:04 Add new dump/restore format and batch tagging david
                   ` (15 preceding siblings ...)
  2012-11-18 19:04 ` [PATCH 16/16] notmuch-{dump,restore}.1: document new format options david
@ 2012-11-18 21:55 ` Ethan Glasser-Camp
  2012-11-18 22:05   ` Ethan Glasser-Camp
  16 siblings, 1 reply; 20+ messages in thread
From: Ethan Glasser-Camp @ 2012-11-18 21:55 UTC (permalink / raw)
  To: david, notmuch

david@tethera.net writes:

> which was revied by Tomi and Ethan. I think I implemented their
> suggestions.

Actually, I don't think you implemented all of mine.

- Patch 4 still has a subject line that ends in a period. I don't think
  this is mandatory for everyone but some people consider it best
  practice. You still have the spelling "seperate" (also, patch 7 has
  "seperated").

- In patch 4, I still think this would look better if you switched the
  branches.

+    if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+       _notmuch_message_add_term (message, "type", "mail");
+    } else {
+       return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+    }

- In patch 5, I still think this looks funny:

+       int num_tags = random () % (max_tags + 1);
+       int this_mid_len = random () % message_id_len + 1;

Additionally, I would like a check that message_id_len is reasonable
(more than 1, say).

- Patch 8:

+int
+parse_tag_stream (void *ctx,
+                 notmuch_database_t *notmuch,
+                 FILE *input,
+                 tag_callback_t callback,
+                 tag_op_flag_t flags,
+                 volatile sig_atomic_t *interrupted);

Am I going crazy, or does this function not get implemented?

- Patch 11: DUMP_FORMAT can change from DUMP_FORMAT_BATCH_TAG to
  DUMP_FORMAT_SUP if a paren is anywhere on the first line. I'd prefer
  this only happens if we have DUMP_FORMAT_AUTO.

You probably want to move the comment "Dump output is..." closer to the
regex.

I don't see why it's necessary to have this:

+               query_string = query_string + 3;

- Patch 13:

+    cp /dev/null EXPECTED.$test_count
+    notmuch dump --format=batch-tag -- from:cworth |\
+        awk "{ print \"+$enc1 +$enc2 +$enc3 -- \" \$5 }" > EXPECTED.$test_count

What's the purpose of the CP here? It just creates an empty file. You
could do it with touch. Why even bother since you're going to fill it
with stuff in a second? Actually, care to explain the dump and sed call?
It looks like you're using this dump to encode the message IDs. If
format=batch-tag skips a message for some reason or terminates early,
the test won't fail.

- Patch 16:

+message-ids may contained arbitrary non-null characters. Note also

I think this should be "may contain", or something else entirely if
you're trying to describe past behavior of sup?

Ethan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Add new dump/restore format and batch tagging.
  2012-11-18 21:55 ` Add new dump/restore format and batch tagging Ethan Glasser-Camp
@ 2012-11-18 22:05   ` Ethan Glasser-Camp
  0 siblings, 0 replies; 20+ messages in thread
From: Ethan Glasser-Camp @ 2012-11-18 22:05 UTC (permalink / raw)
  To: Ethan Glasser-Camp, david, notmuch

Ethan Glasser-Camp <ethan.glasser.camp@gmail.com> writes:

> - Patch 4 still has a subject line that ends in a period. I don't think
>   this is mandatory for everyone but some people consider it best
>   practice.

Best practice, of course, would be to remove the period at the end of
the subject line. Patch 12 also has one.

Ethan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 08/16] tag-util.[ch]: New files for common tagging routines
  2012-11-18 19:04 ` [PATCH 08/16] tag-util.[ch]: New files for common tagging routines david
@ 2012-11-22 12:22   ` Mark Walters
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Walters @ 2012-11-22 12:22 UTC (permalink / raw)
  To: david, notmuch; +Cc: David Bremner


Hi

I am slowly working my way through this series. I have some queries
comments and things that confused me in this patch.

One longer term concern (but I have not got far enough to see if it
could be a problem): could someone use dump and then not be able to
restore because some of the tags are now banned (eg starting with -
etc)? Obviously anything which fails currently is not a concern, only if
some cases stop some currently working cases.


One particular comment is that most of the functions do not have a
comment defining them in tag-util.h. So for example (at this point in
reading the series) I am not sure of why we need both parse_tag_line and
tag_op_list_from_string.


On Sun, 18 Nov 2012, david@tethera.net wrote:
> From: David Bremner <bremner@debian.org>
>
> These are meant to be shared between notmuch-tag and notmuch-restore.
>
> The bulk of the routines implement a "tag operation list" abstract
> data type act as a structured representation of a set of tag
> operations (typically coming from a single tag command or line of
> input).
> ---
>  Makefile.local |    1 +
>  tag-util.c     |  273 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tag-util.h     |   71 +++++++++++++++
>  3 files changed, 345 insertions(+)
>  create mode 100644 tag-util.c
>  create mode 100644 tag-util.h
>
> diff --git a/Makefile.local b/Makefile.local
> index 2b91946..854867d 100644
> --- a/Makefile.local
> +++ b/Makefile.local
> @@ -274,6 +274,7 @@ notmuch_client_srcs =		\
>  	query-string.c		\
>  	mime-node.c		\
>  	crypto.c		\
> +	tag-util.c
>  
>  notmuch_client_modules = $(notmuch_client_srcs:.c=.o)
>  
> diff --git a/tag-util.c b/tag-util.c
> new file mode 100644
> index 0000000..5329b1f
> --- /dev/null
> +++ b/tag-util.c
> @@ -0,0 +1,273 @@
> +#include "tag-util.h"
> +#include "hex-escape.h"
> +
> +static char *
> +strtok_len (char *s, const char *delim, size_t *len)
> +{
> +    /* skip initial delims */
> +    s += strspn (s, delim);
> +
> +    /* length of token */
> +    *len = strcspn (s, delim);
> +
> +    return *len ? s : NULL;
> +}
> +
> +/* Tag messages according to 'input', which must consist of lines of
> + * the format:
> + *
> + * +<tag>|-<tag> [...] [--] <search-terms>
> + *
> + * Each line is interpreted similarly to "notmuch tag" command line
> + * arguments. The delimiter is one or more spaces ' '. Any characters
> + * in <tag> and <search-terms> MAY be hex encoded with %NN where NN is
> + * the hexadecimal value of the character. Any ' ' and '%' characters
> + * in <tag> and <search-terms> MUST be hex encoded (using %20 and %25,
> + * respectively). Any characters that are not part of <tag> or
> + * <search-terms> MUST NOT be hex encoded.
> + *
> + * Leading and trailing space ' ' is ignored. Empty lines and lines
> + * beginning with '#' are ignored.
> + */

The comment says lines starting with '#' are ignored but I don't see
anything actually ignoring them.

> +int
> +parse_tag_line (void *ctx, char *line,
> +		char **query_string,
> +		tag_op_list_t *tag_ops)
> +{
> +    char *tok = line;
> +    size_t tok_len = 0;
> +
> +    chomp_newline (line);
> +    tag_op_list_reset (tag_ops);
> +
> +    /* Parse tags. */
> +    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
> +	notmuch_bool_t remove;
> +	char *tag;
> +
> +	/* Optional explicit end of tags marker. */
> +	if (strncmp (tok, "--", tok_len) == 0) {
> +	    tok = strtok_len (tok + tok_len, " ", &tok_len);
> +	    break;
> +	}
> +
> +	/* Implicit end of tags. */
> +	if (*tok != '-' && *tok != '+')
> +	    break;
> +
> +	/* If tag is terminated by NUL, there's no query string. */
> +	if (*(tok + tok_len) == '\0') {
> +	    tok = NULL;
> +	    break;
> +	}
> +
> +	/* Terminate, and start next token after terminator. */
> +	*(tok + tok_len++) = '\0';
> +
> +	remove = (*tok == '-');
> +	tag = tok + 1;
> +
> +	/* Refuse empty tags. */
> +	if (*tag == '\0') {
> +	    tok = NULL;
> +	    break;
> +	}
> +
> +	/* Decode tag. */
> +	if (hex_decode_inplace (tag) != HEX_SUCCESS) {
> +	    tok = NULL;
> +	    break;
> +	}
> +
> +	if (tag_op_list_append (ctx, tag_ops, tag, remove))
> +	    return -1;
> +    }
> +
> +    if (tok == NULL || tag_ops->count == 0) {
> +	/* FIXME: line has been modified! */
> +	fprintf (stderr, "Warning: Ignoring invalid input line: %s\n",
> +		 line);
> +	return 1;
> +    }
> +
> +    /* tok now points to the query string */
> +    if (hex_decode_inplace (tok) != HEX_SUCCESS) {
> +	/* FIXME: line has been modified! */
> +	fprintf (stderr, "Warning: Ignoring invalid input line: %s\n",
> +		 line);
> +	return 1;
> +    }
> +
> +    *query_string = tok;
> +
> +    return 0;
> +}
> +
> +static inline void
> +message_error (notmuch_message_t *message,
> +	       notmuch_status_t status,
> +	       const char *format, ...)
> +{
> +    va_list va_args;
> +
> +    va_start (va_args, format);
> +
> +    vfprintf (stderr, format, va_args);
> +    fprintf (stderr, "Message-ID: %s\n", notmuch_message_get_message_id (message));
> +    fprintf (stderr, "Status: %s\n", notmuch_status_to_string (status));
> +}
> +
> +notmuch_status_t
> +tag_op_list_apply (notmuch_message_t *message,
> +		   tag_op_list_t *list,
> +		   tag_op_flag_t flags)
> +{
> +    int i;
> +
> +    notmuch_status_t status = 0;
> +    tag_operation_t *tag_ops = list->ops;
> +
> +    status = notmuch_message_freeze (message);
> +    if (status) {
> +	message_error (message, status, "freezing message");
> +	return status;
> +    }
> +
> +    if (flags & TAG_FLAG_REMOVE_ALL) {
> +	status = notmuch_message_remove_all_tags (message);
> +	if (status) {
> +	    message_error (message, status, "removing all tags" );
> +	    return status;
> +	}
> +    }
> +
> +    for (i = 0; i < list->count; i++) {
> +	if (tag_ops[i].remove) {
> +	    status = notmuch_message_remove_tag (message, tag_ops[i].tag);
> +	    if (status) {
> +		message_error (message, status, "removing tag %s", tag_ops[i].tag);
> +		return status;
> +	    }
> +	} else {
> +	    status = notmuch_message_add_tag (message, tag_ops[i].tag);
> +	    if (status) {
> +		message_error (message, status, "adding tag %s", tag_ops[i].tag);
> +		return status;
> +	    }
> +
> +	}
> +    }
> +
> +    status = notmuch_message_thaw (message);
> +    if (status) {
> +	message_error (message, status, "thawing message");
> +	return status;
> +    }
> +
> +
> +    if (flags & TAG_FLAG_MAILDIR_SYNC) {
> +	status = notmuch_message_tags_to_maildir_flags (message);
> +	if (status) {
> +	    message_error (message, status, "synching tags to maildir");
> +	    return status;
> +	}
> +    }
> +
> +    return NOTMUCH_STATUS_SUCCESS;
> +
> +}
> +
> +
> +/* Array of tagging operations (add or remove), terminated with an
> + * empty element. Size will be increased as necessary. */

Is the list termination used (above you used list->count) or is it just
that you can always safely write to the last element?

> +
> +tag_op_list_t *
> +tag_op_list_create (void *ctx)
> +{
> +    tag_op_list_t *list;
> +
> +    list = talloc (ctx, tag_op_list_t);
> +    if (list == NULL)
> +	return NULL;
> +
> +    list->size = TAG_OP_LIST_INITIAL_SIZE;
> +    list->count = 0;
> +
> +    list->ops = talloc_array (ctx, tag_operation_t, list->size);
> +    if (list->ops == NULL)
> +	return NULL;
> +
> +    return list;
> +}
> +
> +
> +int
> +tag_op_list_append (void *ctx,
> +		    tag_op_list_t *list,
> +		    const char *tag,
> +		    notmuch_bool_t remove)
> +{
> +    list->ops[list->count].tag = tag;
> +    list->ops[list->count].remove = remove;
> +    list->count++;
> +
> +    /* Make room for terminating empty element and potential
> +     * new tags, if necessary. This should be a fairly rare
> +     * case, considering the initial array size. */
> +    if (list->count == list->size) {
> +	list->size *= 2;
> +	list->ops = talloc_realloc (ctx, list->ops, tag_operation_t,
> +				    list->size);
> +	if (list->ops == NULL) {
> +	    fprintf (stderr, "Out of memory.\n");
> +	    return 1;
> +	}
> +    }
> +
> +    list->ops[list->count].tag = NULL;
> +
> +    return 0;
> +}
> +
> +
> +/* WARNING: modifies it's string argument */
> +
> +int
> +tag_op_list_from_string (void *ctx,
> +			 char *tag_string,
> +			 notmuch_bool_t remove,
> +			 tag_op_list_t *tag_ops)

As mentioned above I wasn't sure why both this and parse_tag_line are
needed. Also should one use the other? Otherwise we could get different
allowed tags in different places?

> +{
> +    char *tok = tag_string;
> +    size_t tok_len = 0;
> +
> +    tag_op_list_reset (tag_ops);
> +
> +/* Parse tags. */
> +    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
> +
> +	if (*(tok + tok_len) != '\0') {
> +	    *(tok + tok_len) = '\0';
> +	    tok_len++;
> +	}
> +
> +	if (tag_op_list_append (ctx, tag_ops, tok, remove))
> +	    return -1;
> +    }
> +
> +    return 0;
> +
> +}
> +
> +notmuch_bool_t
> +tag_op_list_empty (tag_op_list_t *list)
> +{
> +    return (list->count == 0);
> +}
> +
> +
> +void
> +tag_op_list_reset (tag_op_list_t *list)
> +{
> +    list->count = 0;
> +}
> diff --git a/tag-util.h b/tag-util.h
> new file mode 100644
> index 0000000..b381b8e
> --- /dev/null
> +++ b/tag-util.h
> @@ -0,0 +1,71 @@
> +#ifndef _TAG_UTIL_H
> +#define _TAG_UTIL_H
> +
> +#include "notmuch-client.h"
> +
> +typedef struct {
> +    const char *tag;
> +    notmuch_bool_t remove;
> +} tag_operation_t;
> +
> +#define TAG_OP_LIST_INITIAL_SIZE 10
> +
> +typedef struct {
> +    tag_operation_t *ops;
> +    int count;
> +    int size;
> +} tag_op_list_t;

I thought it would be helpful to say something about *ops: e.g., "ops
points to the first element of an array of count tag_operation_t's. This
array should be empty terminated (if true)."

One trivial comment for the next patch: the commit message says "notmuch
new" rather than "notmuch tag".

Best wishes

Mark
 
> +
> +/* Use powers of 2 */
> +typedef enum  { TAG_FLAG_NONE = 0,
> +		TAG_FLAG_MAILDIR_SYNC = 1,
> +		TAG_FLAG_REMOVE_ALL = 2 } tag_op_flag_t;
> +
> +
> +typedef int (*tag_callback_t)(void *ctx,
> +			      notmuch_database_t *notmuch,
> +			      const char *query_string,
> +			      tag_op_list_t *tag_ops,
> +			      tag_op_flag_t flags);
> +
> +int
> +parse_tag_stream (void *ctx,
> +		  notmuch_database_t *notmuch,
> +		  FILE *input,
> +		  tag_callback_t callback,
> +		  tag_op_flag_t flags,
> +		  volatile sig_atomic_t *interrupted);
> +
> +/*
> + * Returns: 0 for OK, 1 for skipped line, -1 for fatal(ish) error.
> + */
> +int
> +parse_tag_line (void *ctx, char *line, char **query_str, tag_op_list_t *ops);
> +
> +tag_op_list_t
> +*tag_op_list_create (void *ctx);
> +
> +int
> +tag_op_list_append (void *ctx,
> +		    tag_op_list_t *list,
> +		    const char *tag,
> +		    notmuch_bool_t remove);
> +
> +notmuch_status_t
> +tag_op_list_apply (notmuch_message_t *message,
> +		   tag_op_list_t *tag_ops,
> +		   tag_op_flag_t flags);
> +
> +int
> +tag_op_list_from_string (void *ctx,
> +			 char *tag_string,
> +			 notmuch_bool_t remove,
> +			 tag_op_list_t *tag_ops);
> +
> +notmuch_bool_t
> +tag_op_list_empty (tag_op_list_t *list);
> +
> +void
> +tag_op_list_reset (tag_op_list_t *list);
> +
> +#endif
> -- 
> 1.7.10.4
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-11-22 12:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-18 19:04 Add new dump/restore format and batch tagging david
2012-11-18 19:04 ` [PATCH 01/16] hex-escape: (en|de)code strings to/from restricted character set david
2012-11-18 19:04 ` [PATCH 02/16] test/hex-xcode: new test binary david
2012-11-18 19:04 ` [PATCH 03/16] test/hex-escaping: new test for hex escaping routines david
2012-11-18 19:04 ` [PATCH 04/16] test: add database routines for testing david
2012-11-18 19:04 ` [PATCH 05/16] test: add generator for random "stub" messages david
2012-11-18 19:04 ` [PATCH 06/16] test: add broken roundtrip test david
2012-11-18 19:04 ` [PATCH 07/16] notmuch-dump: add --format=(batch-tag|sup) david
2012-11-18 19:04 ` [PATCH 08/16] tag-util.[ch]: New files for common tagging routines david
2012-11-22 12:22   ` Mark Walters
2012-11-18 19:04 ` [PATCH 09/16] cli: add support for batch tagging operations to "notmuch tag" david
2012-11-18 19:04 ` [PATCH 10/16] test: add test for notmuch tag --batch option david
2012-11-18 19:04 ` [PATCH 11/16] notmuch-restore: add support for input format 'batch-tag' david
2012-11-18 19:04 ` [PATCH 12/16] test: update dump-restore roundtripping test for batch-tag format david
2012-11-18 19:04 ` [PATCH 13/16] test: second set of dump/restore --format=batch-tag tests david
2012-11-18 19:04 ` [PATCH 14/16] tag-util: optimization of tag application david
2012-11-18 19:04 ` [PATCH 15/16] man: document notmuch tag --batch, --input options david
2012-11-18 19:04 ` [PATCH 16/16] notmuch-{dump,restore}.1: document new format options david
2012-11-18 21:55 ` Add new dump/restore format and batch tagging Ethan Glasser-Camp
2012-11-18 22:05   ` Ethan Glasser-Camp

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).