unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* v5 gzip / dump restore
@ 2014-04-02  1:16 David Bremner
  2014-04-02  1:16 ` [Patch v5 1/6] dump: support gzipped output David Bremner
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

This supercedes

     id:1396200700-29361-2-git-send-email-david@tethera.net

This is the first version to include the actual automatic backup
before a database upgrade.

The following are new compared to the previous version; the other
patches have only small changes, if any.

 [Patch v5 2/6] dump: when given output file name, write atomically
 [Patch v5 5/6] notmuch-new: backup tags before database upgrade
 [Patch v5 6/6] test: verify tag backup generated by database upgrade

diff versus master

 INSTALL                      | 17 +++++++---
 Makefile.local               |  2 +-
 configure                    | 28 ++++++++++++++--
 doc/man1/notmuch-dump.rst    |  3 ++
 doc/man1/notmuch-restore.rst |  8 +++++
 notmuch-client.h             |  4 ++-
 notmuch-dump.c               | 89 ++++++++++++++++++++++++++++++++++++++-----------
 notmuch-new.c                | 29 +++++++++++++++-
 notmuch-restore.c            | 37 +++++++++++---------
 test/T240-dump-restore.sh    | 26 +++++++++++++++
 test/T530-upgrade.sh         | 10 +++++-
 util/Makefile.local          |  2 +-
 util/util.h                  | 12 +++++++
 util/zlib-extra.c            | 76 +++++++++++++++++++++++++++++++++++++++++
 util/zlib-extra.h            | 11 ++++++
 15 files changed, 307 insertions(+), 47 deletions(-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Patch v5 1/6] dump: support gzipped output
  2014-04-02  1:16 v5 gzip / dump restore David Bremner
@ 2014-04-02  1:16 ` David Bremner
  2014-04-02 16:50   ` Tomi Ollila
  2014-04-02  1:16 ` [Patch v5 2/6] dump: when given output file name, write atomically David Bremner
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

The main goal is to support gzipped output for future internal
calls (e.g. from notmuch-new) to notmuch_database_dump.

The additional dependency is not very heavy since xapian already pulls
in zlib.
---
 INSTALL                   | 17 +++++++++++----
 Makefile.local            |  2 +-
 configure                 | 28 +++++++++++++++++++++---
 doc/man1/notmuch-dump.rst |  3 +++
 notmuch-client.h          |  4 +++-
 notmuch-dump.c            | 54 ++++++++++++++++++++++++++++++-----------------
 test/T240-dump-restore.sh | 12 +++++++++++
 7 files changed, 92 insertions(+), 28 deletions(-)

diff --git a/INSTALL b/INSTALL
index 690b0ef..df318fa 100644
--- a/INSTALL
+++ b/INSTALL
@@ -20,8 +20,8 @@ configure stage.
 
 Dependencies
 ------------
-Notmuch depends on three libraries: Xapian, GMime 2.4 or 2.6, and
-Talloc which are each described below:
+Notmuch depends on four libraries: Xapian, GMime 2.4 or 2.6,
+Talloc, and zlib which are each described below:
 
 	Xapian
 	------
@@ -60,6 +60,15 @@ Talloc which are each described below:
 
 	Talloc is available from http://talloc.samba.org/
 
+	zlib
+	----
+
+	zlib is an extremely popular compression library. It is used
+	by Xapian, so if you installed that you will already have
+	zlib. You may need to install the zlib headers separately.
+
+	zlib is available from http://zlib.net
+
 Building Documentation
 ----------------------
 
@@ -79,11 +88,11 @@ dependencies with a simple simple command line. For example:
 
   For Debian and similar:
 
-        sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev python-sphinx
+        sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev zlib1g-dev python-sphinx
 
   For Fedora and similar:
 
-	sudo yum install xapian-core-devel gmime-devel libtalloc-devel python-sphinx
+	sudo yum install xapian-core-devel gmime-devel libtalloc-devel zlib-devel python-sphinx
 
 On other systems, a similar command can be used, but the details of
 the package names may be different.
diff --git a/Makefile.local b/Makefile.local
index cb7b106..e5a20a7 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -41,7 +41,7 @@ PV_FILE=bindings/python/notmuch/version.py
 # Smash together user's values with our extra values
 FINAL_CFLAGS = -DNOTMUCH_VERSION=$(VERSION) $(CPPFLAGS) $(CFLAGS) $(WARN_CFLAGS) $(extra_cflags) $(CONFIGURE_CFLAGS)
 FINAL_CXXFLAGS = $(CPPFLAGS) $(CXXFLAGS) $(WARN_CXXFLAGS) $(extra_cflags) $(extra_cxxflags) $(CONFIGURE_CXXFLAGS)
-FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS)
+FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS) $(ZLIB_LDFLAGS)
 FINAL_NOTMUCH_LINKER = CC
 ifneq ($(LINKER_RESOLVES_LIBRARY_DEPENDENCIES),1)
 FINAL_NOTMUCH_LDFLAGS += $(CONFIGURE_LDFLAGS)
diff --git a/configure b/configure
index 1d430b9..d685ab3 100755
--- a/configure
+++ b/configure
@@ -340,6 +340,18 @@ else
     errors=$((errors + 1))
 fi
 
+printf "Checking for zlib development files... "
+have_zlib=0
+if pkg-config --exists zlib; then
+    printf "Yes.\n"
+    have_zlib=1
+    zlib_cflags=$(pkg-config --cflags zlib)
+    zlib_ldflags=$(pkg-config --libs zlib)
+else
+    printf "No.\n"
+    errors=$((errors + 1))
+fi
+
 printf "Checking for talloc development files... "
 if pkg-config --exists talloc; then
     printf "Yes.\n"
@@ -496,6 +508,11 @@ EOF
 	echo "	Xapian library (including development files such as headers)"
 	echo "	http://xapian.org/"
     fi
+    if [ $have_zlib -eq 0 ]; then
+	echo "	zlib library (including development files such as headers)"
+	echo "	http://zlib.net/"
+	echo
+    fi
     if [ $have_gmime -eq 0 ]; then
 	echo "	Either GMime 2.4 library" $GMIME_24_VERSION_CTR "or GMime 2.6 library" $GMIME_26_VERSION_CTR
 	echo "	(including development files such as headers)"
@@ -519,11 +536,11 @@ case a simple command will install everything you need. For example:
 
 On Debian and similar systems:
 
-	sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev
+	sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev zlib1g-dev
 
 Or on Fedora and similar systems:
 
-	sudo yum install xapian-core-devel gmime-devel libtalloc-devel
+	sudo yum install xapian-core-devel gmime-devel libtalloc-devel zlib-devel
 
 On other systems, similar commands can be used, but the details of the
 package names may be different.
@@ -844,6 +861,10 @@ XAPIAN_LDFLAGS = ${xapian_ldflags}
 GMIME_CFLAGS = ${gmime_cflags}
 GMIME_LDFLAGS = ${gmime_ldflags}
 
+# Flags needed to compile and link against zlib
+ZLIB_CFLAGS = ${zlib_cflags}
+ZLIB_LDFLAGS = ${zlib_ldflags}
+
 # Flags needed to compile and link against talloc
 TALLOC_CFLAGS = ${talloc_cflags}
 TALLOC_LDFLAGS = ${talloc_ldflags}
@@ -882,6 +903,7 @@ CONFIGURE_CFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)      \\
 		   -DUTIL_BYTE_ORDER=\$(UTIL_BYTE_ORDER)
 
 CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)    \\
+		     \$(ZLIB_CFLAGS)					 \\
 		     \$(TALLOC_CFLAGS) -DHAVE_VALGRIND=\$(HAVE_VALGRIND) \\
 		     \$(VALGRIND_CFLAGS) \$(XAPIAN_CXXFLAGS)             \\
 		     -DHAVE_STRCASESTR=\$(HAVE_STRCASESTR)               \\
@@ -892,5 +914,5 @@ CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)    \\
 		     -DHAVE_XAPIAN_COMPACT=\$(HAVE_XAPIAN_COMPACT)       \\
 		     -DUTIL_BYTE_ORDER=\$(UTIL_BYTE_ORDER)
 
-CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(XAPIAN_LDFLAGS)
+CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(ZLIB_LDFLAGS) \$(XAPIAN_LDFLAGS)
 EOF
diff --git a/doc/man1/notmuch-dump.rst b/doc/man1/notmuch-dump.rst
index 17d1da5..d94cb4f 100644
--- a/doc/man1/notmuch-dump.rst
+++ b/doc/man1/notmuch-dump.rst
@@ -19,6 +19,9 @@ recreated from the messages themselves. The output of notmuch dump is
 therefore the only critical thing to backup (and much more friendly to
 incremental backup than the native database files.)
 
+``--gzip``
+    Compress the output in a format compatible with **gzip(1)**.
+
 ``--format=(sup|batch-tag)``
     Notmuch restore supports two plain text dump formats, both with one
     message-id per line, followed by a list of tags.
diff --git a/notmuch-client.h b/notmuch-client.h
index d110648..e1efbe0 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -450,7 +450,9 @@ typedef enum dump_formats {
 int
 notmuch_database_dump (notmuch_database_t *notmuch,
 		       const char *output_file_name,
-		       const char *query_str, dump_format_t output_format);
+		       const char *query_str,
+		       dump_format_t output_format,
+		       notmuch_bool_t gzip_output);
 
 #include "command-line-arguments.h"
 #endif
diff --git a/notmuch-dump.c b/notmuch-dump.c
index 21702d7..28342b7 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -21,9 +21,11 @@
 #include "notmuch-client.h"
 #include "hex-escape.h"
 #include "string-util.h"
+#include <zlib.h>
+
 
 static int
-database_dump_file (notmuch_database_t *notmuch, FILE *output,
+database_dump_file (notmuch_database_t *notmuch, gzFile output,
 		    const char *query_str, int output_format)
 {
     notmuch_query_t *query;
@@ -69,7 +71,7 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
 	}
 
 	if (output_format == DUMP_FORMAT_SUP) {
-	    fprintf (output, "%s (", message_id);
+	    gzprintf (output, "%s (", message_id);
 	}
 
 	for (tags = notmuch_message_get_tags (message);
@@ -78,12 +80,12 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
 	    const char *tag_str = notmuch_tags_get (tags);
 
 	    if (! first)
-		fputs (" ", output);
+		gzputs (output, " ");
 
 	    first = 0;
 
 	    if (output_format == DUMP_FORMAT_SUP) {
-		fputs (tag_str, output);
+		gzputs (output, tag_str);
 	    } else {
 		if (hex_encode (notmuch, tag_str,
 				&buffer, &buffer_size) != HEX_SUCCESS) {
@@ -91,12 +93,12 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
 			     tag_str);
 		    return EXIT_FAILURE;
 		}
-		fprintf (output, "+%s", buffer);
+		gzprintf (output, "+%s", buffer);
 	    }
 	}
 
 	if (output_format == DUMP_FORMAT_SUP) {
-	    fputs (")\n", output);
+	    gzputs (output, ")\n");
 	} else {
 	    if (make_boolean_term (notmuch, "id", message_id,
 				   &buffer, &buffer_size)) {
@@ -104,7 +106,7 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
 			     message_id, strerror (errno));
 		    return EXIT_FAILURE;
 	    }
-	    fprintf (output, " -- %s\n", buffer);
+	    gzprintf (output, " -- %s\n", buffer);
 	}
 
 	notmuch_message_destroy (message);
@@ -121,24 +123,36 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
 int
 notmuch_database_dump (notmuch_database_t *notmuch,
 		       const char *output_file_name,
-		       const char *query_str, dump_format_t output_format)
+		       const char *query_str,
+		       dump_format_t output_format,
+		       notmuch_bool_t gzip_output)
 {
-    FILE *output = stdout;
+    gzFile output;
+    const char *mode = gzip_output ? "w9" : "wT";
+
     int ret;
 
-    if (output_file_name) {
-	output = fopen (output_file_name, "w");
-	if (output == NULL) {
-	    fprintf (stderr, "Error opening %s for writing: %s\n",
-		     output_file_name, strerror (errno));
-	    return EXIT_FAILURE;
-	}
+    if (output_file_name)
+	output = gzopen (output_file_name, mode);
+    else
+	output = gzdopen (fileno (stdout), mode);
+
+    if (output == NULL) {
+	fprintf (stderr, "Error opening %s for (gzip) writing: %s\n",
+		 output_file_name ? output_file_name : "stdout", strerror (errno));
+	return EXIT_FAILURE;
     }
 
     ret = database_dump_file (notmuch, output, query_str, output_format);
 
-    if (output != stdout)
-	fclose (output);
+    if (gzflush (output, Z_FINISH)) {
+	fprintf (stderr, "Error flushing output: %s\n",
+		 gzerror (output, NULL));
+	return EXIT_FAILURE;
+    }
+
+    if (output_file_name)
+	gzclose_w (output);
 
     return ret;
 }
@@ -158,6 +172,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[])
     int opt_index;
 
     int output_format = DUMP_FORMAT_BATCH_TAG;
+    notmuch_bool_t gzip_output = 0;
 
     notmuch_opt_desc_t options[] = {
 	{ NOTMUCH_OPT_KEYWORD, &output_format, "format", 'f',
@@ -165,6 +180,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[])
 				  { "batch-tag", DUMP_FORMAT_BATCH_TAG },
 				  { 0, 0 } } },
 	{ NOTMUCH_OPT_STRING, &output_file_name, "output", 'o', 0  },
+	{ NOTMUCH_OPT_BOOLEAN, &gzip_output, "gzip", 'z', 0 },
 	{ 0, 0, 0, 0, 0 }
     };
 
@@ -181,7 +197,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[])
     }
 
     ret = notmuch_database_dump (notmuch, output_file_name, query_str,
-				 output_format);
+				 output_format, gzip_output);
 
     notmuch_database_destroy (notmuch);
 
diff --git a/test/T240-dump-restore.sh b/test/T240-dump-restore.sh
index 0004438..d79aca8 100755
--- a/test/T240-dump-restore.sh
+++ b/test/T240-dump-restore.sh
@@ -68,6 +68,18 @@ test_begin_subtest "dump --output=outfile --"
 notmuch dump --output=dump-1-arg-dash.actual --
 test_expect_equal_file dump.expected dump-1-arg-dash.actual
 
+# gzipped output
+
+test_begin_subtest "dump --gzip"
+notmuch dump --gzip > dump-gzip.gz
+gunzip dump-gzip.gz
+test_expect_equal_file dump.expected dump-gzip
+
+test_begin_subtest "dump --gzip --output=outfile"
+notmuch dump --gzip --output=dump-gzip-outfile.gz
+gunzip dump-gzip-outfile.gz
+test_expect_equal_file dump.expected dump-gzip-outfile
+
 # Note, we assume all messages from cworth have a message-id
 # containing cworth.org
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch v5 2/6] dump: when given output file name, write atomically
  2014-04-02  1:16 v5 gzip / dump restore David Bremner
  2014-04-02  1:16 ` [Patch v5 1/6] dump: support gzipped output David Bremner
@ 2014-04-02  1:16 ` David Bremner
  2014-04-02  1:16 ` [Patch v5 3/6] util: add gz_readline David Bremner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

It is useful to able to tell whether a dump completed successfully in
situtions where we don't have access to the return code.
---
 notmuch-dump.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/notmuch-dump.c b/notmuch-dump.c
index 28342b7..05ed6b4 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -129,30 +129,65 @@ notmuch_database_dump (notmuch_database_t *notmuch,
 {
     gzFile output;
     const char *mode = gzip_output ? "w9" : "wT";
+    const char *name_for_error = output_file_name ? output_file_name : "stdout";
 
-    int ret;
+    char *tempname = NULL;
+    int outfd = -1;
+
+    int ret = -1;
+
+    if (output_file_name) {
+	tempname = talloc_asprintf (notmuch, "%s.XXXXXX", output_file_name);
+	outfd = mkstemp (tempname);
+    } else {
+	outfd = fileno (stdout);
+    }
 
-    if (output_file_name)
-	output = gzopen (output_file_name, mode);
-    else
-	output = gzdopen (fileno (stdout), mode);
+    if (outfd < 0) {
+	fprintf (stderr, "Bad output file %s\n", name_for_error);
+	goto DONE;
+    }
+
+    output = gzdopen (outfd, mode);
 
     if (output == NULL) {
 	fprintf (stderr, "Error opening %s for (gzip) writing: %s\n",
-		 output_file_name ? output_file_name : "stdout", strerror (errno));
-	return EXIT_FAILURE;
+		 name_for_error, strerror (errno));
+	goto DONE;
     }
 
     ret = database_dump_file (notmuch, output, query_str, output_format);
+    if (ret) goto DONE;
 
-    if (gzflush (output, Z_FINISH)) {
-	fprintf (stderr, "Error flushing output: %s\n",
-		 gzerror (output, NULL));
-	return EXIT_FAILURE;
+    ret = gzflush (output, Z_FINISH);
+    if (ret) {
+	fprintf (stderr, "Error flushing output: %s\n", gzerror (output, NULL));
+	goto DONE;
     }
 
-    if (output_file_name)
-	gzclose_w (output);
+    ret = fdatasync (outfd);
+    if (ret) {
+	perror ("fdatasync");
+	goto DONE;
+    }
+
+    if (output_file_name) {
+	ret = gzclose_w (output);
+	if (ret != Z_OK) {
+	    ret = EXIT_FAILURE;
+	    goto DONE;
+	}
+
+	ret = rename (tempname, output_file_name);
+	if (ret) {
+	    perror ("rename");
+	    goto DONE;
+	}
+
+    }
+ DONE:
+    if (ret != EXIT_SUCCESS && output_file_name)
+	(void) unlink (tempname);
 
     return ret;
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch v5 3/6] util: add gz_readline
  2014-04-02  1:16 v5 gzip / dump restore David Bremner
  2014-04-02  1:16 ` [Patch v5 1/6] dump: support gzipped output David Bremner
  2014-04-02  1:16 ` [Patch v5 2/6] dump: when given output file name, write atomically David Bremner
@ 2014-04-02  1:16 ` David Bremner
  2014-04-02  3:26   ` Austin Clements
  2014-04-02  1:16 ` [Patch v5 4/6] restore: transparently support gzipped input David Bremner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

The idea is to provide a more or less drop in replacement for readline
to read from zlib/gzip streams.  Take the opportunity to replace
malloc with talloc.
---
 util/Makefile.local |  2 +-
 util/util.h         | 12 +++++++++
 util/zlib-extra.c   | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 util/zlib-extra.h   | 11 ++++++++
 4 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 util/util.h
 create mode 100644 util/zlib-extra.c
 create mode 100644 util/zlib-extra.h

diff --git a/util/Makefile.local b/util/Makefile.local
index 29c0ce6..e2a5b65 100644
--- a/util/Makefile.local
+++ b/util/Makefile.local
@@ -4,7 +4,7 @@ dir := util
 extra_cflags += -I$(srcdir)/$(dir)
 
 libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c \
-		  $(dir)/string-util.c $(dir)/talloc-extra.c
+		  $(dir)/string-util.c $(dir)/talloc-extra.c $(dir)/zlib-extra.c
 
 libutil_modules := $(libutil_c_srcs:.c=.o)
 
diff --git a/util/util.h b/util/util.h
new file mode 100644
index 0000000..8663cfc
--- /dev/null
+++ b/util/util.h
@@ -0,0 +1,12 @@
+#ifndef _UTIL_H
+#define _UTIL_H
+
+typedef enum util_status {
+    UTIL_SUCCESS = 0,
+    UTIL_ERROR = 1,
+    UTIL_OUT_OF_MEMORY,
+    UTIL_EOF,
+    UTIL_FILE,
+} util_status_t;
+
+#endif
diff --git a/util/zlib-extra.c b/util/zlib-extra.c
new file mode 100644
index 0000000..cb1eba0
--- /dev/null
+++ b/util/zlib-extra.c
@@ -0,0 +1,76 @@
+/* zlib-extra.c -  Extra or enhanced routines for compressed I/O.
+ *
+ * Copyright (c) 2014 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#include "zlib-extra.h"
+#include <talloc.h>
+#include <stdio.h>
+#include <string.h>
+
+/* mimic POSIX/glibc getline, but on a zlib gzFile stream, and using talloc */
+util_status_t
+gz_getline (void *talloc_ctx, char **bufptr, size_t *bufsiz, ssize_t *bytes_read,
+	    gzFile stream)
+{
+    size_t len = *bufsiz;
+    char *buf = *bufptr;
+    size_t offset = 0;
+
+    if (len == 0 || buf == NULL) {
+	/* same as getdelim from gnulib */
+	len = 120;
+	buf = talloc_size (talloc_ctx, len);
+	if (buf == NULL)
+	    return UTIL_OUT_OF_MEMORY;
+    }
+
+    while (1) {
+	if (! gzgets (stream, buf + offset, len - offset)) {
+	    int zlib_status = 0;
+	    (void) gzerror (stream, &zlib_status);
+	    switch (zlib_status) {
+	    case Z_OK:
+		/* follow getline behaviour */
+		*bytes_read = -1;
+		return UTIL_EOF;
+		break;
+	    case Z_ERRNO:
+		return UTIL_FILE;
+		break;
+	    default:
+		return UTIL_ERROR;
+	    }
+	}
+
+	offset += strlen (buf + offset);
+
+	if ( buf[offset - 1] == '\n' )
+	    break;
+
+	len *= 2;
+	buf = talloc_realloc (talloc_ctx, buf, char, len);
+	if (buf == NULL)
+	    return UTIL_OUT_OF_MEMORY;
+    }
+
+    *bufptr = buf;
+    *bufsiz = len;
+    *bytes_read = offset;
+    return UTIL_SUCCESS;
+}
diff --git a/util/zlib-extra.h b/util/zlib-extra.h
new file mode 100644
index 0000000..ed46ac1
--- /dev/null
+++ b/util/zlib-extra.h
@@ -0,0 +1,11 @@
+#ifndef _ZLIB_EXTRA_H
+#define _ZLIB_EXTRA_H
+
+#include <zlib.h>
+#include "util.h"
+/* Like getline, but read from a gzFile. Allocation is with talloc */
+util_status_t
+gz_getline (void *ctx, char **lineptr, size_t *line_size, ssize_t *bytes_read,
+	    gzFile stream);
+
+#endif
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch v5 4/6] restore: transparently support gzipped input
  2014-04-02  1:16 v5 gzip / dump restore David Bremner
                   ` (2 preceding siblings ...)
  2014-04-02  1:16 ` [Patch v5 3/6] util: add gz_readline David Bremner
@ 2014-04-02  1:16 ` David Bremner
  2014-04-02  2:49   ` Austin Clements
  2014-04-02  1:16 ` [Patch v5 5/6] notmuch-new: backup tags before database upgrade David Bremner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

We rely completely on zlib to do the right thing in detecting gzipped
input. Since our dump format is chosen to be 7 bit ascii, this should
be fine.
---
 doc/man1/notmuch-restore.rst |  8 ++++++++
 notmuch-restore.c            | 37 +++++++++++++++++++++----------------
 test/T240-dump-restore.sh    | 14 ++++++++++++++
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/doc/man1/notmuch-restore.rst b/doc/man1/notmuch-restore.rst
index d6cf19a..936b138 100644
--- a/doc/man1/notmuch-restore.rst
+++ b/doc/man1/notmuch-restore.rst
@@ -50,6 +50,14 @@ Supported options for **restore** include
             format, this heuristic, based the fact that batch-tag format
             contains no parentheses, should be accurate.
 
+GZIPPED INPUT
+=============
+
+\ **notmuch restore** will detect if the input is compressed in
+**gzip(1)** format and automatically decompress it while reading. This
+detection does not depend on file naming and in particular works for
+standard input.
+
 SEE ALSO
 ========
 
diff --git a/notmuch-restore.c b/notmuch-restore.c
index c54d513..86bce20 100644
--- a/notmuch-restore.c
+++ b/notmuch-restore.c
@@ -22,6 +22,7 @@
 #include "hex-escape.h"
 #include "tag-util.h"
 #include "string-util.h"
+#include "zlib-extra.h"
 
 static regex_t regex;
 
@@ -128,7 +129,7 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
     tag_op_list_t *tag_ops;
 
     char *input_file_name = NULL;
-    FILE *input = stdin;
+    gzFile input;
     char *line = NULL;
     void *line_ctx = NULL;
     size_t line_size;
@@ -163,13 +164,15 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
     if (! accumulate)
 	flags |= TAG_FLAG_REMOVE_ALL;
 
-    if (input_file_name) {
-	input = fopen (input_file_name, "r");
-	if (input == NULL) {
-	    fprintf (stderr, "Error opening %s for reading: %s\n",
-		     input_file_name, strerror (errno));
-	    return EXIT_FAILURE;
-	}
+    if (input_file_name)
+	input = gzopen (input_file_name, "r");
+    else
+	input = gzdopen (fileno (stdin), "r");
+
+    if (input == NULL) {
+	fprintf (stderr, "Error opening %s for (gzip) reading: %s\n",
+		 input_file_name ? input_file_name : "stdin", strerror (errno));
+	return EXIT_FAILURE;
     }
 
     if (opt_index < argc) {
@@ -184,12 +187,17 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
     }
 
     do {
-	line_len = getline (&line, &line_size, input);
+	util_status_t status;
+
+	status = gz_getline (line_ctx, &line, &line_size, &line_len, input);
 
 	/* empty input file not considered an error */
-	if (line_len < 0)
+	if (status == UTIL_EOF)
 	    return EXIT_SUCCESS;
 
+	if (status)
+	    return EXIT_FAILURE;
+
     } while ((line_len == 0) ||
 	     (line[0] == '#') ||
 	     /* the cast is safe because we checked about for line_len < 0 */
@@ -254,7 +262,7 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
 	if (ret)
 	    break;
 
-    }  while ((line_len = getline (&line, &line_size, input)) != -1);
+    }  while (gz_getline (line_ctx, &line, &line_size, &line_len, input) == UTIL_SUCCESS);
 
     if (line_ctx != NULL)
 	talloc_free (line_ctx);
@@ -262,13 +270,10 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
     if (input_format == DUMP_FORMAT_SUP)
 	regfree (&regex);
 
-    if (line)
-	free (line);
-
     notmuch_database_destroy (notmuch);
 
-    if (input != stdin)
-	fclose (input);
+    if (input_file_name != NULL)
+	gzclose_r (input);
 
     return ret ? EXIT_FAILURE : EXIT_SUCCESS;
 }
diff --git a/test/T240-dump-restore.sh b/test/T240-dump-restore.sh
index d79aca8..50d4d48 100755
--- a/test/T240-dump-restore.sh
+++ b/test/T240-dump-restore.sh
@@ -80,6 +80,20 @@ notmuch dump --gzip --output=dump-gzip-outfile.gz
 gunzip dump-gzip-outfile.gz
 test_expect_equal_file dump.expected dump-gzip-outfile
 
+test_begin_subtest "restoring gzipped stdin"
+notmuch dump --gzip --output=backup.gz
+notmuch tag +new_tag '*'
+notmuch restore < backup.gz
+notmuch dump --output=dump.actual
+test_expect_equal_file dump.expected dump.actual
+
+test_begin_subtest "restoring gzipped file"
+notmuch dump --gzip --output=backup.gz
+notmuch tag +new_tag '*'
+notmuch restore --input=backup.gz
+notmuch dump --output=dump.actual
+test_expect_equal_file dump.expected dump.actual
+
 # Note, we assume all messages from cworth have a message-id
 # containing cworth.org
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch v5 5/6] notmuch-new: backup tags before database upgrade
  2014-04-02  1:16 v5 gzip / dump restore David Bremner
                   ` (3 preceding siblings ...)
  2014-04-02  1:16 ` [Patch v5 4/6] restore: transparently support gzipped input David Bremner
@ 2014-04-02  1:16 ` David Bremner
  2014-04-02  3:35   ` Austin Clements
  2014-04-02  1:16 ` [Patch v5 6/6] test: verify tag backup generated by " David Bremner
  2014-04-02  2:07 ` [Patch v5 2/6] dump: when given output file name, write atomically Austin Clements
  6 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

All we do here is calculate the backup filename, and call the existing
dump routine.

Also take the opportity to add a message about being safe to
interrupt.
---
 notmuch-new.c        | 29 ++++++++++++++++++++++++++++-
 test/T530-upgrade.sh |  4 +++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 82acf69..e0431c6 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -989,8 +989,35 @@ notmuch_new_command (notmuch_config_t *config, int argc, char *argv[])
 	    return EXIT_FAILURE;
 
 	if (notmuch_database_needs_upgrade (notmuch)) {
-	    if (add_files_state.verbosity >= VERBOSITY_NORMAL)
+	    time_t now = time (NULL);
+	    struct tm *gm_time = gmtime (&now);
+
+	    /* since dump files are written atomically, the amount of
+	     * harm from overwriting one within a second seems
+	     * relatively small. */
+
+	    const char *backup_name =
+		talloc_asprintf (notmuch, "%s/backup-%04d-%02d-%02d-%02d:%02d:%02d.gz",
+				 dot_notmuch_path,
+				 gm_time->tm_year + 1900,
+				 gm_time->tm_mon + 1,
+				 gm_time->tm_mday,
+				 gm_time->tm_hour,
+				 gm_time->tm_min,
+				 gm_time->tm_sec);
+
+	    if (add_files_state.verbosity >= VERBOSITY_NORMAL) {
 		printf ("Welcome to a new version of notmuch! Your database will now be upgraded.\n");
+		printf ("This process is safe to interrupt.\n");
+		printf ("Backing up tags to %s\n", backup_name);
+	    }
+
+	    if (notmuch_database_dump (notmuch, backup_name, "",
+				       DUMP_FORMAT_BATCH_TAG, TRUE)) {
+		fprintf (stderr, "Backup failed. Aborting upgrade");
+		return EXIT_FAILURE;
+	    }
+
 	    gettimeofday (&add_files_state.tv_start, NULL);
 	    notmuch_database_upgrade (notmuch,
 				      add_files_state.verbosity >= VERBOSITY_NORMAL ? upgrade_print_progress : NULL,
diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
index 67bbf31..d46e3d1 100755
--- a/test/T530-upgrade.sh
+++ b/test/T530-upgrade.sh
@@ -26,9 +26,11 @@ output=$(notmuch search path:foo)
 test_expect_equal "$output" ""
 
 test_begin_subtest "database upgrade from format version 1"
-output=$(notmuch new)
+output=$(notmuch new | sed -e 's/^Backing up tags to .*$/Backing up tags to FILENAME/')
 test_expect_equal "$output" "\
 Welcome to a new version of notmuch! Your database will now be upgraded.
+This process is safe to interrupt.
+Backing up tags to FILENAME
 Your notmuch database has now been upgraded to database format version 2.
 No new mail."
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch v5 6/6] test: verify tag backup generated by database upgrade
  2014-04-02  1:16 v5 gzip / dump restore David Bremner
                   ` (4 preceding siblings ...)
  2014-04-02  1:16 ` [Patch v5 5/6] notmuch-new: backup tags before database upgrade David Bremner
@ 2014-04-02  1:16 ` David Bremner
  2014-04-02  2:07 ` [Patch v5 2/6] dump: when given output file name, write atomically Austin Clements
  6 siblings, 0 replies; 17+ messages in thread
From: David Bremner @ 2014-04-02  1:16 UTC (permalink / raw)
  To: notmuch

'pre upgrade dump' is not much of a test, but at least this way we get
somewhat sensible behaviour if it fails.
---
 test/T530-upgrade.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
index d46e3d1..7a68ddf 100755
--- a/test/T530-upgrade.sh
+++ b/test/T530-upgrade.sh
@@ -25,6 +25,8 @@ test_begin_subtest "path: search does not work with old database version"
 output=$(notmuch search path:foo)
 test_expect_equal "$output" ""
 
+test_expect_success 'pre upgrade dump' 'notmuch dump | sort > pre-upgrade-dump'
+
 test_begin_subtest "database upgrade from format version 1"
 output=$(notmuch new | sed -e 's/^Backing up tags to .*$/Backing up tags to FILENAME/')
 test_expect_equal "$output" "\
@@ -34,6 +36,10 @@ Backing up tags to FILENAME
 Your notmuch database has now been upgraded to database format version 2.
 No new mail."
 
+test_begin_subtest "tag backup matches pre-upgrade dump"
+gunzip -c ${MAIL_DIR}/.notmuch/backup-*.gz | sort > backup-dump
+test_expect_equal_file pre-upgrade-dump backup-dump
+
 test_begin_subtest "folder: no longer matches in the middle of path"
 output=$(notmuch search folder:baz)
 test_expect_equal "$output" ""
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Patch v5 2/6] dump: when given output file name, write atomically
@ 2014-04-02  2:07 ` Austin Clements
  2014-04-02 20:55   ` Austin Clements
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Clements @ 2014-04-02  2:07 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

(Pardon the mobile review)

On Apr 1, 2014 9:16 PM, David Bremner <david@tethera.net> wrote:
>
> It is useful to able to tell whether a dump completed successfully in 
> situtions where we don't have access to the return code.

"Situations."  This commit message doesn't seem very related to atomicity?

> --- 
> notmuch-dump.c | 61 +++++++++++++++++++++++++++++++++++++++++++++------------- 
> 1 file changed, 48 insertions(+), 13 deletions(-) 
>
> diff --git a/notmuch-dump.c b/notmuch-dump.c 
> index 28342b7..05ed6b4 100644 
> --- a/notmuch-dump.c 
> +++ b/notmuch-dump.c 
> @@ -129,30 +129,65 @@ notmuch_database_dump (notmuch_database_t *notmuch, 
> { 
>      gzFile output; 
>      const char *mode = gzip_output ? "w9" : "wT"; 
> +    const char *name_for_error = output_file_name ? output_file_name : "stdout"; 
>
> -    int ret; 
> +    char *tempname = NULL; 
> +    int outfd = -1; 
> + 
> +    int ret = -1; 
> + 
> +    if (output_file_name) { 
> + tempname = talloc_asprintf (notmuch, "%s.XXXXXX", output_file_name); 
> + outfd = mkstemp (tempname); 
> +    } else { 
> + outfd = fileno (stdout); 
> +    } 
>
> -    if (output_file_name) 
> - output = gzopen (output_file_name, mode); 
> -    else 
> - output = gzdopen (fileno (stdout), mode); 
> +    if (outfd < 0) { 
> + fprintf (stderr, "Bad output file %s\n", name_for_error); 
> + goto DONE; 
> +    } 
> + 
> +    output = gzdopen (outfd, mode); 
>
>      if (output == NULL) { 
> fprintf (stderr, "Error opening %s for (gzip) writing: %s\n", 
> - output_file_name ? output_file_name : "stdout", strerror (errno)); 
> - return EXIT_FAILURE; 
> + name_for_error, strerror (errno)); 
> + goto DONE; 
>      } 
>
>      ret = database_dump_file (notmuch, output, query_str, output_format); 
> +    if (ret) goto DONE; 
>
> -    if (gzflush (output, Z_FINISH)) { 
> - fprintf (stderr, "Error flushing output: %s\n", 
> - gzerror (output, NULL)); 
> - return EXIT_FAILURE; 
> +    ret = gzflush (output, Z_FINISH); 
> +    if (ret) { 
> + fprintf (stderr, "Error flushing output: %s\n", gzerror (output, NULL)); 
> + goto DONE; 
>      } 
>
> -    if (output_file_name) 
> - gzclose_w (output); 
> +    ret = fdatasync (outfd);

This may fail if outfd isn't a regular file.  We probably should only do this if output_file_name.  Or we could ignore EINVAL (but that may mask bugs).

> +    if (ret) { 
> + perror ("fdatasync");

perror's fine for quick hacks, but it would be better to give a real error message here.

> + goto DONE; 
> +    } 
> + 
> +    if (output_file_name) { 
> + ret = gzclose_w (output); 
> + if (ret != Z_OK) { 
> +     ret = EXIT_FAILURE; 
> +     goto DONE; 
> + } 
> + 
> + ret = rename (tempname, output_file_name); 
> + if (ret) { 
> +     perror ("rename"); 
> +     goto DONE; 
> + } 
> + 
> +    } 
> + DONE: 
> +    if (ret != EXIT_SUCCESS && output_file_name) 
> + (void) unlink (tempname);

We need to gzclose on the error paths to free zlib's internal resources.  This is a problem if we handed stdout to gzdopen.  We shouldn't write any more to stdout anyway in that case, but maybe it would be better to dup stdout?

>
>      return ret; 
> } 
> -- 
> 1.9.0 
>
> _______________________________________________ 
> notmuch mailing list 
> notmuch@notmuchmail.org 
> http://notmuchmail.org/mailman/listinfo/notmuch 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 4/6] restore: transparently support gzipped input
  2014-04-02  1:16 ` [Patch v5 4/6] restore: transparently support gzipped input David Bremner
@ 2014-04-02  2:49   ` Austin Clements
  0 siblings, 0 replies; 17+ messages in thread
From: Austin Clements @ 2014-04-02  2:49 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

Quoth David Bremner on Apr 01 at 10:16 pm:
> We rely completely on zlib to do the right thing in detecting gzipped
> input. Since our dump format is chosen to be 7 bit ascii, this should
> be fine.
> ---
>  doc/man1/notmuch-restore.rst |  8 ++++++++
>  notmuch-restore.c            | 37 +++++++++++++++++++++----------------
>  test/T240-dump-restore.sh    | 14 ++++++++++++++
>  3 files changed, 43 insertions(+), 16 deletions(-)
> 
> diff --git a/doc/man1/notmuch-restore.rst b/doc/man1/notmuch-restore.rst
> index d6cf19a..936b138 100644
> --- a/doc/man1/notmuch-restore.rst
> +++ b/doc/man1/notmuch-restore.rst
> @@ -50,6 +50,14 @@ Supported options for **restore** include
>              format, this heuristic, based the fact that batch-tag format
>              contains no parentheses, should be accurate.
>  
> +GZIPPED INPUT
> +=============
> +
> +\ **notmuch restore** will detect if the input is compressed in
> +**gzip(1)** format and automatically decompress it while reading. This
> +detection does not depend on file naming and in particular works for
> +standard input.
> +
>  SEE ALSO
>  ========
>  
> diff --git a/notmuch-restore.c b/notmuch-restore.c
> index c54d513..86bce20 100644
> --- a/notmuch-restore.c
> +++ b/notmuch-restore.c
> @@ -22,6 +22,7 @@
>  #include "hex-escape.h"
>  #include "tag-util.h"
>  #include "string-util.h"
> +#include "zlib-extra.h"
>  
>  static regex_t regex;
>  
> @@ -128,7 +129,7 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
>      tag_op_list_t *tag_ops;
>  
>      char *input_file_name = NULL;
> -    FILE *input = stdin;
> +    gzFile input;
>      char *line = NULL;
>      void *line_ctx = NULL;
>      size_t line_size;
> @@ -163,13 +164,15 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
>      if (! accumulate)
>  	flags |= TAG_FLAG_REMOVE_ALL;
>  
> -    if (input_file_name) {
> -	input = fopen (input_file_name, "r");
> -	if (input == NULL) {
> -	    fprintf (stderr, "Error opening %s for reading: %s\n",
> -		     input_file_name, strerror (errno));
> -	    return EXIT_FAILURE;
> -	}
> +    if (input_file_name)
> +	input = gzopen (input_file_name, "r");
> +    else
> +	input = gzdopen (fileno (stdin), "r");

As for patch 2, we also need to gzclose input on all paths out of this
function, which also means we probably need to dup stdin above.

> +
> +    if (input == NULL) {
> +	fprintf (stderr, "Error opening %s for (gzip) reading: %s\n",
> +		 input_file_name ? input_file_name : "stdin", strerror (errno));
> +	return EXIT_FAILURE;
>      }
>  
>      if (opt_index < argc) {
> @@ -184,12 +187,17 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
>      }
>  
>      do {
> -	line_len = getline (&line, &line_size, input);
> +	util_status_t status;
> +
> +	status = gz_getline (line_ctx, &line, &line_size, &line_len, input);
>  
>  	/* empty input file not considered an error */
> -	if (line_len < 0)
> +	if (status == UTIL_EOF)
>  	    return EXIT_SUCCESS;
>  
> +	if (status)
> +	    return EXIT_FAILURE;
> +
>      } while ((line_len == 0) ||
>  	     (line[0] == '#') ||
>  	     /* the cast is safe because we checked about for line_len < 0 */
> @@ -254,7 +262,7 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
>  	if (ret)
>  	    break;
>  
> -    }  while ((line_len = getline (&line, &line_size, input)) != -1);
> +    }  while (gz_getline (line_ctx, &line, &line_size, &line_len, input) == UTIL_SUCCESS);
>  
>      if (line_ctx != NULL)
>  	talloc_free (line_ctx);
> @@ -262,13 +270,10 @@ notmuch_restore_command (notmuch_config_t *config, int argc, char *argv[])
>      if (input_format == DUMP_FORMAT_SUP)
>  	regfree (&regex);
>  
> -    if (line)
> -	free (line);
> -
>      notmuch_database_destroy (notmuch);
>  
> -    if (input != stdin)
> -	fclose (input);
> +    if (input_file_name != NULL)
> +	gzclose_r (input);
>  
>      return ret ? EXIT_FAILURE : EXIT_SUCCESS;
>  }
> diff --git a/test/T240-dump-restore.sh b/test/T240-dump-restore.sh
> index d79aca8..50d4d48 100755
> --- a/test/T240-dump-restore.sh
> +++ b/test/T240-dump-restore.sh
> @@ -80,6 +80,20 @@ notmuch dump --gzip --output=dump-gzip-outfile.gz
>  gunzip dump-gzip-outfile.gz
>  test_expect_equal_file dump.expected dump-gzip-outfile
>  
> +test_begin_subtest "restoring gzipped stdin"
> +notmuch dump --gzip --output=backup.gz
> +notmuch tag +new_tag '*'
> +notmuch restore < backup.gz
> +notmuch dump --output=dump.actual
> +test_expect_equal_file dump.expected dump.actual
> +
> +test_begin_subtest "restoring gzipped file"
> +notmuch dump --gzip --output=backup.gz
> +notmuch tag +new_tag '*'
> +notmuch restore --input=backup.gz
> +notmuch dump --output=dump.actual
> +test_expect_equal_file dump.expected dump.actual
> +
>  # Note, we assume all messages from cworth have a message-id
>  # containing cworth.org
>  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 3/6] util: add gz_readline
  2014-04-02  1:16 ` [Patch v5 3/6] util: add gz_readline David Bremner
@ 2014-04-02  3:26   ` Austin Clements
  2014-04-02 16:43     ` Tomi Ollila
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Clements @ 2014-04-02  3:26 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

Quoth David Bremner on Apr 01 at 10:16 pm:
> The idea is to provide a more or less drop in replacement for readline
> to read from zlib/gzip streams.  Take the opportunity to replace
> malloc with talloc.
> ---
>  util/Makefile.local |  2 +-
>  util/util.h         | 12 +++++++++
>  util/zlib-extra.c   | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  util/zlib-extra.h   | 11 ++++++++
>  4 files changed, 100 insertions(+), 1 deletion(-)
>  create mode 100644 util/util.h
>  create mode 100644 util/zlib-extra.c
>  create mode 100644 util/zlib-extra.h
> 
> diff --git a/util/Makefile.local b/util/Makefile.local
> index 29c0ce6..e2a5b65 100644
> --- a/util/Makefile.local
> +++ b/util/Makefile.local
> @@ -4,7 +4,7 @@ dir := util
>  extra_cflags += -I$(srcdir)/$(dir)
>  
>  libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c \
> -		  $(dir)/string-util.c $(dir)/talloc-extra.c
> +		  $(dir)/string-util.c $(dir)/talloc-extra.c $(dir)/zlib-extra.c
>  
>  libutil_modules := $(libutil_c_srcs:.c=.o)
>  
> diff --git a/util/util.h b/util/util.h
> new file mode 100644
> index 0000000..8663cfc
> --- /dev/null
> +++ b/util/util.h
> @@ -0,0 +1,12 @@
> +#ifndef _UTIL_H
> +#define _UTIL_H
> +
> +typedef enum util_status {
> +    UTIL_SUCCESS = 0,
> +    UTIL_ERROR = 1,
> +    UTIL_OUT_OF_MEMORY,
> +    UTIL_EOF,
> +    UTIL_FILE,
> +} util_status_t;
> +
> +#endif
> diff --git a/util/zlib-extra.c b/util/zlib-extra.c
> new file mode 100644
> index 0000000..cb1eba0
> --- /dev/null
> +++ b/util/zlib-extra.c
> @@ -0,0 +1,76 @@
> +/* zlib-extra.c -  Extra or enhanced routines for compressed I/O.
> + *
> + * Copyright (c) 2014 David Bremner
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 3 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
> + *
> + * Author: David Bremner <david@tethera.net>
> + */
> +
> +#include "zlib-extra.h"
> +#include <talloc.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +/* mimic POSIX/glibc getline, but on a zlib gzFile stream, and using talloc */
> +util_status_t
> +gz_getline (void *talloc_ctx, char **bufptr, size_t *bufsiz, ssize_t *bytes_read,

Talloc chunks know their size, so rather than taking bufsize, use
talloc_get_size (or talloc_array_length if you switch to talloc array
functions below).

> +	    gzFile stream)
> +{
> +    size_t len = *bufsiz;
> +    char *buf = *bufptr;
> +    size_t offset = 0;
> +
> +    if (len == 0 || buf == NULL) {
> +	/* same as getdelim from gnulib */
> +	len = 120;

This is presumably because glibc's malloc has an 8 byte header.  Fun
fact: talloc has a 104 byte header (on 64-bit and including the malloc
header).

> +	buf = talloc_size (talloc_ctx, len);
> +	if (buf == NULL)
> +	    return UTIL_OUT_OF_MEMORY;
> +    }
> +
> +    while (1) {
> +	if (! gzgets (stream, buf + offset, len - offset)) {
> +	    int zlib_status = 0;
> +	    (void) gzerror (stream, &zlib_status);
> +	    switch (zlib_status) {
> +	    case Z_OK:
> +		/* follow getline behaviour */
> +		*bytes_read = -1;

Is this really what getline does when the last line of a file isn't
\n-terminated?

> +		return UTIL_EOF;
> +		break;
> +	    case Z_ERRNO:
> +		return UTIL_FILE;
> +		break;
> +	    default:
> +		return UTIL_ERROR;
> +	    }
> +	}
> +
> +	offset += strlen (buf + offset);
> +
> +	if ( buf[offset - 1] == '\n' )

Too many spaces!

> +	    break;
> +
> +	len *= 2;
> +	buf = talloc_realloc (talloc_ctx, buf, char, len);

Or talloc_realloc_size, to match the initial talloc_size.
Alternatively, the initial talloc_size could be a talloc_array.

> +	if (buf == NULL)
> +	    return UTIL_OUT_OF_MEMORY;
> +    }
> +
> +    *bufptr = buf;
> +    *bufsiz = len;
> +    *bytes_read = offset;
> +    return UTIL_SUCCESS;
> +}
> diff --git a/util/zlib-extra.h b/util/zlib-extra.h
> new file mode 100644
> index 0000000..ed46ac1
> --- /dev/null
> +++ b/util/zlib-extra.h
> @@ -0,0 +1,11 @@
> +#ifndef _ZLIB_EXTRA_H
> +#define _ZLIB_EXTRA_H
> +
> +#include <zlib.h>
> +#include "util.h"

I'd put "util.h" first so we're more likely to catch missing header
dependencies (obviously util.h doesn't have any right now, but in the
future).

Also, I'd put a blank line after the #includes.

> +/* Like getline, but read from a gzFile. Allocation is with talloc */
> +util_status_t
> +gz_getline (void *ctx, char **lineptr, size_t *line_size, ssize_t *bytes_read,
> +	    gzFile stream);
> +
> +#endif

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 5/6] notmuch-new: backup tags before database upgrade
  2014-04-02  1:16 ` [Patch v5 5/6] notmuch-new: backup tags before database upgrade David Bremner
@ 2014-04-02  3:35   ` Austin Clements
  0 siblings, 0 replies; 17+ messages in thread
From: Austin Clements @ 2014-04-02  3:35 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

Quoth David Bremner on Apr 01 at 10:16 pm:
> All we do here is calculate the backup filename, and call the existing
> dump routine.
> 
> Also take the opportity to add a message about being safe to
> interrupt.
> ---
>  notmuch-new.c        | 29 ++++++++++++++++++++++++++++-
>  test/T530-upgrade.sh |  4 +++-
>  2 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/notmuch-new.c b/notmuch-new.c
> index 82acf69..e0431c6 100644
> --- a/notmuch-new.c
> +++ b/notmuch-new.c
> @@ -989,8 +989,35 @@ notmuch_new_command (notmuch_config_t *config, int argc, char *argv[])
>  	    return EXIT_FAILURE;
>  
>  	if (notmuch_database_needs_upgrade (notmuch)) {
> -	    if (add_files_state.verbosity >= VERBOSITY_NORMAL)
> +	    time_t now = time (NULL);
> +	    struct tm *gm_time = gmtime (&now);
> +
> +	    /* since dump files are written atomically, the amount of
> +	     * harm from overwriting one within a second seems
> +	     * relatively small. */
> +
> +	    const char *backup_name =
> +		talloc_asprintf (notmuch, "%s/backup-%04d-%02d-%02d-%02d:%02d:%02d.gz",

Maybe "dump" instead of "backup" so it's more obvious that it's a
notmuch dump?

This would be ISO 8601 compatible if you put a 'T' instead of a '-'
between the date and the time.

Colons tend to get file names into trouble (some file systems don't
support them, tools like scp think they demarcate host names, etc).
How about compact ISO 8601: dump-%04d%02d%02dT%02d%02d%02d.gz?

> +				 dot_notmuch_path,
> +				 gm_time->tm_year + 1900,
> +				 gm_time->tm_mon + 1,
> +				 gm_time->tm_mday,
> +				 gm_time->tm_hour,
> +				 gm_time->tm_min,
> +				 gm_time->tm_sec);
> +
> +	    if (add_files_state.verbosity >= VERBOSITY_NORMAL) {
>  		printf ("Welcome to a new version of notmuch! Your database will now be upgraded.\n");
> +		printf ("This process is safe to interrupt.\n");
> +		printf ("Backing up tags to %s\n", backup_name);

Add "..." to indicate a running process?  Even better would be a
progress report, but we can add that to TODO.

> +	    }
> +
> +	    if (notmuch_database_dump (notmuch, backup_name, "",
> +				       DUMP_FORMAT_BATCH_TAG, TRUE)) {
> +		fprintf (stderr, "Backup failed. Aborting upgrade");

Add a "." at the end for consistency.

> +		return EXIT_FAILURE;
> +	    }
> +
>  	    gettimeofday (&add_files_state.tv_start, NULL);
>  	    notmuch_database_upgrade (notmuch,
>  				      add_files_state.verbosity >= VERBOSITY_NORMAL ? upgrade_print_progress : NULL,
> diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
> index 67bbf31..d46e3d1 100755
> --- a/test/T530-upgrade.sh
> +++ b/test/T530-upgrade.sh
> @@ -26,9 +26,11 @@ output=$(notmuch search path:foo)
>  test_expect_equal "$output" ""
>  
>  test_begin_subtest "database upgrade from format version 1"
> -output=$(notmuch new)
> +output=$(notmuch new | sed -e 's/^Backing up tags to .*$/Backing up tags to FILENAME/')
>  test_expect_equal "$output" "\
>  Welcome to a new version of notmuch! Your database will now be upgraded.
> +This process is safe to interrupt.
> +Backing up tags to FILENAME
>  Your notmuch database has now been upgraded to database format version 2.
>  No new mail."
>  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 3/6] util: add gz_readline
  2014-04-02  3:26   ` Austin Clements
@ 2014-04-02 16:43     ` Tomi Ollila
  2014-04-02 20:43       ` Austin Clements
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Tomi Ollila @ 2014-04-02 16:43 UTC (permalink / raw)
  To: Austin Clements, David Bremner; +Cc: notmuch

On Wed, Apr 02 2014, Austin Clements <amdragon@MIT.EDU> wrote:

> Quoth David Bremner on Apr 01 at 10:16 pm:
>> The idea is to provide a more or less drop in replacement for readline
>> to read from zlib/gzip streams.  Take the opportunity to replace
>> malloc with talloc.
>> ---
>>  util/Makefile.local |  2 +-
>>  util/util.h         | 12 +++++++++
>>  util/zlib-extra.c   | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  util/zlib-extra.h   | 11 ++++++++
>>  4 files changed, 100 insertions(+), 1 deletion(-)
>>  create mode 100644 util/util.h
>>  create mode 100644 util/zlib-extra.c
>>  create mode 100644 util/zlib-extra.h
>> 
>> diff --git a/util/Makefile.local b/util/Makefile.local
>> index 29c0ce6..e2a5b65 100644
>> --- a/util/Makefile.local
>> +++ b/util/Makefile.local
>> @@ -4,7 +4,7 @@ dir := util
>>  extra_cflags += -I$(srcdir)/$(dir)
>>  
>>  libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c \
>> -		  $(dir)/string-util.c $(dir)/talloc-extra.c
>> +		  $(dir)/string-util.c $(dir)/talloc-extra.c $(dir)/zlib-extra.c
>>  
>>  libutil_modules := $(libutil_c_srcs:.c=.o)
>>  
>> diff --git a/util/util.h b/util/util.h
>> new file mode 100644
>> index 0000000..8663cfc
>> --- /dev/null
>> +++ b/util/util.h
>> @@ -0,0 +1,12 @@
>> +#ifndef _UTIL_H
>> +#define _UTIL_H
>> +
>> +typedef enum util_status {
>> +    UTIL_SUCCESS = 0,
>> +    UTIL_ERROR = 1,
>> +    UTIL_OUT_OF_MEMORY,
>> +    UTIL_EOF,
>> +    UTIL_FILE,
>> +} util_status_t;
>> +
>> +#endif
>> diff --git a/util/zlib-extra.c b/util/zlib-extra.c
>> new file mode 100644
>> index 0000000..cb1eba0
>> --- /dev/null
>> +++ b/util/zlib-extra.c
>> @@ -0,0 +1,76 @@
>> +/* zlib-extra.c -  Extra or enhanced routines for compressed I/O.
>> + *
>> + * Copyright (c) 2014 David Bremner
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 3 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
>> + *
>> + * Author: David Bremner <david@tethera.net>
>> + */
>> +
>> +#include "zlib-extra.h"
>> +#include <talloc.h>
>> +#include <stdio.h>
>> +#include <string.h>
>> +
>> +/* mimic POSIX/glibc getline, but on a zlib gzFile stream, and using talloc */
>> +util_status_t
>> +gz_getline (void *talloc_ctx, char **bufptr, size_t *bufsiz, ssize_t *bytes_read,
>
> Talloc chunks know their size, so rather than taking bufsize, use
> talloc_get_size (or talloc_array_length if you switch to talloc array
> functions below).

Now yoy David have a chance to drop the bufsiz argument altogether, as the
info is available in *bufptr:s talloc context...

>
>> +	    gzFile stream)
>> +{
>> +    size_t len = *bufsiz;
>> +    char *buf = *bufptr;
>> +    size_t offset = 0;
>> +
>> +    if (len == 0 || buf == NULL) {
>> +	/* same as getdelim from gnulib */
>> +	len = 120;
>
> This is presumably because glibc's malloc has an 8 byte header.  Fun
> fact: talloc has a 104 byte header (on 64-bit and including the malloc
> header).

hmm, what should we choose here? 152 ? Some bikeshedding on IRC ?

>> +	buf = talloc_size (talloc_ctx, len);
>> +	if (buf == NULL)
>> +	    return UTIL_OUT_OF_MEMORY;
>> +    }
>> +
>> +    while (1) {
>> +	if (! gzgets (stream, buf + offset, len - offset)) {
>> +	    int zlib_status = 0;
>> +	    (void) gzerror (stream, &zlib_status);
>> +	    switch (zlib_status) {
>> +	    case Z_OK:
>> +		/* follow getline behaviour */
>> +		*bytes_read = -1;
>
> Is this really what getline does when the last line of a file isn't
> \n-terminated?

Maybe the previous call returned non-\n -terminated string and
for this call there was 0 bytes left to return ???

Tomi

>> +		return UTIL_EOF;
>> +		break;
>> +	    case Z_ERRNO:
>> +		return UTIL_FILE;
>> +		break;
>> +	    default:
>> +		return UTIL_ERROR;
>> +	    }
>> +	}
>> +
>> +	offset += strlen (buf + offset);
>> +
>> +	if ( buf[offset - 1] == '\n' )
>
> Too many spaces!
>
>> +	    break;
>> +
>> +	len *= 2;
>> +	buf = talloc_realloc (talloc_ctx, buf, char, len);
>
> Or talloc_realloc_size, to match the initial talloc_size.
> Alternatively, the initial talloc_size could be a talloc_array.
>
>> +	if (buf == NULL)
>> +	    return UTIL_OUT_OF_MEMORY;
>> +    }
>> +
>> +    *bufptr = buf;
>> +    *bufsiz = len;
>> +    *bytes_read = offset;
>> +    return UTIL_SUCCESS;
>> +}
>> diff --git a/util/zlib-extra.h b/util/zlib-extra.h
>> new file mode 100644
>> index 0000000..ed46ac1
>> --- /dev/null
>> +++ b/util/zlib-extra.h
>> @@ -0,0 +1,11 @@
>> +#ifndef _ZLIB_EXTRA_H
>> +#define _ZLIB_EXTRA_H
>> +
>> +#include <zlib.h>
>> +#include "util.h"
>
> I'd put "util.h" first so we're more likely to catch missing header
> dependencies (obviously util.h doesn't have any right now, but in the
> future).
>
> Also, I'd put a blank line after the #includes.
>
>> +/* Like getline, but read from a gzFile. Allocation is with talloc */
>> +util_status_t
>> +gz_getline (void *ctx, char **lineptr, size_t *line_size, ssize_t *bytes_read,
>> +	    gzFile stream);
>> +
>> +#endif

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 1/6] dump: support gzipped output
  2014-04-02  1:16 ` [Patch v5 1/6] dump: support gzipped output David Bremner
@ 2014-04-02 16:50   ` Tomi Ollila
  0 siblings, 0 replies; 17+ messages in thread
From: Tomi Ollila @ 2014-04-02 16:50 UTC (permalink / raw)
  To: David Bremner, notmuch

On Wed, Apr 02 2014, David Bremner <david@tethera.net> wrote:

> The main goal is to support gzipped output for future internal
> calls (e.g. from notmuch-new) to notmuch_database_dump.
>
> The additional dependency is not very heavy since xapian already pulls
> in zlib.
> ---
>  INSTALL                   | 17 +++++++++++----
>  Makefile.local            |  2 +-
>  configure                 | 28 +++++++++++++++++++++---
>  doc/man1/notmuch-dump.rst |  3 +++
>  notmuch-client.h          |  4 +++-
>  notmuch-dump.c            | 54 ++++++++++++++++++++++++++++++-----------------
>  test/T240-dump-restore.sh | 12 +++++++++++
>  7 files changed, 92 insertions(+), 28 deletions(-)
>
> diff --git a/INSTALL b/INSTALL
> index 690b0ef..df318fa 100644
> --- a/INSTALL
> +++ b/INSTALL
> @@ -20,8 +20,8 @@ configure stage.
>  
>  Dependencies
>  ------------
> -Notmuch depends on three libraries: Xapian, GMime 2.4 or 2.6, and
> -Talloc which are each described below:
> +Notmuch depends on four libraries: Xapian, GMime 2.4 or 2.6,
> +Talloc, and zlib which are each described below:
>  
>  	Xapian
>  	------
> @@ -60,6 +60,15 @@ Talloc which are each described below:
>  
>  	Talloc is available from http://talloc.samba.org/
>  
> +	zlib
> +	----
> +
> +	zlib is an extremely popular compression library. It is used
> +	by Xapian, so if you installed that you will already have
> +	zlib. You may need to install the zlib headers separately.
> +
> +	zlib is available from http://zlib.net
> +
>  Building Documentation
>  ----------------------
>  
> @@ -79,11 +88,11 @@ dependencies with a simple simple command line. For example:
>  
>    For Debian and similar:
>  
> -        sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev python-sphinx
> +        sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev zlib1g-dev python-sphinx
>  
>    For Fedora and similar:
>  
> -	sudo yum install xapian-core-devel gmime-devel libtalloc-devel python-sphinx
> +	sudo yum install xapian-core-devel gmime-devel libtalloc-devel zlib-devel python-sphinx
>  
>  On other systems, a similar command can be used, but the details of
>  the package names may be different.
> diff --git a/Makefile.local b/Makefile.local
> index cb7b106..e5a20a7 100644
> --- a/Makefile.local
> +++ b/Makefile.local
> @@ -41,7 +41,7 @@ PV_FILE=bindings/python/notmuch/version.py
>  # Smash together user's values with our extra values
>  FINAL_CFLAGS = -DNOTMUCH_VERSION=$(VERSION) $(CPPFLAGS) $(CFLAGS) $(WARN_CFLAGS) $(extra_cflags) $(CONFIGURE_CFLAGS)
>  FINAL_CXXFLAGS = $(CPPFLAGS) $(CXXFLAGS) $(WARN_CXXFLAGS) $(extra_cflags) $(extra_cxxflags) $(CONFIGURE_CXXFLAGS)
> -FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS)
> +FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS) $(ZLIB_LDFLAGS)
>  FINAL_NOTMUCH_LINKER = CC
>  ifneq ($(LINKER_RESOLVES_LIBRARY_DEPENDENCIES),1)
>  FINAL_NOTMUCH_LDFLAGS += $(CONFIGURE_LDFLAGS)
> diff --git a/configure b/configure
> index 1d430b9..d685ab3 100755
> --- a/configure
> +++ b/configure
> @@ -340,6 +340,18 @@ else
>      errors=$((errors + 1))
>  fi
>  
> +printf "Checking for zlib development files... "
> +have_zlib=0
> +if pkg-config --exists zlib; then
> +    printf "Yes.\n"
> +    have_zlib=1
> +    zlib_cflags=$(pkg-config --cflags zlib)
> +    zlib_ldflags=$(pkg-config --libs zlib)
> +else
> +    printf "No.\n"
> +    errors=$((errors + 1))
> +fi

Like Austin mentioned on IRC, zlib 1.2.5.2 (Dec 2011) is the first version
supporting 'wT', the test above should use:

pkg-config --atleast-version=1.2.5.2 zlib 

And the related documentation should mention this version dependency...

(maybe we can go with this and somebody(tm) who wants support to older
zlibs may provide tolerable patches for that to be done).

For example Scientific Linux 6.2 (RHEL 6.2-compatible) has zlib 1.2.3 (released
2005). Maybe there are backports and maybe 6.3 has newer... for me it is
just easier to compile newer zlib :D)


Tomi

> +
>  printf "Checking for talloc development files... "
>  if pkg-config --exists talloc; then
>      printf "Yes.\n"
> @@ -496,6 +508,11 @@ EOF
>  	echo "	Xapian library (including development files such as headers)"
>  	echo "	http://xapian.org/"
>      fi
> +    if [ $have_zlib -eq 0 ]; then
> +	echo "	zlib library (including development files such as headers)"
> +	echo "	http://zlib.net/"
> +	echo
> +    fi
>      if [ $have_gmime -eq 0 ]; then
>  	echo "	Either GMime 2.4 library" $GMIME_24_VERSION_CTR "or GMime 2.6 library" $GMIME_26_VERSION_CTR
>  	echo "	(including development files such as headers)"
> @@ -519,11 +536,11 @@ case a simple command will install everything you need. For example:
>  
>  On Debian and similar systems:
>  
> -	sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev
> +	sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev zlib1g-dev
>  
>  Or on Fedora and similar systems:
>  
> -	sudo yum install xapian-core-devel gmime-devel libtalloc-devel
> +	sudo yum install xapian-core-devel gmime-devel libtalloc-devel zlib-devel
>  
>  On other systems, similar commands can be used, but the details of the
>  package names may be different.
> @@ -844,6 +861,10 @@ XAPIAN_LDFLAGS = ${xapian_ldflags}
>  GMIME_CFLAGS = ${gmime_cflags}
>  GMIME_LDFLAGS = ${gmime_ldflags}
>  
> +# Flags needed to compile and link against zlib
> +ZLIB_CFLAGS = ${zlib_cflags}
> +ZLIB_LDFLAGS = ${zlib_ldflags}
> +
>  # Flags needed to compile and link against talloc
>  TALLOC_CFLAGS = ${talloc_cflags}
>  TALLOC_LDFLAGS = ${talloc_ldflags}
> @@ -882,6 +903,7 @@ CONFIGURE_CFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)      \\
>  		   -DUTIL_BYTE_ORDER=\$(UTIL_BYTE_ORDER)
>  
>  CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)    \\
> +		     \$(ZLIB_CFLAGS)					 \\
>  		     \$(TALLOC_CFLAGS) -DHAVE_VALGRIND=\$(HAVE_VALGRIND) \\
>  		     \$(VALGRIND_CFLAGS) \$(XAPIAN_CXXFLAGS)             \\
>  		     -DHAVE_STRCASESTR=\$(HAVE_STRCASESTR)               \\
> @@ -892,5 +914,5 @@ CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)    \\
>  		     -DHAVE_XAPIAN_COMPACT=\$(HAVE_XAPIAN_COMPACT)       \\
>  		     -DUTIL_BYTE_ORDER=\$(UTIL_BYTE_ORDER)
>  
> -CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(XAPIAN_LDFLAGS)
> +CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(ZLIB_LDFLAGS) \$(XAPIAN_LDFLAGS)
>  EOF
> diff --git a/doc/man1/notmuch-dump.rst b/doc/man1/notmuch-dump.rst
> index 17d1da5..d94cb4f 100644
> --- a/doc/man1/notmuch-dump.rst
> +++ b/doc/man1/notmuch-dump.rst
> @@ -19,6 +19,9 @@ recreated from the messages themselves. The output of notmuch dump is
>  therefore the only critical thing to backup (and much more friendly to
>  incremental backup than the native database files.)
>  
> +``--gzip``
> +    Compress the output in a format compatible with **gzip(1)**.
> +
>  ``--format=(sup|batch-tag)``
>      Notmuch restore supports two plain text dump formats, both with one
>      message-id per line, followed by a list of tags.
> diff --git a/notmuch-client.h b/notmuch-client.h
> index d110648..e1efbe0 100644
> --- a/notmuch-client.h
> +++ b/notmuch-client.h
> @@ -450,7 +450,9 @@ typedef enum dump_formats {
>  int
>  notmuch_database_dump (notmuch_database_t *notmuch,
>  		       const char *output_file_name,
> -		       const char *query_str, dump_format_t output_format);
> +		       const char *query_str,
> +		       dump_format_t output_format,
> +		       notmuch_bool_t gzip_output);
>  
>  #include "command-line-arguments.h"
>  #endif
> diff --git a/notmuch-dump.c b/notmuch-dump.c
> index 21702d7..28342b7 100644
> --- a/notmuch-dump.c
> +++ b/notmuch-dump.c
> @@ -21,9 +21,11 @@
>  #include "notmuch-client.h"
>  #include "hex-escape.h"
>  #include "string-util.h"
> +#include <zlib.h>
> +
>  
>  static int
> -database_dump_file (notmuch_database_t *notmuch, FILE *output,
> +database_dump_file (notmuch_database_t *notmuch, gzFile output,
>  		    const char *query_str, int output_format)
>  {
>      notmuch_query_t *query;
> @@ -69,7 +71,7 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
>  	}
>  
>  	if (output_format == DUMP_FORMAT_SUP) {
> -	    fprintf (output, "%s (", message_id);
> +	    gzprintf (output, "%s (", message_id);
>  	}
>  
>  	for (tags = notmuch_message_get_tags (message);
> @@ -78,12 +80,12 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
>  	    const char *tag_str = notmuch_tags_get (tags);
>  
>  	    if (! first)
> -		fputs (" ", output);
> +		gzputs (output, " ");
>  
>  	    first = 0;
>  
>  	    if (output_format == DUMP_FORMAT_SUP) {
> -		fputs (tag_str, output);
> +		gzputs (output, tag_str);
>  	    } else {
>  		if (hex_encode (notmuch, tag_str,
>  				&buffer, &buffer_size) != HEX_SUCCESS) {
> @@ -91,12 +93,12 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
>  			     tag_str);
>  		    return EXIT_FAILURE;
>  		}
> -		fprintf (output, "+%s", buffer);
> +		gzprintf (output, "+%s", buffer);
>  	    }
>  	}
>  
>  	if (output_format == DUMP_FORMAT_SUP) {
> -	    fputs (")\n", output);
> +	    gzputs (output, ")\n");
>  	} else {
>  	    if (make_boolean_term (notmuch, "id", message_id,
>  				   &buffer, &buffer_size)) {
> @@ -104,7 +106,7 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
>  			     message_id, strerror (errno));
>  		    return EXIT_FAILURE;
>  	    }
> -	    fprintf (output, " -- %s\n", buffer);
> +	    gzprintf (output, " -- %s\n", buffer);
>  	}
>  
>  	notmuch_message_destroy (message);
> @@ -121,24 +123,36 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output,
>  int
>  notmuch_database_dump (notmuch_database_t *notmuch,
>  		       const char *output_file_name,
> -		       const char *query_str, dump_format_t output_format)
> +		       const char *query_str,
> +		       dump_format_t output_format,
> +		       notmuch_bool_t gzip_output)
>  {
> -    FILE *output = stdout;
> +    gzFile output;
> +    const char *mode = gzip_output ? "w9" : "wT";
> +
>      int ret;
>  
> -    if (output_file_name) {
> -	output = fopen (output_file_name, "w");
> -	if (output == NULL) {
> -	    fprintf (stderr, "Error opening %s for writing: %s\n",
> -		     output_file_name, strerror (errno));
> -	    return EXIT_FAILURE;
> -	}
> +    if (output_file_name)
> +	output = gzopen (output_file_name, mode);
> +    else
> +	output = gzdopen (fileno (stdout), mode);
> +
> +    if (output == NULL) {
> +	fprintf (stderr, "Error opening %s for (gzip) writing: %s\n",
> +		 output_file_name ? output_file_name : "stdout", strerror (errno));
> +	return EXIT_FAILURE;
>      }
>  
>      ret = database_dump_file (notmuch, output, query_str, output_format);
>  
> -    if (output != stdout)
> -	fclose (output);
> +    if (gzflush (output, Z_FINISH)) {
> +	fprintf (stderr, "Error flushing output: %s\n",
> +		 gzerror (output, NULL));
> +	return EXIT_FAILURE;
> +    }
> +
> +    if (output_file_name)
> +	gzclose_w (output);
>  
>      return ret;
>  }
> @@ -158,6 +172,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[])
>      int opt_index;
>  
>      int output_format = DUMP_FORMAT_BATCH_TAG;
> +    notmuch_bool_t gzip_output = 0;
>  
>      notmuch_opt_desc_t options[] = {
>  	{ NOTMUCH_OPT_KEYWORD, &output_format, "format", 'f',
> @@ -165,6 +180,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[])
>  				  { "batch-tag", DUMP_FORMAT_BATCH_TAG },
>  				  { 0, 0 } } },
>  	{ NOTMUCH_OPT_STRING, &output_file_name, "output", 'o', 0  },
> +	{ NOTMUCH_OPT_BOOLEAN, &gzip_output, "gzip", 'z', 0 },
>  	{ 0, 0, 0, 0, 0 }
>      };
>  
> @@ -181,7 +197,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[])
>      }
>  
>      ret = notmuch_database_dump (notmuch, output_file_name, query_str,
> -				 output_format);
> +				 output_format, gzip_output);
>  
>      notmuch_database_destroy (notmuch);
>  
> diff --git a/test/T240-dump-restore.sh b/test/T240-dump-restore.sh
> index 0004438..d79aca8 100755
> --- a/test/T240-dump-restore.sh
> +++ b/test/T240-dump-restore.sh
> @@ -68,6 +68,18 @@ test_begin_subtest "dump --output=outfile --"
>  notmuch dump --output=dump-1-arg-dash.actual --
>  test_expect_equal_file dump.expected dump-1-arg-dash.actual
>  
> +# gzipped output
> +
> +test_begin_subtest "dump --gzip"
> +notmuch dump --gzip > dump-gzip.gz
> +gunzip dump-gzip.gz
> +test_expect_equal_file dump.expected dump-gzip
> +
> +test_begin_subtest "dump --gzip --output=outfile"
> +notmuch dump --gzip --output=dump-gzip-outfile.gz
> +gunzip dump-gzip-outfile.gz
> +test_expect_equal_file dump.expected dump-gzip-outfile
> +
>  # Note, we assume all messages from cworth have a message-id
>  # containing cworth.org
>  
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 3/6] util: add gz_readline
  2014-04-02 16:43     ` Tomi Ollila
@ 2014-04-02 20:43       ` Austin Clements
  2014-04-03  6:03       ` Kim Minh Kaplan
  2014-04-03  6:17       ` Kim Minh Kaplan
  2 siblings, 0 replies; 17+ messages in thread
From: Austin Clements @ 2014-04-02 20:43 UTC (permalink / raw)
  To: Tomi Ollila; +Cc: notmuch

Quoth Tomi Ollila on Apr 02 at  7:43 pm:
> On Wed, Apr 02 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> 
> > Quoth David Bremner on Apr 01 at 10:16 pm:
> >> The idea is to provide a more or less drop in replacement for readline
> >> to read from zlib/gzip streams.  Take the opportunity to replace
> >> malloc with talloc.
> >> ---
> >>  util/Makefile.local |  2 +-
> >>  util/util.h         | 12 +++++++++
> >>  util/zlib-extra.c   | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  util/zlib-extra.h   | 11 ++++++++
> >>  4 files changed, 100 insertions(+), 1 deletion(-)
> >>  create mode 100644 util/util.h
> >>  create mode 100644 util/zlib-extra.c
> >>  create mode 100644 util/zlib-extra.h
> >> 
> >> diff --git a/util/Makefile.local b/util/Makefile.local
> >> index 29c0ce6..e2a5b65 100644
> >> --- a/util/Makefile.local
> >> +++ b/util/Makefile.local
> >> @@ -4,7 +4,7 @@ dir := util
> >>  extra_cflags += -I$(srcdir)/$(dir)
> >>  
> >>  libutil_c_srcs := $(dir)/xutil.c $(dir)/error_util.c $(dir)/hex-escape.c \
> >> -		  $(dir)/string-util.c $(dir)/talloc-extra.c
> >> +		  $(dir)/string-util.c $(dir)/talloc-extra.c $(dir)/zlib-extra.c
> >>  
> >>  libutil_modules := $(libutil_c_srcs:.c=.o)
> >>  
> >> diff --git a/util/util.h b/util/util.h
> >> new file mode 100644
> >> index 0000000..8663cfc
> >> --- /dev/null
> >> +++ b/util/util.h
> >> @@ -0,0 +1,12 @@
> >> +#ifndef _UTIL_H
> >> +#define _UTIL_H
> >> +
> >> +typedef enum util_status {
> >> +    UTIL_SUCCESS = 0,
> >> +    UTIL_ERROR = 1,
> >> +    UTIL_OUT_OF_MEMORY,
> >> +    UTIL_EOF,
> >> +    UTIL_FILE,
> >> +} util_status_t;
> >> +
> >> +#endif
> >> diff --git a/util/zlib-extra.c b/util/zlib-extra.c
> >> new file mode 100644
> >> index 0000000..cb1eba0
> >> --- /dev/null
> >> +++ b/util/zlib-extra.c
> >> @@ -0,0 +1,76 @@
> >> +/* zlib-extra.c -  Extra or enhanced routines for compressed I/O.
> >> + *
> >> + * Copyright (c) 2014 David Bremner
> >> + *
> >> + * This program is free software: you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation, either version 3 of the License, or
> >> + * (at your option) any later version.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
> >> + *
> >> + * Author: David Bremner <david@tethera.net>
> >> + */
> >> +
> >> +#include "zlib-extra.h"
> >> +#include <talloc.h>
> >> +#include <stdio.h>
> >> +#include <string.h>
> >> +
> >> +/* mimic POSIX/glibc getline, but on a zlib gzFile stream, and using talloc */
> >> +util_status_t
> >> +gz_getline (void *talloc_ctx, char **bufptr, size_t *bufsiz, ssize_t *bytes_read,
> >
> > Talloc chunks know their size, so rather than taking bufsize, use
> > talloc_get_size (or talloc_array_length if you switch to talloc array
> > functions below).
> 
> Now yoy David have a chance to drop the bufsiz argument altogether, as the
> info is available in *bufptr:s talloc context...
> 
> >
> >> +	    gzFile stream)
> >> +{
> >> +    size_t len = *bufsiz;
> >> +    char *buf = *bufptr;
> >> +    size_t offset = 0;
> >> +
> >> +    if (len == 0 || buf == NULL) {
> >> +	/* same as getdelim from gnulib */
> >> +	len = 120;
> >
> > This is presumably because glibc's malloc has an 8 byte header.  Fun
> > fact: talloc has a 104 byte header (on 64-bit and including the malloc
> > header).
> 
> hmm, what should we choose here? 152 ? Some bikeshedding on IRC ?

How about we bikeshed about not bikeshedding about this?

> >> +	buf = talloc_size (talloc_ctx, len);
> >> +	if (buf == NULL)
> >> +	    return UTIL_OUT_OF_MEMORY;
> >> +    }
> >> +
> >> +    while (1) {
> >> +	if (! gzgets (stream, buf + offset, len - offset)) {
> >> +	    int zlib_status = 0;
> >> +	    (void) gzerror (stream, &zlib_status);
> >> +	    switch (zlib_status) {
> >> +	    case Z_OK:
> >> +		/* follow getline behaviour */
> >> +		*bytes_read = -1;
> >
> > Is this really what getline does when the last line of a file isn't
> > \n-terminated?
> 
> Maybe the previous call returned non-\n -terminated string and
> for this call there was 0 bytes left to return ???

But my point is that the previous call *won't* return a
non-\n-terminated string.  If my file looks like "a\nb\nc", this will
return "a\n", then "b\n", and then fail (unless I'm following the code
wrong).  This is *not* what getline does (the manpage is confusing,
but I just tested it).

> Tomi
> 
> >> +		return UTIL_EOF;
> >> +		break;

Unnecessary break.

> >> +	    case Z_ERRNO:
> >> +		return UTIL_FILE;
> >> +		break;

And here.

> >> +	    default:
> >> +		return UTIL_ERROR;
> >> +	    }
> >> +	}
> >> +
> >> +	offset += strlen (buf + offset);
> >> +
> >> +	if ( buf[offset - 1] == '\n' )
> >
> > Too many spaces!
> >
> >> +	    break;
> >> +
> >> +	len *= 2;
> >> +	buf = talloc_realloc (talloc_ctx, buf, char, len);
> >
> > Or talloc_realloc_size, to match the initial talloc_size.
> > Alternatively, the initial talloc_size could be a talloc_array.
> >
> >> +	if (buf == NULL)
> >> +	    return UTIL_OUT_OF_MEMORY;
> >> +    }
> >> +
> >> +    *bufptr = buf;
> >> +    *bufsiz = len;
> >> +    *bytes_read = offset;
> >> +    return UTIL_SUCCESS;
> >> +}
> >> diff --git a/util/zlib-extra.h b/util/zlib-extra.h
> >> new file mode 100644
> >> index 0000000..ed46ac1
> >> --- /dev/null
> >> +++ b/util/zlib-extra.h
> >> @@ -0,0 +1,11 @@
> >> +#ifndef _ZLIB_EXTRA_H
> >> +#define _ZLIB_EXTRA_H
> >> +
> >> +#include <zlib.h>
> >> +#include "util.h"
> >
> > I'd put "util.h" first so we're more likely to catch missing header
> > dependencies (obviously util.h doesn't have any right now, but in the
> > future).
> >
> > Also, I'd put a blank line after the #includes.
> >
> >> +/* Like getline, but read from a gzFile. Allocation is with talloc */
> >> +util_status_t
> >> +gz_getline (void *ctx, char **lineptr, size_t *line_size, ssize_t *bytes_read,
> >> +	    gzFile stream);
> >> +
> >> +#endif

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 2/6] dump: when given output file name, write atomically
  2014-04-02  2:07 ` [Patch v5 2/6] dump: when given output file name, write atomically Austin Clements
@ 2014-04-02 20:55   ` Austin Clements
  0 siblings, 0 replies; 17+ messages in thread
From: Austin Clements @ 2014-04-02 20:55 UTC (permalink / raw)
  To: notmuch

On Tue, 01 Apr 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> (Pardon the mobile review)

Apparently it's 2014 and the latest and greatest version of Android
still can't be bothered to add threading headers to emails.  This
message should at least put it in the right thread by exploiting
notmuch's thread linking algorithm.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 3/6] util: add gz_readline
  2014-04-02 16:43     ` Tomi Ollila
  2014-04-02 20:43       ` Austin Clements
@ 2014-04-03  6:03       ` Kim Minh Kaplan
  2014-04-03  6:17       ` Kim Minh Kaplan
  2 siblings, 0 replies; 17+ messages in thread
From: Kim Minh Kaplan @ 2014-04-03  6:03 UTC (permalink / raw)
  To: Tomi Ollila; +Cc: notmuch

Tomi Ollila writes:

> On Wed, Apr 02 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>
>> Quoth David Bremner on Apr 01 at 10:16 pm:
>>
>>> +    if (len == 0 || buf == NULL) {
>>> +	/* same as getdelim from gnulib */
>>> +	len = 120;
>>
>> This is presumably because glibc's malloc has an 8 byte header.  Fun
>> fact: talloc has a 104 byte header (on 64-bit and including the malloc
>> header).
>
> hmm, what should we choose here? 152 ? Some bikeshedding on IRC ?

When your done with this bikeshed, do not forget the other one :

>>> +	len *= 2;

-- 
Kim Minh.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v5 3/6] util: add gz_readline
  2014-04-02 16:43     ` Tomi Ollila
  2014-04-02 20:43       ` Austin Clements
  2014-04-03  6:03       ` Kim Minh Kaplan
@ 2014-04-03  6:17       ` Kim Minh Kaplan
  2 siblings, 0 replies; 17+ messages in thread
From: Kim Minh Kaplan @ 2014-04-03  6:17 UTC (permalink / raw)
  To: Tomi Ollila; +Cc: notmuch

Tomi Ollila writes:

> On Wed, Apr 02 2014, Austin Clements <amdragon@MIT.EDU> wrote:
>
>> Quoth David Bremner on Apr 01 at 10:16 pm:
>>
>>> +    if (len == 0 || buf == NULL) {
>>> +	/* same as getdelim from gnulib */
>>> +	len = 120;
>>
>> This is presumably because glibc's malloc has an 8 byte header.  Fun
>> fact: talloc has a 104 byte header (on 64-bit and including the malloc
>> header).
>
> hmm, what should we choose here? 152 ? Some bikeshedding on IRC ?

When your done with this bikeshed, do not forget the other one :

>>> +	len *= 2;

-- 
Kim Minh.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-04-03  6:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-02  1:16 v5 gzip / dump restore David Bremner
2014-04-02  1:16 ` [Patch v5 1/6] dump: support gzipped output David Bremner
2014-04-02 16:50   ` Tomi Ollila
2014-04-02  1:16 ` [Patch v5 2/6] dump: when given output file name, write atomically David Bremner
2014-04-02  1:16 ` [Patch v5 3/6] util: add gz_readline David Bremner
2014-04-02  3:26   ` Austin Clements
2014-04-02 16:43     ` Tomi Ollila
2014-04-02 20:43       ` Austin Clements
2014-04-03  6:03       ` Kim Minh Kaplan
2014-04-03  6:17       ` Kim Minh Kaplan
2014-04-02  1:16 ` [Patch v5 4/6] restore: transparently support gzipped input David Bremner
2014-04-02  2:49   ` Austin Clements
2014-04-02  1:16 ` [Patch v5 5/6] notmuch-new: backup tags before database upgrade David Bremner
2014-04-02  3:35   ` Austin Clements
2014-04-02  1:16 ` [Patch v5 6/6] test: verify tag backup generated by " David Bremner
2014-04-02  2:07 ` [Patch v5 2/6] dump: when given output file name, write atomically Austin Clements
2014-04-02 20:55   ` Austin Clements

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).