From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 1C8FC431FBF for ; Fri, 4 Apr 2014 15:05:57 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.7 X-Spam-Level: X-Spam-Status: No, score=-0.7 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id g1n8XGRddU0C for ; Fri, 4 Apr 2014 15:05:52 -0700 (PDT) Received: from dmz-mailsec-scanner-5.mit.edu (dmz-mailsec-scanner-5.mit.edu [18.7.68.34]) by olra.theworths.org (Postfix) with ESMTP id EDD00431FAF for ; Fri, 4 Apr 2014 15:05:51 -0700 (PDT) X-AuditID: 12074422-f79186d00000135a-f4-533f2cbfda6f Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) (using TLS with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-5.mit.edu (Symantec Messaging Gateway) with SMTP id C1.19.04954.FBC2F335; Fri, 4 Apr 2014 18:05:51 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id s34M5ndx014052; Fri, 4 Apr 2014 18:05:50 -0400 Received: from awakening.csail.mit.edu (awakening.csail.mit.edu [18.26.4.91]) (authenticated bits=0) (User authenticated as amdragon@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id s34M5mqO017034 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Fri, 4 Apr 2014 18:05:49 -0400 Received: from amthrax by awakening.csail.mit.edu with local (Exim 4.80) (envelope-from ) id 1WWCFA-0004Ho-10; Fri, 04 Apr 2014 18:05:48 -0400 Date: Fri, 4 Apr 2014 18:05:47 -0400 From: Austin Clements To: David Bremner Subject: Re: [Patch v6 1/6] dump: support gzipped and atomic output Message-ID: <20140404220547.GB15472@mit.edu> References: <1396554083-3892-1-git-send-email-david@tethera.net> <1396554083-3892-2-git-send-email-david@tethera.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1396554083-3892-2-git-send-email-david@tethera.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFmpileLIzCtJLcpLzFFi42IR4hTV1t2vYx9scHa+hMWN1m5Gi+s3ZzI7 MHk8W3WL2WPLoffMAUxRXDYpqTmZZalF+nYJXBm7/kxlL1hWUPHkvHMD472wLkZODgkBE4nz z7YzQdhiEhfurWfrYuTiEBKYzSRx7/g9RghnA6PEyweLoZxTTBITem+xQjhLGCUezz/G3MXI wcEioCIx56YyyCg2AQ2JbfuXM4LYIgKqEle3TWYDsZkFpCW+/W4GWycs4Czx6dxpZhCbV0BH 4u+ra2A1QgLlEqdefGCHiAtKnJz5hAWiV0vixr+XTCCrQOYs/8cBEuYUcJR4/6YBbJUo0AVT Tm5jm8AoNAtJ9ywk3bMQuhcwMq9ilE3JrdLNTczMKU5N1i1OTszLSy3SNdXLzSzRS00p3cQI Cmp2F6UdjD8PKh1iFOBgVOLh7dhhFyzEmlhWXJl7iFGSg0lJlLdLwT5YiC8pP6UyI7E4I76o NCe1+BCjBAezkgivjCpQjjclsbIqtSgfJiXNwaIkzvvW2ipYSCA9sSQ1OzW1ILUIJivDwaEk wbtFG6hRsCg1PbUiLTOnBCHNxMEJMpwHaPhakBre4oLE3OLMdIj8KUZFKXHeLVpACQGQREZp HlwvLOm8YhQHekWY9wRIOw8wYcF1vwIazAQ0uCHMDmRwSSJCSqqB0ZB9/8LvkdnvbQ+xKXz3 OTuRb07THAXe0k3iW87b373Z/Wj9W+u9x17+2Np8i5dPryHodffZO2lcHOvKQtJmfti0ZNrn 79+bcmVtVA/7MF33urf09kmpoNrzDleir/u+LbI/PPNgvFmdwaEn5z9va7ZNesuld3yxkPXX wrf2Tge/HPZ3b37WXq/EUpyRaKjFXFScCACNihoBFQMAAA== Cc: notmuch@notmuchmail.org X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Apr 2014 22:05:57 -0000 Quoth David Bremner on Apr 03 at 4:41 pm: > The main goal is to support gzipped output for future internal > calls (e.g. from notmuch-new) to notmuch_database_dump. > > The additional dependency is not very heavy since xapian already pulls > in zlib. > > We want the dump to be "atomic", in the sense that after running the > dump file is either present and complete, or not present. This avoids > certain classes of mishaps involving overwriting a good backup with a > bad or partial one. > --- > INSTALL | 20 ++++++++-- > Makefile.local | 2 +- > configure | 28 ++++++++++++-- > doc/man1/notmuch-dump.rst | 3 ++ > notmuch-client.h | 4 +- > notmuch-dump.c | 95 +++++++++++++++++++++++++++++++++++++---------- > test/T240-dump-restore.sh | 12 ++++++ > 7 files changed, 136 insertions(+), 28 deletions(-) > > diff --git a/INSTALL b/INSTALL > index 690b0ef..b543c50 100644 > --- a/INSTALL > +++ b/INSTALL > @@ -20,8 +20,8 @@ configure stage. > > Dependencies > ------------ > -Notmuch depends on three libraries: Xapian, GMime 2.4 or 2.6, and > -Talloc which are each described below: > +Notmuch depends on four libraries: Xapian, GMime 2.4 or 2.6, > +Talloc, and zlib which are each described below: > > Xapian > ------ > @@ -60,6 +60,18 @@ Talloc which are each described below: > > Talloc is available from http://talloc.samba.org/ > > + zlib > + ---- > + > + zlib is an extremely popular compression library. It is used > + by Xapian, so if you installed that you will already have > + zlib. You may need to install the zlib headers separately. > + > + Notmuch needs the transparent write feature of zlib introduced > + in version 1.2.5.2 (Dec. 2011). > + > + zlib is available from http://zlib.net > + > Building Documentation > ---------------------- > > @@ -79,11 +91,11 @@ dependencies with a simple simple command line. For example: > > For Debian and similar: > > - sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev python-sphinx > + sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev zlib1g-dev python-sphinx > > For Fedora and similar: > > - sudo yum install xapian-core-devel gmime-devel libtalloc-devel python-sphinx > + sudo yum install xapian-core-devel gmime-devel libtalloc-devel zlib-devel python-sphinx > > On other systems, a similar command can be used, but the details of > the package names may be different. > diff --git a/Makefile.local b/Makefile.local > index cb7b106..e5a20a7 100644 > --- a/Makefile.local > +++ b/Makefile.local > @@ -41,7 +41,7 @@ PV_FILE=bindings/python/notmuch/version.py > # Smash together user's values with our extra values > FINAL_CFLAGS = -DNOTMUCH_VERSION=$(VERSION) $(CPPFLAGS) $(CFLAGS) $(WARN_CFLAGS) $(extra_cflags) $(CONFIGURE_CFLAGS) > FINAL_CXXFLAGS = $(CPPFLAGS) $(CXXFLAGS) $(WARN_CXXFLAGS) $(extra_cflags) $(extra_cxxflags) $(CONFIGURE_CXXFLAGS) > -FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS) > +FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS) $(ZLIB_LDFLAGS) > FINAL_NOTMUCH_LINKER = CC > ifneq ($(LINKER_RESOLVES_LIBRARY_DEPENDENCIES),1) > FINAL_NOTMUCH_LDFLAGS += $(CONFIGURE_LDFLAGS) > diff --git a/configure b/configure > index 1d430b9..1d624f7 100755 > --- a/configure > +++ b/configure > @@ -340,6 +340,18 @@ else > errors=$((errors + 1)) > fi > > +printf "Checking for zlib (>= 1.2.5.2)... " > +have_zlib=0 > +if pkg-config --atleast-version=1.2.5.2 zlib; then > + printf "Yes.\n" > + have_zlib=1 > + zlib_cflags=$(pkg-config --cflags zlib) > + zlib_ldflags=$(pkg-config --libs zlib) > +else > + printf "No.\n" > + errors=$((errors + 1)) > +fi > + > printf "Checking for talloc development files... " > if pkg-config --exists talloc; then > printf "Yes.\n" > @@ -496,6 +508,11 @@ EOF > echo " Xapian library (including development files such as headers)" > echo " http://xapian.org/" > fi > + if [ $have_zlib -eq 0 ]; then > + echo " zlib library (including development files such as headers)" > + echo " http://zlib.net/" > + echo > + fi > if [ $have_gmime -eq 0 ]; then > echo " Either GMime 2.4 library" $GMIME_24_VERSION_CTR "or GMime 2.6 library" $GMIME_26_VERSION_CTR > echo " (including development files such as headers)" > @@ -519,11 +536,11 @@ case a simple command will install everything you need. For example: > > On Debian and similar systems: > > - sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev > + sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev zlib1g-dev > > Or on Fedora and similar systems: > > - sudo yum install xapian-core-devel gmime-devel libtalloc-devel > + sudo yum install xapian-core-devel gmime-devel libtalloc-devel zlib-devel > > On other systems, similar commands can be used, but the details of the > package names may be different. > @@ -844,6 +861,10 @@ XAPIAN_LDFLAGS = ${xapian_ldflags} > GMIME_CFLAGS = ${gmime_cflags} > GMIME_LDFLAGS = ${gmime_ldflags} > > +# Flags needed to compile and link against zlib > +ZLIB_CFLAGS = ${zlib_cflags} > +ZLIB_LDFLAGS = ${zlib_ldflags} > + > # Flags needed to compile and link against talloc > TALLOC_CFLAGS = ${talloc_cflags} > TALLOC_LDFLAGS = ${talloc_ldflags} > @@ -882,6 +903,7 @@ CONFIGURE_CFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS) \\ > -DUTIL_BYTE_ORDER=\$(UTIL_BYTE_ORDER) > > CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS) \\ > + \$(ZLIB_CFLAGS) \\ > \$(TALLOC_CFLAGS) -DHAVE_VALGRIND=\$(HAVE_VALGRIND) \\ > \$(VALGRIND_CFLAGS) \$(XAPIAN_CXXFLAGS) \\ > -DHAVE_STRCASESTR=\$(HAVE_STRCASESTR) \\ > @@ -892,5 +914,5 @@ CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS) \\ > -DHAVE_XAPIAN_COMPACT=\$(HAVE_XAPIAN_COMPACT) \\ > -DUTIL_BYTE_ORDER=\$(UTIL_BYTE_ORDER) > > -CONFIGURE_LDFLAGS = \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(XAPIAN_LDFLAGS) > +CONFIGURE_LDFLAGS = \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(ZLIB_LDFLAGS) \$(XAPIAN_LDFLAGS) > EOF > diff --git a/doc/man1/notmuch-dump.rst b/doc/man1/notmuch-dump.rst > index 17d1da5..d94cb4f 100644 > --- a/doc/man1/notmuch-dump.rst > +++ b/doc/man1/notmuch-dump.rst > @@ -19,6 +19,9 @@ recreated from the messages themselves. The output of notmuch dump is > therefore the only critical thing to backup (and much more friendly to > incremental backup than the native database files.) > > +``--gzip`` > + Compress the output in a format compatible with **gzip(1)**. > + > ``--format=(sup|batch-tag)`` > Notmuch restore supports two plain text dump formats, both with one > message-id per line, followed by a list of tags. > diff --git a/notmuch-client.h b/notmuch-client.h > index d110648..e1efbe0 100644 > --- a/notmuch-client.h > +++ b/notmuch-client.h > @@ -450,7 +450,9 @@ typedef enum dump_formats { > int > notmuch_database_dump (notmuch_database_t *notmuch, > const char *output_file_name, > - const char *query_str, dump_format_t output_format); > + const char *query_str, > + dump_format_t output_format, > + notmuch_bool_t gzip_output); > > #include "command-line-arguments.h" > #endif > diff --git a/notmuch-dump.c b/notmuch-dump.c > index 21702d7..2a7252a 100644 > --- a/notmuch-dump.c > +++ b/notmuch-dump.c > @@ -21,9 +21,11 @@ > #include "notmuch-client.h" > #include "hex-escape.h" > #include "string-util.h" > +#include > + > > static int > -database_dump_file (notmuch_database_t *notmuch, FILE *output, > +database_dump_file (notmuch_database_t *notmuch, gzFile output, > const char *query_str, int output_format) > { > notmuch_query_t *query; > @@ -69,7 +71,7 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output, > } > > if (output_format == DUMP_FORMAT_SUP) { > - fprintf (output, "%s (", message_id); > + gzprintf (output, "%s (", message_id); > } > > for (tags = notmuch_message_get_tags (message); > @@ -78,12 +80,12 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output, > const char *tag_str = notmuch_tags_get (tags); > > if (! first) > - fputs (" ", output); > + gzputs (output, " "); > > first = 0; > > if (output_format == DUMP_FORMAT_SUP) { > - fputs (tag_str, output); > + gzputs (output, tag_str); > } else { > if (hex_encode (notmuch, tag_str, > &buffer, &buffer_size) != HEX_SUCCESS) { > @@ -91,12 +93,12 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output, > tag_str); > return EXIT_FAILURE; > } > - fprintf (output, "+%s", buffer); > + gzprintf (output, "+%s", buffer); > } > } > > if (output_format == DUMP_FORMAT_SUP) { > - fputs (")\n", output); > + gzputs (output, ")\n"); > } else { > if (make_boolean_term (notmuch, "id", message_id, > &buffer, &buffer_size)) { > @@ -104,7 +106,7 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output, > message_id, strerror (errno)); > return EXIT_FAILURE; > } > - fprintf (output, " -- %s\n", buffer); > + gzprintf (output, " -- %s\n", buffer); > } > > notmuch_message_destroy (message); > @@ -121,24 +123,77 @@ database_dump_file (notmuch_database_t *notmuch, FILE *output, > int > notmuch_database_dump (notmuch_database_t *notmuch, > const char *output_file_name, > - const char *query_str, dump_format_t output_format) > + const char *query_str, > + dump_format_t output_format, > + notmuch_bool_t gzip_output) > { > - FILE *output = stdout; > - int ret; > + gzFile output; This should to gzclose_w output on all error paths. The easiest way to do that it probably to gzFile output = NULL; here and ... > + const char *mode = gzip_output ? "w9" : "wT"; > + const char *name_for_error = output_file_name ? output_file_name : "stdout"; > + > + char *tempname = NULL; > + int outfd = -1; > + > + int ret = -1; > > if (output_file_name) { > - output = fopen (output_file_name, "w"); > - if (output == NULL) { > - fprintf (stderr, "Error opening %s for writing: %s\n", > - output_file_name, strerror (errno)); > - return EXIT_FAILURE; > - } > + tempname = talloc_asprintf (notmuch, "%s.XXXXXX", output_file_name); > + outfd = mkstemp (tempname); > + } else { > + outfd = dup (STDOUT_FILENO); > + } > + > + if (outfd < 0) { > + fprintf (stderr, "Bad output file %s\n", name_for_error); > + goto DONE; > + } > + > + output = gzdopen (outfd, mode); > + > + if (output == NULL) { > + fprintf (stderr, "Error opening %s for (gzip) writing: %s\n", > + name_for_error, strerror (errno)); > + if (close (outfd)) > + fprintf (stderr, "Error closing %s during shutdown: %s\n", > + name_for_error, strerror (errno)); > + goto DONE; > } > > ret = database_dump_file (notmuch, output, query_str, output_format); > + if (ret) goto DONE; > > - if (output != stdout) > - fclose (output); > + ret = gzflush (output, Z_FINISH); > + if (ret) { > + fprintf (stderr, "Error flushing output: %s\n", gzerror (output, NULL)); > + goto DONE; > + } > + > + if (output_file_name) { > + ret = fdatasync (outfd); > + if (ret) { > + fprintf (stderr, "Error syncing %s to disk: %s\n", > + name_for_error, strerror (errno)); > + goto DONE; > + } > + } > + > + if (gzclose_w (output) != Z_OK) { > + ret = EXIT_FAILURE; > + goto DONE; > + } ... output = NULL; here, and ... > + > + if (output_file_name) { > + ret = rename (tempname, output_file_name); > + if (ret) { > + fprintf (stderr, "Error renaming %s to %s: %s\n", > + tempname, output_file_name, strerror (errno)); > + goto DONE; > + } > + > + } > + DONE: ... if (output) gzclose_w (output); here. > + if (ret != EXIT_SUCCESS && output_file_name) > + (void) unlink (tempname); > > return ret; > } > @@ -158,6 +213,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[]) > int opt_index; > > int output_format = DUMP_FORMAT_BATCH_TAG; > + notmuch_bool_t gzip_output = 0; > > notmuch_opt_desc_t options[] = { > { NOTMUCH_OPT_KEYWORD, &output_format, "format", 'f', > @@ -165,6 +221,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[]) > { "batch-tag", DUMP_FORMAT_BATCH_TAG }, > { 0, 0 } } }, > { NOTMUCH_OPT_STRING, &output_file_name, "output", 'o', 0 }, > + { NOTMUCH_OPT_BOOLEAN, &gzip_output, "gzip", 'z', 0 }, > { 0, 0, 0, 0, 0 } > }; > > @@ -181,7 +238,7 @@ notmuch_dump_command (notmuch_config_t *config, int argc, char *argv[]) > } > > ret = notmuch_database_dump (notmuch, output_file_name, query_str, > - output_format); > + output_format, gzip_output); > > notmuch_database_destroy (notmuch); > > diff --git a/test/T240-dump-restore.sh b/test/T240-dump-restore.sh > index 0004438..d79aca8 100755 > --- a/test/T240-dump-restore.sh > +++ b/test/T240-dump-restore.sh > @@ -68,6 +68,18 @@ test_begin_subtest "dump --output=outfile --" > notmuch dump --output=dump-1-arg-dash.actual -- > test_expect_equal_file dump.expected dump-1-arg-dash.actual > > +# gzipped output > + > +test_begin_subtest "dump --gzip" > +notmuch dump --gzip > dump-gzip.gz > +gunzip dump-gzip.gz > +test_expect_equal_file dump.expected dump-gzip > + > +test_begin_subtest "dump --gzip --output=outfile" > +notmuch dump --gzip --output=dump-gzip-outfile.gz > +gunzip dump-gzip-outfile.gz > +test_expect_equal_file dump.expected dump-gzip-outfile > + > # Note, we assume all messages from cworth have a message-id > # containing cworth.org >