From: Mark Walters <markwalters1009@gmail.com>
To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org
Cc: David Bremner <bremner@debian.org>
Subject: Re: [PATCH v3 09/10] random-dump.c: new test-binary to generate dump files
Date: Sun, 05 Feb 2012 01:04:13 +0000
Message-ID: <87pqduozua.fsf@qmul.ac.uk>
In-Reply-To: <1326591624-15493-10-git-send-email-david@tethera.net>

On Sat, 14 Jan 2012 21:40:23 -0400, David Bremner <david@tethera.net> wrote:
> From: David Bremner <bremner@debian.org>
> 
> This binary creates a "torture test" dump file for the new dump
> format.
> ---
>  test/Makefile.local |    4 ++
>  test/basic          |    2 +-
>  test/random-dump.c  |  144 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 149 insertions(+), 1 deletions(-)
>  create mode 100644 test/random-dump.c
> 
> diff --git a/test/Makefile.local b/test/Makefile.local
> index ba697f4..b59f837 100644
> --- a/test/Makefile.local
> +++ b/test/Makefile.local
> @@ -16,6 +16,9 @@ $(dir)/arg-test: $(dir)/arg-test.o command-line-arguments.o util/libutil.a
>  $(dir)/hex-xcode: $(dir)/hex-xcode.o command-line-arguments.o util/libutil.a
>  	$(call quiet,CC) -I. $^ -o $@ -ltalloc
>  
> +$(dir)/random-dump:  $(dir)/random-dump.o command-line-arguments.o util/libutil.a
> +	$(call quiet,CC) -I. $^ -o $@ -ltalloc -lm
> +
>  $(dir)/smtp-dummy: $(smtp_dummy_modules)
>  	$(call quiet,CC) $^ -o $@
>  
> @@ -25,6 +28,7 @@ $(dir)/symbol-test: $(dir)/symbol-test.o
>  .PHONY: test check
>  
>  test-binaries: $(dir)/arg-test $(dir)/hex-xcode \
> +	$(dir)/random-dump \
>  	 $(dir)/smtp-dummy $(dir)/symbol-test
>  
>  test:	all test-binaries
> diff --git a/test/basic b/test/basic
> index af57026..e3a6cef 100755
> --- a/test/basic
> +++ b/test/basic
> @@ -54,7 +54,7 @@ test_begin_subtest 'Ensure that all available tests will be run by notmuch-test'
>  eval $(sed -n -e '/^TESTS="$/,/^"$/p' $TEST_DIRECTORY/notmuch-test)
>  tests_in_suite=$(for i in $TESTS; do echo $i; done | sort)
>  available=$(find "$TEST_DIRECTORY" -maxdepth 1 -type f -executable -printf '%f\n' | \
> -    sed -r -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test|hex-xcode)$/d" | \
> +    sed -r -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test|hex-xcode|random-dump)$/d" | \
>      sort)
>  test_expect_equal "$tests_in_suite" "$available"
>  
> diff --git a/test/random-dump.c b/test/random-dump.c
> new file mode 100644
> index 0000000..1949425
> --- /dev/null
> +++ b/test/random-dump.c
> @@ -0,0 +1,144 @@
> +/*
> +   Generate a random dump file in 'notmuch' format.
> +   Generated message-id's and tags are intentionally nasty.
> +
> +   We restrict ourselves to 7 bit message-ids, because generating
> +   random valid UTF-8 seems like work. And invalid UTF-8 can't be
> +   round-tripped via Xapian.
> +
> + */
> +
> +#include <stdlib.h>
> +#include <assert.h>
> +#include <talloc.h>
> +#include <string.h>
> +#include "math.h"
> +#include "hex-escape.h"
> +#include "command-line-arguments.h"
> +
> +static void
> +hex_out (void *ctx, char *buf)
> +{
> +    static char *encoded_buf = NULL;
> +    static size_t encoded_buf_size = 0;
> +
> +    if (hex_encode (ctx, buf, &encoded_buf, &encoded_buf_size) != HEX_SUCCESS) {
> +	fprintf (stderr, "Hex encoding failed\n");
> +	exit (1);
> +    }
> +
> +    fputs (encoded_buf, stdout);
> +}
> +
> +static void
> +random_chars (char *buf, int from, int stop, int max_char,
> +	      const char *blacklist)
> +{
> +    int i;
> +
> +    for (i = from; i < stop; i++) {
> +	do {
> +	    buf[i] = ' ' + (random () % (max_char - ' '));
> +	} while (blacklist && strchr (blacklist, buf[i]));
> +    }
> +}
> +
> +static void
> +random_tag (void *ctx, size_t len)
> +{
> +    static char *buf = NULL;
> +    static size_t buf_len = 0;
> +
> +    int use = (random () % (len - 1)) + 1;
> +
> +    if (len > buf_len) {
> +	buf = talloc_realloc (ctx, buf, char, len);
> +	buf_len = len;
> +    }
> +
> +    random_chars (buf, 0, use, 255, NULL);
> +
> +    buf[use] = '\0';
> +
> +    hex_out (ctx, buf);
> +}
> +
> +static void
> +random_message_id (void *ctx, size_t len)
> +{
> +    static char *buf = NULL;
> +    static size_t buf_len = 0;
> +
> +    int lhs_len = (random () % (len / 2 - 1)) + 1;
> +
> +    int rhs_len = (random () % len / 2) + 1;
> +
> +    const char *blacklist = "\n\r@<>[]()";
> +
> +    if (len > buf_len) {
> +	buf = talloc_realloc (ctx, buf, char, len);
> +	buf_len = len;
> +    }
> +
> +    random_chars (buf, 0, lhs_len, 127, blacklist);
> +
> +    buf[lhs_len] = '@';
> +
> +    random_chars (buf, lhs_len + 1, lhs_len + rhs_len + 1, 127, blacklist);
> +
> +    hex_out (ctx, buf);
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +
> +    void *ctx = talloc_new (NULL);
> +    int num_lines = 500;
> +    int max_tags = 10;
> +    int message_id_len = 100;
> +    int tag_len = 50;
> +    int seed = 734569;
> +
> +    int pad_tag = 0, pad_mid = 0;
> +
> +    notmuch_opt_desc_t options[] = {
> +	{ NOTMUCH_OPT_INT, &num_lines, "num-lines", 'n', 0 },
> +	{ NOTMUCH_OPT_INT, &max_tags, "max-tags", 'm', 0 },
> +	{ NOTMUCH_OPT_INT, &message_id_len, "message-id-len", 'M', 0 },
> +	{ NOTMUCH_OPT_INT, &tag_len, "tag-len", 't', 0 },
> +	{ NOTMUCH_OPT_INT, &seed, "seed", 's', 0 },
> +	{ 0, 0, 0, 0, 0 }
> +    };
> +
> +    int opt_index = parse_arguments (argc, argv, options, 1);
> +
> +    if (opt_index < 0)
> +	exit (1);
> +
> +    pad_mid = ((int) log10 (num_lines) + 1);
> +    pad_tag = ((int) log10 (max_tags)) + 1;
> +
> +    srandom (seed);
> +
> +    int line;
> +    for (line = 0; line < num_lines; line++) {
> +
> +	printf ("%0*d-", pad_mid, line);
> +
> +	random_message_id (ctx, message_id_len);
> +
> +	int num_tags = random () % (max_tags + 1);
> +
> +	int j;
> +	for (j = 0; j < num_tags; j++) {
> +	    printf (" %0*d-", pad_tag, j);
> +	    random_tag (ctx, tag_len);
> +	}
> +	putchar ('\n');
> +    }
> +
> +    talloc_free (ctx);
> +
> +    return 0;
> +}

Hi,

Just a thought on this and the next test. Could you add messages with
the random ids and tags from the above code to the Xapian database
directly, by calling whatever notmuch-new calls? The test would then
dump, restore, dump again, and check that the two dumps are equal. That
might avoid your gmime concern from the next patch, and you could use
arbitrary (non-null) strings, including all sorts of malformed UTF-8.

I guess Xapian might do bizarre things with the malformed UTF-8 but, if
it does, that might mean the correct place to fix it is in notmuch-new.
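
To make the suggestion a bit more concrete, here is a rough sketch of
the two small pieces it would need: a generator for arbitrary non-null
byte strings (no attempt at valid UTF-8) and a byte-for-byte comparison
of the two dumps. Both function names are hypothetical and nothing here
is existing notmuch code. It uses rand () rather than random () only to
stay within standard C, and it leaves out the part that actually adds
messages to the database.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: fill buf with len random bytes in the range
 * 1..255 and NUL-terminate it. buf must have room for len + 1 bytes.
 * The result is deliberately not guaranteed to be valid UTF-8. */
void
random_nonnull_bytes (char *buf, size_t len)
{
    size_t i;

    assert (buf != NULL);

    for (i = 0; i < len; i++)
	buf[i] = (char) (unsigned char) (1 + rand () % 255);

    buf[len] = '\0';
}

/* Hypothetical helper: return 1 if both streams have identical
 * contents from their current positions to end of file, 0 otherwise. */
int
streams_equal (FILE *a, FILE *b)
{
    int ca, cb;

    assert (a != NULL && b != NULL);

    do {
	ca = getc (a);
	cb = getc (b);
    } while (ca == cb && ca != EOF);

    /* Equal only if both streams reached EOF at the same time. */
    return ca == cb;
}
```

The test would open the first and second dump files and expect
streams_equal () to return 1. A plain cmp(1) in the shell test would do
the same job, so this is only meant to pin down what "equal" means.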

Best wishes

Mark
