unofficial mirror of notmuch@notmuchmail.org
* [PATCH v5 0/9] notmuch search date:since..until query support
@ 2012-10-21 21:22 Jani Nikula
From: Jani Nikula @ 2012-10-21 21:22 UTC
  To: notmuch

Hi all, v5 of id:"cover.1350164594.git.jani@nikula.org", with the
following changes in patch 2 per Ethan's review
id:"87hapw9x99.fsf@betacantrips.com":
 - remove useless intermediate function handle_postponed_number
 - rename stringcmp to match_keyword

BR,
Jani.


Jani Nikula (9):
  build: drop the -Wswitch-enum warning
  parse-time-string: add a date/time parser to notmuch
  test: add new test tool parse-time for date/time parser
  test: add smoke tests for the date/time parser module
  build: build parse-time-string as part of the notmuch lib and static
    cli
  lib: add date range query support
  test: add tests for date:since..until range queries
  man: document the date:since..until range queries
  NEWS: date range search support

 Makefile                              |    2 +-
 Makefile.local                        |    2 +-
 NEWS                                  |   14 +
 configure                             |    2 +-
 lib/Makefile.local                    |    3 +-
 lib/database-private.h                |    1 +
 lib/database.cc                       |    5 +
 lib/parse-time-vrp.cc                 |   40 +
 lib/parse-time-vrp.h                  |   19 +
 man/man7/notmuch-search-terms.7       |  147 +++-
 parse-time-string/Makefile            |    5 +
 parse-time-string/Makefile.local      |   12 +
 parse-time-string/README              |    9 +
 parse-time-string/parse-time-string.c | 1477 +++++++++++++++++++++++++++++++++
 parse-time-string/parse-time-string.h |  102 +++
 test/Makefile.local                   |    7 +-
 test/basic                            |    2 +-
 test/notmuch-test                     |    2 +
 test/parse-time-string                |   71 ++
 test/parse-time.c                     |  281 +++++++
 test/search-date                      |   21 +
 21 files changed, 2206 insertions(+), 18 deletions(-)
 create mode 100644 lib/parse-time-vrp.cc
 create mode 100644 lib/parse-time-vrp.h
 create mode 100644 parse-time-string/Makefile
 create mode 100644 parse-time-string/Makefile.local
 create mode 100644 parse-time-string/README
 create mode 100644 parse-time-string/parse-time-string.c
 create mode 100644 parse-time-string/parse-time-string.h
 create mode 100755 test/parse-time-string
 create mode 100644 test/parse-time.c
 create mode 100755 test/search-date
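
The implementation notes in parse-time-string.c (patch 2) explain that a range given as two plain dates should run from the beginning of the start date to the end of the end date, with the caller rounding the start towards the past and the end towards the future. Below is a minimal sketch of that arithmetic for UTC dates only, ignoring time zones and DST. It assumes that "rounding towards the future" means the last second of the day. The helper names are hypothetical and not part of the series. The day count uses Howard Hinnant's public-domain days-from-civil algorithm rather than anything in the patches.

```c
#include <assert.h>
#include <time.h>

/* Days since 1970-01-01 for a Gregorian date in UTC. Based on Howard
 * Hinnant's days_from_civil algorithm; this simplified version assumes
 * year >= 1970, the only years the parser in this series accepts. */
static long
days_from_civil (int year, int mon, int mday)
{
    int y = year - (mon <= 2);          /* count years from March */
    int era = y / 400;                  /* 400-year cycle; y >= 0 here */
    int yoe = y - era * 400;            /* year of era, [0, 399] */
    int mp = (mon + 9) % 12;            /* March = 0, ..., February = 11 */
    int doy = (153 * mp + 2) / 5 + mday - 1;          /* day of year */
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  /* day of era */

    return (long) era * 146097 + doe - 719468;
}

/* Hypothetical helper: inclusive UTC range of whole days, from the
 * first second of the start date (rounded down) to the last second
 * of the end date (rounded up). */
static void
date_range_utc (int y1, int m1, int d1, int y2, int m2, int d2,
		time_t *since, time_t *until)
{
    assert (days_from_civil (y1, m1, d1) <= days_from_civil (y2, m2, d2));

    *since = (time_t) days_from_civil (y1, m1, d1) * 86400;
    *until = (time_t) days_from_civil (y2, m2, d2) * 86400 + 86399;
}
```

For example, date_range_utc (2012, 10, 21, 2012, 10, 21, &since, &until) gives since = 1350777600 and until = 1350863999, which covers all of 21 October 2012 UTC.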

-- 
1.7.10.4


* [PATCH v5 1/9] build: drop the -Wswitch-enum warning
From: Jani Nikula @ 2012-10-21 21:22 UTC
  To: notmuch

-Wswitch-enum is awkward, especially when a switch statement is
intended to handle just some of the named codes of an enumeration and
leave the rest to the default label.

We already have -Wall, which enables -Wswitch by default, and per GCC
documentation, "The only difference between -Wswitch and this option
[-Wswitch-enum] is that this option gives a warning about an omitted
enumeration code even if there is a default label."

Drop -Wswitch-enum to not force listing all named codes of
enumerations in switch statements that have a default label.
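
The following small C sketch shows the pattern in question: a switch that handles one named code and leaves the rest to the default label. The enum and function are hypothetical and do not come from notmuch.

```c
#include <assert.h>

/* Hypothetical enum, only for illustrating the warning. */
enum fruit { APPLE, BANANA, CHERRY };

/*
 * Handle one named code explicitly and leave the rest to the default
 * label. With -Wall (which includes -Wswitch) GCC accepts this
 * silently, because a default label is present. With -Wswitch-enum,
 * GCC additionally warns that BANANA and CHERRY are not handled.
 */
int
is_apple (enum fruit f)
{
    switch (f) {
    case APPLE:
	return 1;
    default:
	return 0;
    }
}
```

Compiling this with "gcc -Wall -Wswitch-enum -c" should report both unhandled enumeration values. With -Wall alone, it should compile cleanly.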

---

This will be useful in the next patch.
---
 configure |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index acb90a8..afa5c16 100755
--- a/configure
+++ b/configure
@@ -532,7 +532,7 @@ fi
 
 WARN_CXXFLAGS=""
 printf "Checking for available C++ compiler warning flags... "
-for flag in -Wall -Wextra -Wwrite-strings -Wswitch-enum; do
+for flag in -Wall -Wextra -Wwrite-strings; do
     if ${CC} $flag -o minimal minimal.c > /dev/null 2>&1
     then
 	WARN_CXXFLAGS="${WARN_CXXFLAGS}${WARN_CXXFLAGS:+ }${flag}"
-- 
1.7.10.4


* [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
From: Jani Nikula @ 2012-10-21 21:22 UTC
  To: notmuch

Add a date/time parser to notmuch, to be used for adding date range
query support for notmuch lib later on. Add the parser to a directory
of its own to make it independent of the rest of the notmuch code
base.

Signed-off-by: Jani Nikula <jani@nikula.org>
---
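The keyword table in parse-time-string.c uses '|' to mark a minimum match length. For example, "jan|uary" matches input that begins with at least the first three characters of "january". Below is a standalone sketch of that rule. It assumes case-insensitive matching and follows the same approach as match_keyword() in the patch: count the matched characters, then enforce the minimum. The function name is hypothetical, and the code is not part of the series.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: match input s against a table entry such as
 * "jan|uary", case-insensitively. Returns the number of characters
 * of s consumed, or 0 for no match. Without '|', the whole keyword
 * must match.
 */
static size_t
match_minlen_keyword (const char *s, const char *entry)
{
    const char *bar;
    size_t minlen, i = 0, k = 0;    /* i: chars of s, k: index in entry */

    assert (s && entry);

    bar = strchr (entry, '|');
    minlen = bar ? (size_t) (bar - entry) : strlen (entry);

    while (s[i] && entry[k]) {
	if (entry[k] == '|') {      /* skip the separator itself */
	    k++;
	    continue;
	}
	if (tolower ((unsigned char) s[i]) !=
	    tolower ((unsigned char) entry[k]))
	    break;
	i++;
	k++;
    }

    return i < minlen ? 0 : i;
}
```

As in the patch, a match only consumes a prefix of the input: "janx" still matches three characters of "jan|uary", and the rest is left for further parsing. parse_keyword() additionally tries every table entry and keeps the longest match, preferring case-sensitive entries such as "M" on ties.
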
 Makefile                              |    2 +-
 parse-time-string/Makefile            |    5 +
 parse-time-string/Makefile.local      |   12 +
 parse-time-string/README              |    9 +
 parse-time-string/parse-time-string.c | 1477 +++++++++++++++++++++++++++++++++
 parse-time-string/parse-time-string.h |  102 +++
 6 files changed, 1606 insertions(+), 1 deletion(-)
 create mode 100644 parse-time-string/Makefile
 create mode 100644 parse-time-string/Makefile.local
 create mode 100644 parse-time-string/README
 create mode 100644 parse-time-string/parse-time-string.c
 create mode 100644 parse-time-string/parse-time-string.h
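
parse_postponed_number() in this patch decides how to read a bare number by its digit count:

- 8 digits are YYYYMMDD.
- 6 digits are HHMMSS.
- 4 digits are a year if the value is 1970 or later. Otherwise, when preceded by a sign, they are a +/-HHMM time zone, which set_user_tz() limits to at most 14 hours and a multiple of 15 minutes.

Below is a standalone sketch of only those decodings. The names and the result struct are hypothetical. It skips the 1 and 2 digit cases, which depend on neighbouring tokens, and it simplifies the date check (day of month is only checked to be 1..31, as in is_valid_mday()).

```c
#include <assert.h>

/* Hypothetical result of decoding one run of digits. */
struct decoded {
    enum { DEC_ERROR, DEC_DATE, DEC_TIME, DEC_YEAR, DEC_TZ } kind;
    int a, b, c;    /* date: y/m/d, time: h/m/s, year: a, tz: a = minutes */
};

/*
 * Decode value v written with n digits and preceded by delimiter d
 * ('+', '-', or anything else). Mirrors the n == 4, 6 and 8 branches
 * of parse_postponed_number(); other lengths decode as DEC_ERROR.
 */
static struct decoded
decode_digits (int v, int n, char d)
{
    struct decoded r = { DEC_ERROR, 0, 0, 0 };

    assert (v >= 0);                    /* a run of digits is never negative */

    if (n == 8) {                       /* YYYYMMDD */
	int y = v / 10000, m = (v / 100) % 100, md = v % 100;
	if (y >= 1970 && m >= 1 && m <= 12 && md >= 1 && md <= 31)
	    r = (struct decoded) { DEC_DATE, y, m, md };
    } else if (n == 6) {                /* HHMMSS, 24:00:00 allowed */
	int h = v / 10000, mi = (v / 100) % 100, s = v % 100;
	if ((h == 24 && mi == 0 && s == 0) ||
	    (h <= 23 && mi <= 59 && s <= 59))
	    r = (struct decoded) { DEC_TIME, h, mi, s };
    } else if (n == 4) {
	if (v >= 1970) {                /* YYYY */
	    r = (struct decoded) { DEC_YEAR, v, 0, 0 };
	} else if (d == '+' || d == '-') {      /* +/-HHMM */
	    int h = v / 100, mi = v % 100;
	    if (h <= 14 && mi <= 59 && mi % 15 == 0) {
		int tz = h * 60 + mi;
		r = (struct decoded) { DEC_TZ, d == '-' ? -tz : tz, 0, 0 };
	    }
	}
    }

    return r;
}
```

For instance, "-0330" arrives as v = 330, n = 4, d = '-' and decodes to a time zone offset of -210 minutes. "2012" decodes as a year because it is not below 1970.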

diff --git a/Makefile b/Makefile
index e5e2e3a..bb9c316 100644
--- a/Makefile
+++ b/Makefile
@@ -3,7 +3,7 @@
 all:
 
 # List all subdirectories here. Each contains its own Makefile.local
-subdirs = compat completion emacs lib man util test
+subdirs = compat completion emacs lib man parse-time-string util test
 
 # We make all targets depend on the Makefiles themselves.
 global_deps = Makefile Makefile.config Makefile.local \
diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile
new file mode 100644
index 0000000..fa25832
--- /dev/null
+++ b/parse-time-string/Makefile
@@ -0,0 +1,5 @@
+all:
+	$(MAKE) -C .. all
+
+.DEFAULT:
+	$(MAKE) -C .. $@
diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local
new file mode 100644
index 0000000..53534f3
--- /dev/null
+++ b/parse-time-string/Makefile.local
@@ -0,0 +1,12 @@
+dir := parse-time-string
+extra_cflags += -I$(srcdir)/$(dir)
+
+libparse-time-string_c_srcs := $(dir)/parse-time-string.c
+
+libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o)
+
+$(dir)/libparse-time-string.a: $(libparse-time-string_modules)
+	$(call quiet,AR) rcs $@ $^
+
+SRCS := $(SRCS) $(libparse-time-string_c_srcs)
+CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a
diff --git a/parse-time-string/README b/parse-time-string/README
new file mode 100644
index 0000000..300ff1f
--- /dev/null
+++ b/parse-time-string/README
@@ -0,0 +1,9 @@
+PARSE TIME STRING
+=================
+
+parse_time_string() is a date/time parser originally written for
+notmuch by Jani Nikula <jani@nikula.org>. However, there is nothing
+notmuch-specific in it, and it should be kept reusable for other
+projects, and ready to be packaged on its own as needed. Please do not
+add dependencies on or references to anything notmuch-specific. The
+parser should only depend on the C library.
diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c
new file mode 100644
index 0000000..942041a
--- /dev/null
+++ b/parse-time-string/parse-time-string.c
@@ -0,0 +1,1477 @@
+/*
+ * parse time string - user friendly date and time parser
+ * Copyright © 2012 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Jani Nikula <jani@nikula.org>
+ */
+
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <strings.h>
+#include <time.h>
+#include <sys/time.h>
+#include <sys/types.h>
+
+#include "parse-time-string.h"
+
+/*
+ * IMPLEMENTATION DETAILS
+ *
+ * At a high level, the parsing is done in two phases: 1) actual
+ * parsing of the input string and storing the parsed data into
+ * 'struct state', and 2) processing of the data in 'struct state'
+ * according to current time (or provided reference time) and
+ * rounding. This is evident in the main entry point function
+ * parse_time_string().
+ *
+ * 1) The parsing phase - parse_input()
+ *
+ * Parsing is greedy and happens from left to right. The parsing is as
+ * unambiguous as possible; only unambiguous date/time formats are
+ * accepted. Redundant or contradictory absolute date/time in the
+ * input (e.g. date specified multiple times/ways) is not
+ * accepted. Relative date/time on the other hand just accumulates if
+ * present multiple times (e.g. "5 days 5 days" just turns into 10
+ * days).
+ *
+ * Parsing decisions are made on the input format, not value. For
+ * example, "20/5/2005" fails because the recognized format here is
+ * MM/D/YYYY, even though the values would suggest DD/M/YYYY.
+ *
+ * Parsing is mostly stateless in the sense that parsing decisions are
+ * not made based on the values of previously parsed data, or whether
+ * certain data is present in the first place. (There are a few
+ * exceptions to the latter part, though, such as parsing of time zone
+ * that would otherwise look like plain time.)
+ *
+ * When the parser encounters a number that is not greedily parsed as
+ * part of a format, the interpretation is postponed until the next
+ * token is parsed. The parser for the next token may consume the
+ * previously postponed number. For example, when parsing "20 May" the
+ * meaning of "20" is not known until "May" is parsed. If the parser
+ * for the next token does not consume the postponed number, the
+ * number is handled as a "lone" number before the parser for the
+ * next token finishes.
+ *
+ * 2) The processing phase - create_output()
+ *
+ * Once the parser in phase 1 has finished, 'struct state' contains
+ * all the information from the input string, and the string is no
+ * longer needed. Since the parser does not even handle the concept
+ * of "now", the processing initializes the fields referring to the
+ * current date/time.
+ *
+ * If requested, the result is rounded towards past or future. The
+ * idea behind rounding is to support parsing date/time ranges in an
+ * obvious way. For example, for a range defined as two dates (without
+ * time), one would typically want to have an inclusive range from the
+ * beginning of the start date to the end of the end date. The caller
+ * would use rounding towards past in the start date, and towards
+ * future in the end date.
+ *
+ * The absolute date and time is shifted by the relative date and
+ * time, and time zone adjustments are made. Daylight saving time
+ * (DST) is specifically *not* handled at all.
+ *
+ * Finally, the result is stored in a time_t.
+ */
+
+#define unused(x) x __attribute__ ((unused))
+
+/* XXX: Redefine these to add i18n support. The keyword table uses
+ * N_() to mark strings to be translated; they are accessed
+ * dynamically using _(). */
+#define _(s) (s)	/* i18n: define as gettext (s) */
+#define N_(s) (s)	/* i18n: define as gettext_noop (s) */
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))
+
+/*
+ * Field indices in the tm and set arrays of struct state.
+ *
+ * NOTE: There's some code that depends on the ordering of this enum.
+ */
+enum field {
+    /* Keep SEC...YEAR in this order. */
+    TM_ABS_SEC,		/* seconds */
+    TM_ABS_MIN,		/* minutes */
+    TM_ABS_HOUR,	/* hours */
+    TM_ABS_MDAY,	/* day of the month */
+    TM_ABS_MON,		/* month */
+    TM_ABS_YEAR,	/* year */
+
+    TM_ABS_WDAY,	/* day of the week. special: may be relative */
+    TM_ABS_ISDST,	/* daylight saving time */
+
+    TM_AMPM,		/* am vs. pm */
+    TM_TZ,		/* timezone in minutes */
+
+    /* Keep SEC...YEAR in this order. */
+    TM_REL_SEC,		/* seconds relative to absolute or reference time */
+    TM_REL_MIN,		/* minutes ... */
+    TM_REL_HOUR,	/* hours ... */
+    TM_REL_DAY,		/* days ... */
+    TM_REL_MON,		/* months ... */
+    TM_REL_YEAR,	/* years ... */
+    TM_REL_WEEK,	/* weeks ... */
+
+    TM_NONE,		/* not a field */
+
+    TM_SIZE = TM_NONE,
+    TM_FIRST_ABS = TM_ABS_SEC,
+    TM_FIRST_REL = TM_REL_SEC,
+};
+
+/* Values for the set array of struct state. */
+enum field_set {
+    FIELD_UNSET,	/* The field has not been touched by parser. */
+    FIELD_SET,		/* The field has been set by parser. */
+    FIELD_NOW,		/* The field will be set to reference time. */
+};
+
+static enum field
+next_abs_field (enum field field)
+{
+    /* NOTE: Depends on the enum ordering. */
+    return field < TM_ABS_YEAR ? field + 1 : TM_NONE;
+}
+
+static enum field
+abs_to_rel_field (enum field field)
+{
+    assert (field <= TM_ABS_YEAR);
+
+    /* NOTE: Depends on the enum ordering. */
+    return field + (TM_FIRST_REL - TM_FIRST_ABS);
+}
+
+/* Get epoch value for field. */
+static int
+field_epoch (enum field field)
+{
+    if (field == TM_ABS_MDAY || field == TM_ABS_MON)
+	return 1;
+    else if (field == TM_ABS_YEAR)
+	return 1970;
+    else
+	return 0;
+}
+
+/* The parsing state. */
+struct state {
+    int tm[TM_SIZE];			/* parsed date and time */
+    enum field_set set[TM_SIZE];	/* set status of tm */
+
+    enum field last_field;	/* Previously set field. */
+    char delim;
+
+    int postponed_length;	/* Number of digits in postponed value. */
+    int postponed_value;
+    char postponed_delim;	/* The delimiter preceding postponed number. */
+};
+
+/*
+ * Helpers for postponed numbers.
+ *
+ * postponed_length is the number of digits in postponed value. 0
+ * means there is no postponed number. -1 means there is a postponed
+ * number, but it comes from a keyword, and it doesn't have digits.
+ */
+static int
+get_postponed_length (struct state *state)
+{
+    return state->postponed_length;
+}
+
+/*
+ * Consume a previously postponed number. Return true if a number was
+ * in fact postponed, false otherwise. Store the postponed number's
+ * value in *v, length in the input string in *n (or -1 if the number
+ * was written out and parsed as a keyword), and the preceding
+ * delimiter in *d.
+ */
+static bool
+get_postponed_number (struct state *state, int *v, int *n, char *d)
+{
+    if (!state->postponed_length)
+	return false;
+
+    if (n)
+	*n = state->postponed_length;
+
+    if (v)
+	*v = state->postponed_value;
+
+    if (d)
+	*d = state->postponed_delim;
+
+    state->postponed_length = 0;
+    state->postponed_value = 0;
+    state->postponed_delim = 0;
+
+    return true;
+}
+
+static int parse_postponed_number (struct state *state, enum field next_field);
+
+/*
+ * Postpone a number to be handled later. If one exists already,
+ * handle it first. n may be -1 to indicate a keyword that has no
+ * number length.
+ */
+static int
+set_postponed_number (struct state *state, int v, int n)
+{
+    int r;
+    char d = state->delim;
+
+    /* Parse a previously postponed number, if any. */
+    r = parse_postponed_number (state, TM_NONE);
+    if (r)
+	return r;
+
+    state->postponed_length = n;
+    state->postponed_value = v;
+    state->postponed_delim = d;
+
+    return 0;
+}
+
+static void
+set_delim (struct state *state, char delim)
+{
+    state->delim = delim;
+}
+
+static void
+unset_delim (struct state *state)
+{
+    state->delim = 0;
+}
+
+/*
+ * Field set/get/mod helpers.
+ */
+
+/* Return true if field has been set. */
+static bool
+is_field_set (struct state *state, enum field field)
+{
+    assert (field < ARRAY_SIZE (state->tm));
+
+    return field < ARRAY_SIZE (state->set) &&
+	   state->set[field] != FIELD_UNSET;
+}
+
+static void
+unset_field (struct state *state, enum field field)
+{
+    assert (field < ARRAY_SIZE (state->tm));
+
+    state->set[field] = FIELD_UNSET;
+    state->tm[field] = 0;
+}
+
+/*
+ * Set field to value. A field can only be set once to ensure the
+ * input does not contain redundant and potentially conflicting data.
+ */
+static int
+set_field (struct state *state, enum field field, int value)
+{
+    int r;
+
+    assert (field < ARRAY_SIZE (state->tm));
+
+    /* Fields can only be set once. */
+    if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET)
+	return -PARSE_TIME_ERR_ALREADYSET;
+
+    state->set[field] = FIELD_SET;
+
+    /* Parse a previously postponed number, if any. */
+    r = parse_postponed_number (state, field);
+    if (r)
+	return r;
+
+    unset_delim (state);
+
+    state->tm[field] = value;
+    state->last_field = field;
+
+    return 0;
+}
+
+/*
+ * Mark n fields in fields to be set to the reference date/time in the
+ * specified time zone, or local timezone if not specified. The fields
+ * will be initialized after parsing is complete and timezone is
+ * known.
+ */
+static int
+set_fields_to_now (struct state *state, enum field *fields, size_t n)
+{
+    size_t i;
+    int r;
+
+    for (i = 0; i < n; i++) {
+	r = set_field (state, fields[i], 0);
+	if (r)
+	    return r;
+	state->set[fields[i]] = FIELD_NOW;
+    }
+
+    return 0;
+}
+
+/* Modify field by adding value to it. To be used on relative fields,
+ * which can be modified multiple times (to accumulate). */
+static int
+mod_field (struct state *state, enum field field, int value)
+{
+    int r;
+
+    assert (field < ARRAY_SIZE (state->tm));   /* assert relative??? */
+
+    if (field < ARRAY_SIZE (state->set))
+	state->set[field] = FIELD_SET;
+
+    /* Parse a previously postponed number, if any. */
+    r = parse_postponed_number (state, field);
+    if (r)
+	return r;
+
+    unset_delim (state);
+
+    state->tm[field] += value;
+    state->last_field = field;
+
+    return 0;
+}
+
+/*
+ * Get field value. Make sure the field is set before query. It's most
+ * likely an error to call this while parsing (for example fields set
+ * as FIELD_NOW will only be set to some value after parsing).
+ */
+static int
+get_field (struct state *state, enum field field)
+{
+    assert (field < ARRAY_SIZE (state->tm));
+
+    return state->tm[field];
+}
+
+/*
+ * Validity checkers.
+ */
+static bool is_valid_12hour (int h)
+{
+    return h >= 0 && h <= 12;
+}
+
+static bool is_valid_time (int h, int m, int s)
+{
+    /* Allow 24:00:00 to denote end of day. */
+    if (h == 24 && m == 0 && s == 0)
+	return true;
+
+    return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59;
+}
+
+static bool is_valid_mday (int mday)
+{
+    return mday >= 1 && mday <= 31;
+}
+
+static bool is_valid_mon (int mon)
+{
+    return mon >= 1 && mon <= 12;
+}
+
+static bool is_valid_year (int year)
+{
+    return year >= 1970;
+}
+
+static bool is_valid_date (int year, int mon, int mday)
+{
+    return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday);
+}
+
+/* Unset indicator for time and date set helpers. */
+#define UNSET -1
+
+/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */
+static int
+set_abs_time (struct state *state, int hour, int min, int sec)
+{
+    int r;
+
+    if (hour != UNSET) {
+	if ((r = set_field (state, TM_ABS_HOUR, hour)))
+	    return r;
+    }
+
+    if (min != UNSET) {
+	if ((r = set_field (state, TM_ABS_MIN, min)))
+	    return r;
+    }
+
+    if (sec != UNSET) {
+	if ((r = set_field (state, TM_ABS_SEC, sec)))
+	    return r;
+    }
+
+    return 0;
+}
+
+/* Date set helper. No input checking. Use UNSET (-1) to leave unset. */
+static int
+set_abs_date (struct state *state, int year, int mon, int mday)
+{
+    int r;
+
+    if (year != UNSET) {
+	if ((r = set_field (state, TM_ABS_YEAR, year)))
+	    return r;
+    }
+
+    if (mon != UNSET) {
+	if ((r = set_field (state, TM_ABS_MON, mon)))
+	    return r;
+    }
+
+    if (mday != UNSET) {
+	if ((r = set_field (state, TM_ABS_MDAY, mday)))
+	    return r;
+    }
+
+    return 0;
+}
+
+/*
+ * Keyword parsing and handling.
+ */
+struct keyword;
+typedef int (*setter_t)(struct state *state, struct keyword *kw);
+
+struct keyword {
+    const char *name;	/* keyword */
+    enum field field;	/* field to set, or TM_NONE if N/A */
+    int value;		/* value to set, or 0 if N/A */
+    setter_t set;	/* function to use for setting, if non-NULL */
+};
+
+/*
+ * Setter callback functions for keywords.
+ */
+static int
+kw_set_default (struct state *state, struct keyword *kw)
+{
+    return set_field (state, kw->field, kw->value);
+}
+
+static int
+kw_set_rel (struct state *state, struct keyword *kw)
+{
+    int multiplier = 1;
+
+    /* Get a previously set multiplier, if any. */
+    get_postponed_number (state, &multiplier, NULL, NULL);
+
+    /* Accumulate relative field values. */
+    return mod_field (state, kw->field, multiplier * kw->value);
+}
+
+static int
+kw_set_number (struct state *state, struct keyword *kw)
+{
+    /* -1 = no length, from keyword. */
+    return set_postponed_number (state, kw->value, -1);
+}
+
+static int
+kw_set_month (struct state *state, struct keyword *kw)
+{
+    int n = get_postponed_length (state);
+
+    /* Consume postponed number if it could be mday. This handles "20
+     * January". */
+    if (n == 1 || n == 2) {
+	int r, v;
+
+	get_postponed_number (state, &v, NULL, NULL);
+
+	if (!is_valid_mday (v))
+	    return -PARSE_TIME_ERR_INVALIDDATE;
+
+	r = set_field (state, TM_ABS_MDAY, v);
+	if (r)
+	    return r;
+    }
+
+    return set_field (state, kw->field, kw->value);
+}
+
+static int
+kw_set_ampm (struct state *state, struct keyword *kw)
+{
+    int n = get_postponed_length (state);
+
+    /* Consume postponed number if it could be hour. This handles
+     * "5pm". */
+    if (n == 1 || n == 2) {
+	int r, v;
+
+	get_postponed_number (state, &v, NULL, NULL);
+
+	if (!is_valid_12hour (v))
+	    return -PARSE_TIME_ERR_INVALIDTIME;
+
+	r = set_abs_time (state, v, 0, 0);
+	if (r)
+	    return r;
+    }
+
+    return set_field (state, kw->field, kw->value);
+}
+
+static int
+kw_set_timeofday (struct state *state, struct keyword *kw)
+{
+    return set_abs_time (state, kw->value, 0, 0);
+}
+
+static int
+kw_set_today (struct state *state, unused (struct keyword *kw))
+{
+    enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY };
+
+    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
+}
+
+static int
+kw_set_now (struct state *state, unused (struct keyword *kw))
+{
+    enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC };
+
+    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
+}
+
+static int
+kw_set_ordinal (struct state *state, struct keyword *kw)
+{
+    int n, v;
+
+    /* Require a postponed number. */
+    if (!get_postponed_number (state, &v, &n, NULL))
+	return -PARSE_TIME_ERR_DATEFORMAT;
+
+    /* Ordinals are mday. */
+    if (n != 1 && n != 2)
+	return -PARSE_TIME_ERR_DATEFORMAT;
+
+    /* Be strict about st, nd, rd, and lax about th. */
+    if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31)
+	return -PARSE_TIME_ERR_INVALIDDATE;
+    else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22)
+	return -PARSE_TIME_ERR_INVALIDDATE;
+    else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23)
+	return -PARSE_TIME_ERR_INVALIDDATE;
+    else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v))
+	return -PARSE_TIME_ERR_INVALIDDATE;
+
+    return set_field (state, TM_ABS_MDAY, v);
+}
+
+/*
+ * Accepted keywords.
+ *
+ * A keyword may optionally contain a '|' to indicate the minimum
+ * match length. Without one, full match is required. It's advisable
+ * to keep the minimum match parts unique across all keywords.
+ *
+ * If a keyword begins with an upper case letter, the matching is case
+ * sensitive. Otherwise the matching is case insensitive.
+ *
+ * If setter is NULL, kw_set_default() will be used.
+ *
+ * Note: Order matters. Matching is greedy and the longest match is
+ * used. Of equal-length matches the first one is used, unless there is
+ * an equal-length case sensitive match, which trumps case insensitive
+ * matches.
+ */
+static struct keyword keywords[] = {
+    /* Weekdays. */
+    { N_("sun|day"),	TM_ABS_WDAY,	0,	NULL },
+    { N_("mon|day"),	TM_ABS_WDAY,	1,	NULL },
+    { N_("tue|sday"),	TM_ABS_WDAY,	2,	NULL },
+    { N_("wed|nesday"),	TM_ABS_WDAY,	3,	NULL },
+    { N_("thu|rsday"),	TM_ABS_WDAY,	4,	NULL },
+    { N_("fri|day"),	TM_ABS_WDAY,	5,	NULL },
+    { N_("sat|urday"),	TM_ABS_WDAY,	6,	NULL },
+
+    /* Months. */
+    { N_("jan|uary"),	TM_ABS_MON,	1,	kw_set_month },
+    { N_("feb|ruary"),	TM_ABS_MON,	2,	kw_set_month },
+    { N_("mar|ch"),	TM_ABS_MON,	3,	kw_set_month },
+    { N_("apr|il"),	TM_ABS_MON,	4,	kw_set_month },
+    { N_("may"),	TM_ABS_MON,	5,	kw_set_month },
+    { N_("jun|e"),	TM_ABS_MON,	6,	kw_set_month },
+    { N_("jul|y"),	TM_ABS_MON,	7,	kw_set_month },
+    { N_("aug|ust"),	TM_ABS_MON,	8,	kw_set_month },
+    { N_("sep|tember"),	TM_ABS_MON,	9,	kw_set_month },
+    { N_("oct|ober"),	TM_ABS_MON,	10,	kw_set_month },
+    { N_("nov|ember"),	TM_ABS_MON,	11,	kw_set_month },
+    { N_("dec|ember"),	TM_ABS_MON,	12,	kw_set_month },
+
+    /* Durations. */
+    { N_("y|ears"),	TM_REL_YEAR,	1,	kw_set_rel },
+    { N_("w|eeks"),	TM_REL_WEEK,	1,	kw_set_rel },
+    { N_("d|ays"),	TM_REL_DAY,	1,	kw_set_rel },
+    { N_("h|ours"),	TM_REL_HOUR,	1,	kw_set_rel },
+    { N_("hr|s"),	TM_REL_HOUR,	1,	kw_set_rel },
+    { N_("m|inutes"),	TM_REL_MIN,	1,	kw_set_rel },
+    /* M=months, m=minutes */
+    { N_("M"),		TM_REL_MON,	1,	kw_set_rel },
+    { N_("mins"),	TM_REL_MIN,	1,	kw_set_rel },
+    { N_("mo|nths"),	TM_REL_MON,	1,	kw_set_rel },
+    { N_("s|econds"),	TM_REL_SEC,	1,	kw_set_rel },
+    { N_("secs"),	TM_REL_SEC,	1,	kw_set_rel },
+
+    /* Numbers. */
+    { N_("one"),	TM_NONE,	1,	kw_set_number },
+    { N_("two"),	TM_NONE,	2,	kw_set_number },
+    { N_("three"),	TM_NONE,	3,	kw_set_number },
+    { N_("four"),	TM_NONE,	4,	kw_set_number },
+    { N_("five"),	TM_NONE,	5,	kw_set_number },
+    { N_("six"),	TM_NONE,	6,	kw_set_number },
+    { N_("seven"),	TM_NONE,	7,	kw_set_number },
+    { N_("eight"),	TM_NONE,	8,	kw_set_number },
+    { N_("nine"),	TM_NONE,	9,	kw_set_number },
+    { N_("ten"),	TM_NONE,	10,	kw_set_number },
+    { N_("dozen"),	TM_NONE,	12,	kw_set_number },
+    { N_("hundred"),	TM_NONE,	100,	kw_set_number },
+
+    /* Special number forms. */
+    { N_("this"),	TM_NONE,	0,	kw_set_number },
+    { N_("last"),	TM_NONE,	1,	kw_set_number },
+
+    /* Other special keywords. */
+    { N_("yesterday"),	TM_REL_DAY,	1,	kw_set_rel },
+    { N_("today"),	TM_NONE,	0,	kw_set_today },
+    { N_("now"),	TM_NONE,	0,	kw_set_now },
+    { N_("noon"),	TM_NONE,	12,	kw_set_timeofday },
+    { N_("midnight"),	TM_NONE,	0,	kw_set_timeofday },
+    { N_("am"),		TM_AMPM,	0,	kw_set_ampm },
+    { N_("a.m."),	TM_AMPM,	0,	kw_set_ampm },
+    { N_("pm"),		TM_AMPM,	1,	kw_set_ampm },
+    { N_("p.m."),	TM_AMPM,	1,	kw_set_ampm },
+    { N_("st"),		TM_NONE,	0,	kw_set_ordinal },
+    { N_("nd"),		TM_NONE,	0,	kw_set_ordinal },
+    { N_("rd"),		TM_NONE,	0,	kw_set_ordinal },
+    { N_("th"),		TM_NONE,	0,	kw_set_ordinal },
+
+    /* Timezone codes: offset in minutes. XXX: Add more codes. */
+    { N_("pst"),	TM_TZ,		-8*60,	NULL },
+    { N_("mst"),	TM_TZ,		-7*60,	NULL },
+    { N_("cst"),	TM_TZ,		-6*60,	NULL },
+    { N_("est"),	TM_TZ,		-5*60,	NULL },
+    { N_("ast"),	TM_TZ,		-4*60,	NULL },
+    { N_("nst"),	TM_TZ,		-(3*60+30),	NULL },
+
+    { N_("gmt"),	TM_TZ,		0,	NULL },
+    { N_("utc"),	TM_TZ,		0,	NULL },
+
+    { N_("wet"),	TM_TZ,		0,	NULL },
+    { N_("cet"),	TM_TZ,		1*60,	NULL },
+    { N_("eet"),	TM_TZ,		2*60,	NULL },
+    { N_("fet"),	TM_TZ,		3*60,	NULL },
+
+    { N_("wat"),	TM_TZ,		1*60,	NULL },
+    { N_("cat"),	TM_TZ,		2*60,	NULL },
+    { N_("eat"),	TM_TZ,		3*60,	NULL },
+};
+
+/*
+ * Compare strings s and keyword. Return number of matching chars on
+ * match, 0 for no match. Match must be at least n chars, or all of
+ * keyword if n < 0, otherwise it's not a match. Use match_case for
+ * case sensitive matching.
+ */
+static size_t
+match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)
+{
+    ssize_t i;
+
+    if (!n)
+	return 0;
+
+    for (i = 0; *s && *keyword; i++, s++, keyword++) {
+	if (match_case) {
+	    if (*s != *keyword)
+		break;
+	} else {
+	    if (tolower ((unsigned char) *s) !=
+		tolower ((unsigned char) *keyword))
+		break;
+	}
+    }
+
+    if (n > 0)
+	return i < n ? 0 : i;
+    else
+	return *keyword ? 0 : i;
+}
+
+/*
+ * Parse a keyword. Return < 0 on error, number of parsed chars on
+ * success.
+ */
+static ssize_t
+parse_keyword (struct state *state, const char *s)
+{
+    unsigned int i;
+    size_t n, max_n = 0;
+    struct keyword *kw = NULL;
+    int r;
+
+    /* Match longest keyword */
+    for (i = 0; i < ARRAY_SIZE (keywords); i++) {
+	/* Match case if keyword begins with upper case letter. */
+	bool mcase = isupper ((unsigned char) keywords[i].name[0]);
+	ssize_t minlen = -1;
+	char keyword[128];
+	char *p;
+
+	strncpy (keyword, _(keywords[i].name), sizeof (keyword));
+
+	/* Truncate too long keywords. XXX: Make this dynamic? */
+	keyword[sizeof (keyword) - 1] = '\0';
+
+	/* Minimum match length. */
+	p = strchr (keyword, '|');
+	if (p) {
+	    minlen = p - keyword;
+
+	    /* Remove the minimum match length separator. */
+	    memmove (p, p + 1, strlen (p + 1) + 1);
+	}
+
+	n = match_keyword (s, keyword, minlen, mcase);
+	if (n > max_n || (n == max_n && mcase)) {
+	    max_n = n;
+	    kw = &keywords[i];
+	}
+    }
+
+    if (!kw)
+	return -PARSE_TIME_ERR_KEYWORD;
+
+    if (kw->set)
+	r = kw->set (state, kw);
+    else
+	r = kw_set_default (state, kw);
+
+    if (r < 0)
+	return r;
+
+    return max_n;
+}
+
+/*
+ * Non-keyword parsers and their helpers.
+ */
+
+static int
+set_user_tz (struct state *state, char sign, int hour, int min)
+{
+    int tz = hour * 60 + min;
+
+    assert (sign == '+' || sign == '-');
+
+    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)
+	return -PARSE_TIME_ERR_INVALIDTIME;
+
+    if (sign == '-')
+	tz = -tz;
+
+    return set_field (state, TM_TZ, tz);
+}
+
+/*
+ * Parse a previously postponed number, if one exists, on its own.
+ * This handles a postponed number that was not consumed while
+ * parsing the following token.
+ */
+static int
+parse_postponed_number (struct state *state, unused (enum field next_field))
+{
+    int v, n;
+    char d;
+
+    /* Bail out if there's no postponed number. */
+    if (!get_postponed_number (state, &v, &n, &d))
+	return 0;
+
+    if (n == 1 || n == 2) {
+	/* Notable exception: Previous field affects parsing. This
+	 * handles "January 20". */
+	if (state->last_field == TM_ABS_MON) {
+	    /* D[D] */
+	    if (!is_valid_mday (v))
+		return -PARSE_TIME_ERR_INVALIDDATE;
+
+	    return set_field (state, TM_ABS_MDAY, v);
+	} else if (n == 2) {
+	    /* XXX: Only allow if last field is hour, min, or sec? */
+	    if (d == '+' || d == '-') {
+		/* +/-HH */
+		return set_user_tz (state, d, v, 0);
+	    }
+	}
+    } else if (n == 4) {
+	/* Notable exception: Value affects parsing. Time zones are
+	 * always at most 1400 and we don't understand years before
+	 * 1970. */
+	if (!is_valid_year (v)) {
+	    if (d == '+' || d == '-') {
+		/* +/-HHMM */
+		return set_user_tz (state, d, v / 100, v % 100);
+	    }
+	} else {
+	    /* YYYY */
+	    return set_field (state, TM_ABS_YEAR, v);
+	}
+    } else if (n == 6) {
+	/* HHMMSS */
+	int hour = v / 10000;
+	int min = (v / 100) % 100;
+	int sec = v % 100;
+
+	if (!is_valid_time (hour, min, sec))
+	    return -PARSE_TIME_ERR_INVALIDTIME;
+
+	return set_abs_time (state, hour, min, sec);
+    } else if (n == 8) {
+	/* YYYYMMDD */
+	int year = v / 10000;
+	int mon = (v / 100) % 100;
+	int mday = v % 100;
+
+	if (!is_valid_date (year, mon, mday))
+	    return -PARSE_TIME_ERR_INVALIDDATE;
+
+	return set_abs_date (state, year, mon, mday);
+    } else {
+	return -PARSE_TIME_ERR_FORMAT;
+    }
+
+    return -PARSE_TIME_ERR_FORMAT;
+}
+
+static int tm_get_field (const struct tm *tm, enum field field);
+
+static int
+set_timestamp (struct state *state, time_t t)
+{
+    struct tm tm;
+    enum field f;
+    int r;
+
+    if (gmtime_r (&t, &tm) == NULL)
+	return -PARSE_TIME_ERR_LIB;
+
+    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
+	r = set_field (state, f, tm_get_field (&tm, f));
+	if (r)
+	    return r;
+    }
+
+    r = set_field (state, TM_TZ, 0);
+    if (r)
+	return r;
+
+    /* XXX: Prevent TM_AMPM with timestamp, e.g. "@123456 pm" */
+
+    return 0;
+}
+
+/* Parse a single number. Typically postpone parsing until later. */
+static int
+parse_single_number (struct state *state, unsigned long v,
+		     unsigned long n)
+{
+    assert (n);
+
+    if (state->delim == '@')
+	return set_timestamp (state, (time_t) v);
+
+    if (v > INT_MAX)
+	return -PARSE_TIME_ERR_FORMAT;
+
+    return set_postponed_number (state, v, n);
+}
+
+static bool
+is_time_sep (char c)
+{
+    return c == ':';
+}
+
+static bool
+is_date_sep (char c)
+{
+    return c == '/' || c == '-' || c == '.';
+}
+
+static bool
+is_sep (char c)
+{
+    return is_time_sep (c) || is_date_sep (c);
+}
+
+/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep
+ * unset. */
+static int
+expand_year (unsigned long year, size_t n)
+{
+    if (n == 2) {
+	return (year < 70 ? 2000 : 1900) + year;
+    } else if (n == 4) {
+	return year;
+    } else {
+	return UNSET;
+    }
+}
+
+/* Parse a date number triplet. */
+static int
+parse_date (struct state *state, char sep,
+	    unsigned long v1, unsigned long v2, unsigned long v3,
+	    size_t n1, size_t n2, size_t n3)
+{
+    int year = UNSET, mon = UNSET, mday = UNSET;
+
+    assert (is_date_sep (sep));
+
+    switch (sep) {
+    case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */
+	if (n1 != 1 && n1 != 2)
+	    return -PARSE_TIME_ERR_DATEFORMAT;
+
+	if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) {
+	    /* M[M]/D[D][/YY[YY]] */
+	    year = expand_year (v3, n3);
+	    mon = v1;
+	    mday = v2;
+	} else if (n2 == 4 && n3 == 0) {
+	    /* M[M]/YYYY */
+	    year = v2;
+	    mon = v1;
+	} else {
+	    return -PARSE_TIME_ERR_DATEFORMAT;
+	}
+	break;
+
+    case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */
+	if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) {
+	    /* YYYY-MM[-DD] */
+	    year = v1;
+	    mon = v2;
+	    if (n3)
+		mday = v3;
+	} else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) {
+	    /* DD-MM[-YY[YY]] */
+	    year = expand_year (v3, n3);
+	    mon = v2;
+	    mday = v1;
+	} else if (n1 == 2 && n2 == 4 && n3 == 0) {
+	    /* MM-YYYY */
+	    year = v2;
+	    mon = v1;
+	} else {
+	    return -PARSE_TIME_ERR_DATEFORMAT;
+	}
+	break;
+
+    case '.': /* Date: D[D].M[M][.[YY[YY]]] */
+	if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) ||
+	    (n3 != 0 && n3 != 2 && n3 != 4))
+	    return -PARSE_TIME_ERR_DATEFORMAT;
+
+	year = expand_year (v3, n3);
+	mon = v2;
+	mday = v1;
+	break;
+    }
+
+    if (year != UNSET && !is_valid_year (year))
+	return -PARSE_TIME_ERR_INVALIDDATE;
+
+    if (mon != UNSET && !is_valid_mon (mon))
+	return -PARSE_TIME_ERR_INVALIDDATE;
+
+    if (mday != UNSET && !is_valid_mday (mday))
+	return -PARSE_TIME_ERR_INVALIDDATE;
+
+    return set_abs_date (state, year, mon, mday);
+}
+
+/* Parse a time number triplet. */
+static int
+parse_time (struct state *state, char sep,
+	    unsigned long v1, unsigned long v2, unsigned long v3,
+	    size_t n1, size_t n2, size_t n3)
+{
+    assert (is_time_sep (sep));
+
+    if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2))
+	return -PARSE_TIME_ERR_TIMEFORMAT;
+
+    /*
+     * Notable exception: Previously set fields affect
+     * parsing. Interpret (+|-)HH:MM as time zone only if hour and
+     * minute have been set.
+     *
+     * XXX: This could be fixed by restricting the delimiters
+     * preceding time. For '+' it would be justified, but for '-' it
+     * might be inconvenient. However prefer to allow '-' as an
+     * insignificant delimiter preceding time for convenience, and
+     * handle '+' the same way for consistency between positive and
+     * negative time zones.
+     */
+    if (is_field_set (state, TM_ABS_HOUR) &&
+	is_field_set (state, TM_ABS_MIN) &&
+	n1 == 2 && n2 == 2 && n3 == 0 &&
+	(state->delim == '+' || state->delim == '-')) {
+	return set_user_tz (state, state->delim, v1, v2);
+    }
+
+    if (!is_valid_time (v1, v2, v3))
+	return -PARSE_TIME_ERR_INVALIDTIME;
+
+    return set_abs_time (state, v1, v2, n3 ? v3 : 0);
+}
+
+/* strtoul helper that assigns length. */
+static unsigned long
+strtoul_len (const char *s, const char **endp, size_t *len)
+{
+    unsigned long val = strtoul (s, (char **) endp, 10);
+
+    *len = *endp - s;
+    return val;
+}
+
+/*
+ * Parse a (group of) number(s). Return < 0 on error, number of parsed
+ * chars on success.
+ */
+static ssize_t
+parse_number (struct state *state, const char *s)
+{
+    int r;
+    unsigned long v1, v2, v3 = 0;
+    size_t n1, n2, n3 = 0;
+    const char *p = s;
+    char sep;
+
+    v1 = strtoul_len (p, &p, &n1);
+
+    if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) {
+	sep = *p;
+	v2 = strtoul_len (p + 1, &p, &n2);
+    } else {
+	/* A single number. */
+	r = parse_single_number (state, v1, n1);
+	if (r)
+	    return r;
+
+	return p - s;
+    }
+
+    /* A group of two or three numbers? */
+    if (*p == sep && isdigit ((unsigned char) *(p + 1)))
+	v3 = strtoul_len (p + 1, &p, &n3);
+
+    if (is_time_sep (sep))
+	r = parse_time (state, sep, v1, v2, v3, n1, n2, n3);
+    else
+	r = parse_date (state, sep, v1, v2, v3, n1, n2, n3);
+
+    if (r)
+	return r;
+
+    return p - s;
+}
+
+/*
+ * Parse delimiter(s). Throw away all except the last one, which is
+ * stored for parsing the next non-delimiter. Return < 0 on error,
+ * number of parsed chars on success.
+ *
+ * XXX: We might want to be more strict here.
+ */
+static ssize_t
+parse_delim (struct state *state, const char *s)
+{
+    const char *p = s;
+
+    /*
+     * Skip non-alpha and non-digit, and store the last for further
+     * processing.
+     */
+    while (*p && !isalnum ((unsigned char) *p)) {
+	set_delim (state, *p);
+	p++;
+    }
+
+    return p - s;
+}
+
+/*
+ * Parse a date/time string. Return < 0 on error, number of parsed
+ * chars on success.
+ */
+static ssize_t
+parse_input (struct state *state, const char *s)
+{
+    const char *p = s;
+    ssize_t n;
+    int r;
+
+    while (*p) {
+	if (isalpha ((unsigned char) *p)) {
+	    n = parse_keyword (state, p);
+	} else if (isdigit ((unsigned char) *p)) {
+	    n = parse_number (state, p);
+	} else {
+	    n = parse_delim (state, p);
+	}
+
+	if (n <= 0) {
+	    if (n == 0)
+		n = -PARSE_TIME_ERR;
+
+	    return n;
+	}
+
+	p += n;
+    }
+
+    /* Parse a previously postponed number, if any. */
+    r = parse_postponed_number (state, TM_NONE);
+    if (r < 0)
+	return r;
+
+    return p - s;
+}
+
+/*
+ * Processing the parsed input.
+ */
+
+/*
+ * Initialize reference time to tm. Use time zone in state if
+ * specified, otherwise local time. Use now for reference time if
+ * non-NULL, otherwise current time.
+ */
+static int
+initialize_now (struct state *state, struct tm *tm, const time_t *now)
+{
+    time_t t;
+
+    if (now) {
+	t = *now;
+    } else {
+	if (time (&t) == (time_t) -1)
+	    return -PARSE_TIME_ERR_LIB;
+    }
+
+    if (is_field_set (state, TM_TZ)) {
+	/* Some other time zone. */
+
+	/* Adjust now according to the TZ. */
+	t += get_field (state, TM_TZ) * 60;
+
+	/* Not UTC, but gmtime_r() keeps the local TZ out of it. */
+	if (gmtime_r (&t, tm) == NULL)
+	    return -PARSE_TIME_ERR_LIB;
+    } else {
+	/* Local time. */
+	if (localtime_r (&t, tm) == NULL)
+	    return -PARSE_TIME_ERR_LIB;
+    }
+
+    return 0;
+}
+
+/*
+ * Normalize tm according to mktime(3). Both mktime(3) and
+ * localtime_r(3) use local time, but they cancel each other out here,
+ * making this function agnostic to time zone.
+ */
+static int
+normalize_tm (struct tm *tm)
+{
+    time_t t = mktime (tm);
+
+    if (t == (time_t) -1)
+	return -PARSE_TIME_ERR_LIB;
+
+    if (!localtime_r (&t, tm))
+	return -PARSE_TIME_ERR_LIB;
+
+    return 0;
+}
+
+/* Get field out of a struct tm. */
+static int
+tm_get_field (const struct tm *tm, enum field field)
+{
+    switch (field) {
+    case TM_ABS_SEC:	return tm->tm_sec;
+    case TM_ABS_MIN:	return tm->tm_min;
+    case TM_ABS_HOUR:	return tm->tm_hour;
+    case TM_ABS_MDAY:	return tm->tm_mday;
+    case TM_ABS_MON:	return tm->tm_mon + 1; /* 0- to 1-based */
+    case TM_ABS_YEAR:	return 1900 + tm->tm_year;
+    case TM_ABS_WDAY:	return tm->tm_wday;
+    case TM_ABS_ISDST:	return tm->tm_isdst;
+    default:
+	assert (false);
+	break;
+    }
+
+    return 0;
+}
+
+/* Modify hour according to am/pm setting. */
+static int
+fixup_ampm (struct state *state)
+{
+    int hour, hdiff = 0;
+
+    if (!is_field_set (state, TM_AMPM))
+	return 0;
+
+    if (!is_field_set (state, TM_ABS_HOUR))
+	return -PARSE_TIME_ERR_TIMEFORMAT;
+
+    hour = get_field (state, TM_ABS_HOUR);
+    if (!is_valid_12hour (hour))
+	return -PARSE_TIME_ERR_INVALIDTIME;
+
+    if (get_field (state, TM_AMPM)) {
+	/* 12pm is noon. */
+	if (hour != 12)
+	    hdiff = 12;
+    } else {
+	/* 12am is midnight, beginning of day. */
+	if (hour == 12)
+	    hdiff = -12;
+    }
+
+    mod_field (state, TM_REL_HOUR, -hdiff);
+
+    return 0;
+}
+
+/* Combine absolute and relative fields, and round. */
+static int
+create_output (struct state *state, time_t *t_out, const time_t *ref,
+	       int round)
+{
+    struct tm tm = { .tm_isdst = -1 };
+    struct tm now;
+    time_t t;
+    enum field f;
+    int r;
+    int week_round = PARSE_TIME_NO_ROUND;
+
+    r = initialize_now (state, &now, ref);
+    if (r)
+	return r;
+
+    /* Initialize fields flagged as "now" to reference time. */
+    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
+	if (state->set[f] == FIELD_NOW) {
+	    state->tm[f] = tm_get_field (&now, f);
+	    state->set[f] = FIELD_SET;
+	}
+    }
+
+    /*
+     * If WDAY is set but MDAY is not, we consider WDAY relative.
+     *
+     * XXX: This fails on stuff like "two months monday" because two
+     * months ago wasn't the same day as today. Postpone until we know
+     * date?
+     */
+    if (is_field_set (state, TM_ABS_WDAY) &&
+	!is_field_set (state, TM_ABS_MDAY)) {
+	int wday = get_field (state, TM_ABS_WDAY);
+	int today = tm_get_field (&now, TM_ABS_WDAY);
+	int rel_days;
+
+	if (today > wday)
+	    rel_days = today - wday;
+	else
+	    rel_days = today + 7 - wday;
+
+	/* This also prevents special week rounding from happening. */
+	mod_field (state, TM_REL_DAY, rel_days);
+
+	unset_field (state, TM_ABS_WDAY);
+    }
+
+    r = fixup_ampm (state);
+    if (r)
+	return r;
+
+    /*
+     * Iterate fields from most accurate to least accurate, and set
+     * unset fields according to requested rounding.
+     */
+    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
+	if (round != PARSE_TIME_NO_ROUND) {
+	    enum field r = abs_to_rel_field (f);
+
+	    if (is_field_set (state, f) || is_field_set (state, r)) {
+		if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {
+		    mod_field (state, r, -1);
+		    if (round == PARSE_TIME_ROUND_UP_INCLUSIVE)
+			mod_field (state, TM_REL_SEC, 1);
+		}
+		round = PARSE_TIME_NO_ROUND; /* No more rounding. */
+	    } else {
+		if (f == TM_ABS_MDAY &&
+		    is_field_set (state, TM_REL_WEEK)) {
+		    /* Week is most accurate. */
+		    week_round = round;
+		    round = PARSE_TIME_NO_ROUND;
+		} else {
+		    set_field (state, f, field_epoch (f));
+		}
+	    }
+	}
+
+	if (!is_field_set (state, f))
+	    set_field (state, f, tm_get_field (&now, f));
+    }
+
+    /* Special case: rounding with week accuracy. */
+    if (week_round != PARSE_TIME_NO_ROUND) {
+	/* Temporarily set more accurate fields to now. */
+	set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC));
+	set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN));
+	set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR));
+	set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY));
+    }
+
+    /*
+     * Set all fields. They may contain out of range values before
+     * normalization by mktime(3).
+     */
+    tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC);
+    tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN);
+    tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR);
+    tm.tm_mday = get_field (state, TM_ABS_MDAY) -
+		 get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK);
+    tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON);
+    tm.tm_mon--; /* 1- to 0-based */
+    tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900;
+
+    /*
+     * It's always normal time.
+     *
+     * XXX: This is probably not a solution that universally
+     * works. Just make sure DST is not taken into account. We don't
+     * want rounding to be affected by DST.
+     */
+    tm.tm_isdst = -1;
+
+    /* Special case: rounding with week accuracy. */
+    if (week_round != PARSE_TIME_NO_ROUND) {
+	/* Normalize to get proper tm.wday. */
+	r = normalize_tm (&tm);
+	if (r < 0)
+	    return r;
+
+	/* Set more accurate fields back to zero. */
+	tm.tm_sec = 0;
+	tm.tm_min = 0;
+	tm.tm_hour = 0;
+	tm.tm_isdst = -1;
+
+	/* Monday is the true 1st day of week, but this is easier. */
+	if (week_round >= PARSE_TIME_ROUND_UP) {
+	    tm.tm_mday += 7 - tm.tm_wday;
+	    if (week_round == PARSE_TIME_ROUND_UP_INCLUSIVE)
+		tm.tm_sec--;
+	} else {
+	    tm.tm_mday -= tm.tm_wday;
+	}
+    }
+
+    if (is_field_set (state, TM_TZ)) {
+	/* tm is in specified TZ, convert to UTC for timegm(3). */
+	tm.tm_min -= get_field (state, TM_TZ);
+	t = timegm (&tm);
+    } else {
+	/* tm is in local time. */
+	t = mktime (&tm);
+    }
+
+    if (t == (time_t) -1)
+	return -PARSE_TIME_ERR_LIB;
+
+    *t_out = t;
+
+    return 0;
+}
+
+/* Internally, all errors are < 0. parse_time_string() returns errors > 0. */
+#define EXTERNAL_ERR(r) (-r)
+
+int
+parse_time_string (const char *s, time_t *t, const time_t *ref, int round)
+{
+    struct state state = { .last_field = TM_NONE };
+    int r;
+
+    if (!s || !t)
+	return EXTERNAL_ERR (-PARSE_TIME_ERR);
+
+    r = parse_input (&state, s);
+    if (r < 0)
+	return EXTERNAL_ERR (r);
+
+    r = create_output (&state, t, ref, round);
+    if (r < 0)
+	return EXTERNAL_ERR (r);
+
+    return 0;
+}
diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h
new file mode 100644
index 0000000..bfa4ee3
--- /dev/null
+++ b/parse-time-string/parse-time-string.h
@@ -0,0 +1,102 @@
+/*
+ * parse time string - user friendly date and time parser
+ * Copyright © 2012 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Jani Nikula <jani@nikula.org>
+ */
+
+#ifndef PARSE_TIME_STRING_H
+#define PARSE_TIME_STRING_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <time.h>
+
+/* return values for parse_time_string() */
+enum {
+    PARSE_TIME_OK = 0,
+    PARSE_TIME_ERR,		/* unspecified error */
+    PARSE_TIME_ERR_LIB,		/* library call failed */
+    PARSE_TIME_ERR_ALREADYSET,	/* attempt to set unit twice */
+    PARSE_TIME_ERR_FORMAT,	/* generic date/time format error */
+    PARSE_TIME_ERR_DATEFORMAT,	/* date format error */
+    PARSE_TIME_ERR_TIMEFORMAT,	/* time format error */
+    PARSE_TIME_ERR_INVALIDDATE,	/* date value error */
+    PARSE_TIME_ERR_INVALIDTIME,	/* time value error */
+    PARSE_TIME_ERR_KEYWORD,	/* unknown keyword */
+};
+
+/* round values for parse_time_string() */
+enum {
+    PARSE_TIME_ROUND_DOWN = -1,
+    PARSE_TIME_NO_ROUND = 0,
+    PARSE_TIME_ROUND_UP = 1,
+    PARSE_TIME_ROUND_UP_INCLUSIVE = 2,
+};
+
+/**
+ * parse_time_string() - user friendly date and time parser
+ * @s:		string to parse
+ * @t:		pointer to time_t to store parsed time in
+ * @ref:	pointer to time_t containing reference date/time, or NULL
+ * @round:	PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN,
+ *		PARSE_TIME_ROUND_UP, or PARSE_TIME_ROUND_UP_INCLUSIVE
+ *
+ * Parse a date/time string 's' and store the parsed date/time result
+ * in 't'.
+ *
+ * A reference date/time is used for determining the "date/time units"
+ * (roughly equivalent to struct tm members) not specified by 's'. If
+ * 'ref' is non-NULL, it must point to a time_t to be used as the
+ * reference date/time. Otherwise, the current time is used.
+ *
+ * If 's' does not specify a full date/time, the 'round' parameter
+ * specifies if and how the result should be rounded as follows:
+ *
+ *   PARSE_TIME_NO_ROUND: All date/time units that are not specified
+ *   by 's' are set to the corresponding unit derived from the
+ *   reference date/time.
+ *
+ *   PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate
+ *   than the most accurate unit specified by 's' are set to the
+ *   smallest valid value for that unit. The rest of the unspecified
+ *   units are set as in PARSE_TIME_NO_ROUND.
+ *
+ *   PARSE_TIME_ROUND_UP: All date/time units that are more accurate
+ *   than the most accurate unit specified by 's' are set to the
+ *   smallest valid value for that unit. The most accurate unit
+ *   specified by 's' is incremented by one (and this is rolled over
+ *   to the less accurate units as necessary), unless the most
+ *   accurate unit is seconds. The rest of the unspecified units are
+ *   set as in PARSE_TIME_NO_ROUND.
+ *
+ *   PARSE_TIME_ROUND_UP_INCLUSIVE: Same as PARSE_TIME_ROUND_UP, minus
+ *   one second, unless the most accurate unit specified by 's' is
+ *   seconds. This is useful for callers that require a value for
+ *   inclusive comparison of the result.
+ *
+ * Return 0 (PARSE_TIME_OK) for a successfully parsed date/time, or one
+ * of PARSE_TIME_ERR_* on error. 't' is not modified on error.
+ */
+int parse_time_string (const char *s, time_t *t, const time_t *ref, int round);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* PARSE_TIME_STRING_H */
-- 
1.7.10.4

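The rounding rules documented for parse_time_string() above are easiest to see at day accuracy, as in "today". What follows is a minimal standalone sketch, not code from this series. It assumes a non-negative UTC timestamp and ignores DST, whereas create_output() goes through mktime(3)/timegm(3) and handles local time. The enum values are copied from parse-time-string.h, and round_day_utc() is a hypothetical helper.

```c
#include <assert.h>
#include <time.h>

/* Round values copied from parse-time-string.h. */
enum {
    PARSE_TIME_ROUND_DOWN = -1,
    PARSE_TIME_NO_ROUND = 0,
    PARSE_TIME_ROUND_UP = 1,
    PARSE_TIME_ROUND_UP_INCLUSIVE = 2,
};

#define SECS_PER_DAY 86400

/*
 * Hypothetical illustration: apply the documented rounding rules when
 * the most accurate unit given is the day (e.g. "today"). Assumes a
 * non-negative UTC timestamp and no DST.
 */
static time_t
round_day_utc (time_t t, int round)
{
    time_t start = t - t % SECS_PER_DAY;	/* 00:00:00 of that day */

    switch (round) {
    case PARSE_TIME_ROUND_DOWN:
	return start;
    case PARSE_TIME_ROUND_UP:
	return start + SECS_PER_DAY;		/* 00:00:00 the next day */
    case PARSE_TIME_ROUND_UP_INCLUSIVE:
	return start + SECS_PER_DAY - 1;	/* 23:59:59 the same day */
    default:
	return t;				/* no rounding: reference time */
    }
}
```

Take the reference time used in the smoke tests of patch 4/9, Tue Jan 11 11:11:00 UTC 2011 (1294744260). Rounding down gives 1294704000 (00:00:00), rounding up gives 1294790400 (midnight the next day), and inclusive rounding up gives 1294790399 (23:59:59). These values match the "today ==_>" and "today ==^>" test lines. As the header notes, the inclusive variant suits callers that compare the result inclusively.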

* [PATCH v5 3/9] test: add new test tool parse-time for date/time parser
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
  2012-10-21 21:22 ` [PATCH v5 1/9] build: drop the -Wswitch-enum warning Jani Nikula
  2012-10-21 21:22 ` [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  2012-10-21 21:22 ` [PATCH v5 4/9] test: add smoke tests for the date/time parser module Jani Nikula
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

Add a smoke testing tool to support testing the date/time parser
module directly and independently of the rest of notmuch.

Credits to Michal Sojka <sojkam1@fel.cvut.cz> for the stdin parsing
idea and consequent massive improvement in testability.
---
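For reference, the operator handling in the stdin mode can be sketched on its own. The following is a simplified, hypothetical restatement of find_operator_in_string() from this patch; split_test_line() is not part of the patch. It returns the length of the date/time text before the operator and reports the rounding mode that the operator selects. When no operator is present, the rounding mode is left unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round values copied from parse-time-string.h. */
enum {
    PARSE_TIME_ROUND_DOWN = -1,
    PARSE_TIME_NO_ROUND = 0,
    PARSE_TIME_ROUND_UP = 1,
    PARSE_TIME_ROUND_UP_INCLUSIVE = 2,
};

/* Same operator-to-rounding mapping as the tool's operators[] table. */
static const struct {
    const char *op;
    int round;
} ops[] = {
    { "==>",   PARSE_TIME_NO_ROUND },
    { "==_>",  PARSE_TIME_ROUND_DOWN },
    { "==^>",  PARSE_TIME_ROUND_UP_INCLUSIVE },
    { "==^^>", PARSE_TIME_ROUND_UP },
};

/*
 * Hypothetical helper: return the length of the date/time part of a
 * test line (the text before the operator) and store the operator's
 * rounding mode in *round. Without an operator, the whole line is
 * date/time and *round is not modified.
 */
static size_t
split_test_line (const char *line, int *round)
{
    size_t i;

    for (i = 0; i < sizeof (ops) / sizeof (ops[0]); i++) {
	const char *p = strstr (line, ops[i].op);
	if (p) {
	    *round = ops[i].round;
	    return (size_t) (p - line);
	}
    }

    return strlen (line);
}
```

Checking the operators in order with strstr() works only because none of the four operators is a substring of another.
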
 test/Makefile.local |    7 +-
 test/basic          |    2 +-
 test/parse-time.c   |  281 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 288 insertions(+), 2 deletions(-)
 create mode 100644 test/parse-time.c

diff --git a/test/Makefile.local b/test/Makefile.local
index 45df4c7..9ae130a 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -19,9 +19,13 @@ $(dir)/smtp-dummy: $(smtp_dummy_modules)
 $(dir)/symbol-test: $(dir)/symbol-test.o
 	$(call quiet,CXX) $^ -o $@ -Llib -lnotmuch $(XAPIAN_LDFLAGS)
 
+$(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o
+	$(call quiet,CC) $^ -o $@
+
 .PHONY: test check
 
-test-binaries: $(dir)/arg-test $(dir)/smtp-dummy $(dir)/symbol-test
+test-binaries: $(dir)/arg-test $(dir)/smtp-dummy $(dir)/symbol-test \
+	$(dir)/parse-time
 
 test:	all test-binaries
 	@${dir}/notmuch-test $(OPTIONS)
@@ -32,4 +36,5 @@ SRCS := $(SRCS) $(smtp_dummy_srcs)
 CLEAN := $(CLEAN) $(dir)/smtp-dummy $(dir)/smtp-dummy.o \
 	 $(dir)/symbol-test $(dir)/symbol-test.o \
 	 $(dir)/arg-test $(dir)/arg-test.o \
+	 $(dir)/parse-time $(dir)/parse-time.o \
 	 $(dir)/corpus.mail $(dir)/test-results $(dir)/tmp.*
diff --git a/test/basic b/test/basic
index 3b635c8..c47197c 100755
--- a/test/basic
+++ b/test/basic
@@ -54,7 +54,7 @@ test_begin_subtest 'Ensure that all available tests will be run by notmuch-test'
 eval $(sed -n -e '/^TESTS="$/,/^"$/p' $TEST_DIRECTORY/notmuch-test)
 tests_in_suite=$(for i in $TESTS; do echo $i; done | sort)
 available=$(find "$TEST_DIRECTORY" -maxdepth 1 -type f -perm +111 | \
-    sed -r -e "s,.*/,," -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test)$/d" | \
+    sed -r -e "s,.*/,," -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test|parse-time)$/d" | \
     sort)
 test_expect_equal "$tests_in_suite" "$available"
 
diff --git a/test/parse-time.c b/test/parse-time.c
new file mode 100644
index 0000000..5f73b85
--- /dev/null
+++ b/test/parse-time.c
@@ -0,0 +1,281 @@
+/*
+ * parse time string - user friendly date and time parser
+ * Copyright © 2012 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Jani Nikula <jani@nikula.org>
+ */
+
+#include <assert.h>
+#include <ctype.h>
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "parse-time-string.h"
+
+#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))
+
+/*
+ * concat argv[start]...argv[end - 1], separating them by a single
+ * space, to a malloced string
+ */
+static char *
+concat_args (int start, int end, char *argv[])
+{
+    int i;
+    size_t len = 1;
+    char *p;
+
+    for (i = start; i < end; i++)
+	len += strlen (argv[i]) + 1;
+
+    p = malloc (len);
+    if (!p)
+	return NULL;
+
+    *p = 0;
+
+    for (i = start; i < end; i++) {
+	if (i != start)
+	    strcat (p, " ");
+	strcat (p, argv[i]);
+    }
+
+    return p;
+}
+
+#define DEFAULT_FORMAT "%a %b %d %T %z %Y"
+
+static void
+usage (const char *name)
+{
+    printf ("Usage: %s [options ...] [<date/time>]\n\n", name);
+    printf (
+	"Parse <date/time> and display it in given format. If <date/time> is\n"
+	"not given, parse each line in stdin according to:\n\n"
+	"  <date/time> [(==>|==_>|==^>|==^^>)<ignored>] [#<comment>]\n\n"
+	"and produce output:\n\n"
+	"  <date/time> (==>|==_>|==^>|==^^>) <time in --format=FMT> [#<comment>]\n\n"
+	"preserving whitespace and comment in input. The operators ==>, ==_>,\n"
+	"==^>, and ==^^> define rounding as no rounding, round down, round up\n"
+	"inclusive, and round up, respectively.\n\n"
+
+	"  -f, --format=FMT output format, FMT according to strftime(3)\n"
+	"                   (default: \"%s\")\n"
+	"  -r, --ref=N      use N seconds since epoch as reference time\n"
+	"                   (default: now)\n"
+	"  -u, --^          round result up inclusive (default: no rounding)\n"
+	"  -U, --^^         round result up (default: no rounding)\n"
+	"  -d, --_          round result down (default: no rounding)\n"
+	"  -h, --help       print this help\n",
+	DEFAULT_FORMAT);
+}
+
+struct {
+    const char *operator;
+    int round;
+} operators[] = {
+    { "==>",	PARSE_TIME_NO_ROUND },
+    { "==_>",	PARSE_TIME_ROUND_DOWN },
+    { "==^>",	PARSE_TIME_ROUND_UP_INCLUSIVE },
+    { "==^^>",	PARSE_TIME_ROUND_UP },
+};
+
+static const char *
+find_operator_in_string (char *str, char **ptr, int *round)
+{
+    const char *oper = NULL;
+    unsigned int i;
+
+    for (i = 0; i < ARRAY_SIZE (operators); i++) {
+	char *p = strstr (str, operators[i].operator);
+	if (p) {
+	    if (round)
+		*round = operators[i].round;
+	    if (ptr)
+		*ptr = p;
+
+	    oper = operators[i].operator;
+	    break;
+	}
+    }
+
+    return oper;
+}
+
+static const char *
+get_operator (int round)
+{
+    const char *oper = NULL;
+    unsigned int i;
+
+    for (i = 0; i < ARRAY_SIZE(operators); i++) {
+	if (round == operators[i].round) {
+	    oper = operators[i].operator;
+	    break;
+	}
+    }
+
+    return oper;
+}
+
+static int
+parse_stdin (FILE *infile, time_t *ref, int round, const char *format)
+{
+    char *input = NULL;
+    char result[1024];
+    size_t inputsize = 0;
+    ssize_t len;
+    struct tm tm;
+    time_t t;
+    int r;
+
+    while ((len = getline (&input, &inputsize, infile)) != -1) {
+	const char *oper;
+	char *trail, *tmp;
+
+	/* trail is trailing whitespace and (optional) comment */
+	trail = strchr (input, '#');
+	if (!trail)
+	    trail = input + len;
+
+	while (trail > input && isspace ((unsigned char) *(trail-1)))
+	    trail--;
+
+	if (trail == input) {
+	    printf ("%s", input);
+	    continue;
+	}
+
+	tmp = strdup (trail);
+	if (!tmp) {
+	    fprintf (stderr, "strdup() failed\n");
+	    continue;
+	}
+	*trail = '\0';
+	trail = tmp;
+
+	/* operator */
+	oper = find_operator_in_string (input, &tmp, &round);
+	if (oper) {
+	    *tmp = '\0';
+	} else {
+	    oper = get_operator (round);
+	    assert (oper);
+	}
+
+	r = parse_time_string (input, &t, ref, round);
+	if (!r) {
+	    if (!localtime_r (&t, &tm)) {
+		fprintf (stderr, "localtime_r() failed\n");
+		free (trail);
+		continue;
+	    }
+
+	    strftime (result, sizeof (result), format, &tm);
+	} else {
+	    snprintf (result, sizeof (result), "ERROR: %d", r);
+	}
+
+	printf ("%s%s %s%s", input, oper, result, trail);
+	free (trail);
+    }
+
+    free (input);
+
+    return 0;
+}
+
+int
+main (int argc, char *argv[])
+{
+    int r;
+    struct tm tm;
+    time_t result;
+    time_t now;
+    time_t *nowp = NULL;
+    char *argstr;
+    int round = PARSE_TIME_NO_ROUND;
+    char buf[1024];
+    const char *format = DEFAULT_FORMAT;
+    struct option options[] = {
+	{ "help",	no_argument,		NULL,	'h' },
+	{ "^",		no_argument,		NULL,	'u' },
+	{ "^^",		no_argument,		NULL,	'U' },
+	{ "_",		no_argument,		NULL,	'd' },
+	{ "format",	required_argument,	NULL,	'f' },
+	{ "ref",	required_argument,	NULL,	'r' },
+	{ NULL, 0, NULL, 0 },
+    };
+
+    for (;;) {
+	int c;
+
+	c = getopt_long (argc, argv, "huUdf:r:", options, NULL);
+	if (c == -1)
+	    break;
+
+	switch (c) {
+	case 'f':
+	    /* output format */
+	    format = optarg;
+	    break;
+	case 'u':
+	    round = PARSE_TIME_ROUND_UP_INCLUSIVE;
+	    break;
+	case 'U':
+	    round = PARSE_TIME_ROUND_UP;
+	    break;
+	case 'd':
+	    round = PARSE_TIME_ROUND_DOWN;
+	    break;
+	case 'r':
+	    /* specify now in seconds since epoch */
+	    now = (time_t) strtol (optarg, NULL, 10);
+	    if (now >= (time_t) 0)
+		nowp = &now;
+	    break;
+	case 'h':
+	case '?':
+	default:
+	    usage (argv[0]);
+	    return 1;
+	}
+    }
+
+    if (optind == argc)
+	return parse_stdin (stdin, nowp, round, format);
+
+    argstr = concat_args (optind, argc, argv);
+    if (!argstr)
+	return 1;
+
+    r = parse_time_string (argstr, &result, nowp, round);
+
+    free (argstr);
+
+    if (r)
+	return 1;
+
+    if (!localtime_r (&result, &tm))
+	return 1;
+
+    strftime (buf, sizeof (buf), format, &tm);
+    printf ("%s\n", buf);
+
+    return 0;
+}
-- 
1.7.10.4

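The parse-time tool makes individual parser rules easy to probe. One of the less obvious rules in patch 2/9 is how parse_postponed_number() interprets a bare number by its digit count. The sketch below restates only that branching as a standalone classifier, and all names in it are hypothetical. It leaves out value validation (valid day, time, date). Because is_valid_year() is not shown here, the sketch assumes that years 1970 through 9999 are valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what a postponed number turns into. */
enum number_kind {
    KIND_MDAY,		/* D[D] right after a month name ("January 20") */
    KIND_TZ,		/* +/-HH or +/-HHMM time zone offset */
    KIND_YEAR,		/* YYYY */
    KIND_HHMMSS,	/* 6 digits */
    KIND_YYYYMMDD,	/* 8 digits */
    KIND_ERROR,
};

/* ASSUMPTION: stand-in for is_valid_year(), which is not shown here. */
static bool
assumed_valid_year (int v)
{
    return v >= 1970 && v <= 9999;
}

/*
 * Classify an n-digit number v preceded by delimiter d, following the
 * branches of parse_postponed_number(). after_month is true when the
 * previously parsed field was a month name.
 */
static enum number_kind
classify_number (int v, int n, char d, bool after_month)
{
    bool has_sign = (d == '+' || d == '-');

    switch (n) {
    case 1:
    case 2:
	if (after_month)
	    return KIND_MDAY;
	if (n == 2 && has_sign)
	    return KIND_TZ;
	return KIND_ERROR;
    case 4:
	/* Time zones are at most 1400, so a valid year wins. */
	if (assumed_valid_year (v))
	    return KIND_YEAR;
	return has_sign ? KIND_TZ : KIND_ERROR;
    case 6:
	return KIND_HHMMSS;
    case 8:
	return KIND_YYYYMMDD;
    default:
	return KIND_ERROR;
    }
}
```

For example, "+0100" is a 4-digit number whose value, 100, is not a valid year, so with a leading '+' it is treated as a time zone offset. "19701223" is read as YYYYMMDD.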

* [PATCH v5 4/9] test: add smoke tests for the date/time parser module
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
                   ` (2 preceding siblings ...)
  2012-10-21 21:22 ` [PATCH v5 3/9] test: add new test tool parse-time for date/time parser Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  2012-10-23  4:23   ` Austin Clements
  2012-10-21 21:22 ` [PATCH v5 5/9] build: build parse-time-string as part of the notmuch lib and static cli Jani Nikula
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

Test the date/time parser module directly, independent of notmuch,
using the parse-time test tool.

Credits to Michal Sojka <sojkam1@fel.cvut.cz> for writing most of the
tests.
---
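The "Mon" and "monday" cases in these tests rely on the weekday logic in create_output() from patch 2/9. That logic turns a weekday given without a day of month into a number of days back from the reference date. Below is a minimal standalone sketch of the computation; weekday_days_back() is a hypothetical name. Weekdays use struct tm numbering, with 0 for Sunday.

```c
#include <assert.h>

/*
 * Days to go back from today's weekday to the requested weekday, as in
 * create_output(). Both arguments use struct tm numbering (0 = Sunday).
 * Asking for today's own weekday yields 7, i.e. one week back.
 */
static int
weekday_days_back (int today, int wday)
{
    assert (today >= 0 && today <= 6 && wday >= 0 && wday <= 6);

    if (today > wday)
	return today - wday;

    return today + 7 - wday;
}
```

The tests use a Tuesday reference (tm_wday 2). Monday is then 1 day back, so "Mon" yields Mon Jan 10. Asking for Tuesday itself goes back a full week rather than returning the reference date.
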
 test/notmuch-test      |    1 +
 test/parse-time-string |   71 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+)
 create mode 100755 test/parse-time-string

diff --git a/test/notmuch-test b/test/notmuch-test
index cc732c3..7eadfdf 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -60,6 +60,7 @@ TESTS="
   emacs-hello
   emacs-show
   missing-headers
+  parse-time-string
 "
 TESTS=${NOTMUCH_TESTS:=$TESTS}
 
diff --git a/test/parse-time-string b/test/parse-time-string
new file mode 100755
index 0000000..862e701
--- /dev/null
+++ b/test/parse-time-string
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+test_description="date/time parser module"
+. ./test-lib.sh
+
+# Sanity/smoke tests for the date/time parser independent of notmuch
+
+_date ()
+{
+    date -d "$*" +%s
+}
+
+_parse_time ()
+{
+    ${TEST_DIRECTORY}/parse-time --format=%s "$*"
+}
+
+test_begin_subtest "date(1) default format without TZ code"
+test_expect_equal "$(_parse_time Fri Aug 3 23:06:06 2012)" "$(_date Fri Aug 3 23:06:06 2012)"
+
+test_begin_subtest "date(1) --rfc-2822 format"
+test_expect_equal "$(_parse_time Fri, 03 Aug 2012 23:07:46 +0100)" "$(_date Fri, 03 Aug 2012 23:07:46 +0100)"
+
+test_begin_subtest "date(1) --rfc-3339=seconds format"
+test_expect_equal "$(_parse_time 2012-08-03 23:09:37+03:00)" "$(_date 2012-08-03 23:09:37+03:00)"
+
+test_begin_subtest "Date parser tests"
+REFERENCE=$(_date Tue Jan 11 11:11:00 +0000 2011)
+cat <<EOF > INPUT
+now          ==> Tue Jan 11 11:11:00 +0000 2011
+2010-1-1     ==> ERROR: 5
+Jan 2        ==> Sun Jan 02 11:11:00 +0000 2011
+Mon          ==> Mon Jan 10 11:11:00 +0000 2011
+last Friday  ==> ERROR: 4
+2 hours ago  ==> ERROR: 1
+last month   ==> Sat Dec 11 11:11:00 +0000 2010
+month ago    ==> ERROR: 1
+8am          ==> Tue Jan 11 08:00:00 +0000 2011
+9:15         ==> Tue Jan 11 09:15:00 +0000 2011
+12:34        ==> Tue Jan 11 12:34:00 +0000 2011
+monday       ==> Mon Jan 10 11:11:00 +0000 2011
+yesterday    ==> Mon Jan 10 11:11:00 +0000 2011
+tomorrow     ==> ERROR: 1
+             ==> Tue Jan 11 11:11:00 +0000 2011 # empty string is reference time
+
+Aug 3 23:06:06 2012             ==> Fri Aug 03 23:06:06 +0000 2012 # date(1) default format without TZ code
+Fri, 03 Aug 2012 23:07:46 +0100 ==> Fri Aug 03 22:07:46 +0000 2012 # rfc-2822
+2012-08-03 23:09:37+03:00       ==> Fri Aug 03 20:09:37 +0000 2012 # rfc-3339 seconds
+
+10s           ==> Tue Jan 11 11:10:50 +0000 2011
+19701223s     ==> Fri May 28 10:37:17 +0000 2010
+19701223      ==> Wed Dec 23 11:11:00 +0000 1970
+
+19701223 +0100 ==> Wed Dec 23 11:11:00 +0000 1970 # Timezone is ignored without an error
+
+today ==^> Tue Jan 11 23:59:59 +0000 2011
+today ==_> Tue Jan 11 00:00:00 +0000 2011
+
+thisweek ==^> Sat Jan 15 23:59:59 +0000 2011
+thisweek ==_> Sun Jan 09 00:00:00 +0000 2011
+
+two months ago==> ERROR: 1 # "ago" is not supported
+two months ==> Thu Nov 11 11:11:00 +0000 2010
+
+@1348569850 ==> Tue Sep 25 10:44:10 +0000 2012
+@10 ==> Thu Jan 01 00:00:10 +0000 1970
+EOF
+
+${TEST_DIRECTORY}/parse-time --ref=${REFERENCE} < INPUT > OUTPUT
+test_expect_equal_file INPUT OUTPUT
+
+test_done
-- 
1.7.10.4

* [PATCH v5 5/9] build: build parse-time-string as part of the notmuch lib and static cli
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
                   ` (3 preceding siblings ...)
  2012-10-21 21:22 ` [PATCH v5 4/9] test: add smoke tests for the date/time parser module Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  2012-10-21 21:22 ` [PATCH v5 6/9] lib: add date range query support Jani Nikula
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

---
 Makefile.local     |    2 +-
 lib/Makefile.local |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile.local b/Makefile.local
index 7f2d4f1..2b91946 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -277,7 +277,7 @@ notmuch_client_srcs =		\
 
 notmuch_client_modules = $(notmuch_client_srcs:.c=.o)
 
-notmuch: $(notmuch_client_modules) lib/libnotmuch.a util/libutil.a
+notmuch: $(notmuch_client_modules) lib/libnotmuch.a util/libutil.a parse-time-string/libparse-time-string.a
 	$(call quiet,CXX $(CFLAGS)) $^ $(FINAL_LIBNOTMUCH_LDFLAGS) -o $@
 
 notmuch-shared: $(notmuch_client_modules) lib/$(LINKER_NAME)
diff --git a/lib/Makefile.local b/lib/Makefile.local
index 8a9aa28..d1635cf 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -70,7 +70,7 @@ $(dir)/libnotmuch.a: $(libnotmuch_modules)
 	$(call quiet,AR) rcs $@ $^
 
 $(dir)/$(LIBNAME): $(libnotmuch_modules) notmuch.sym
-	$(call quiet,CXX $(CXXFLAGS)) $(libnotmuch_modules) $(FINAL_LIBNOTMUCH_LDFLAGS) $(LIBRARY_LINK_FLAG) -o $@ util/libutil.a
+	$(call quiet,CXX $(CXXFLAGS)) $(libnotmuch_modules) $(FINAL_LIBNOTMUCH_LDFLAGS) $(LIBRARY_LINK_FLAG) -o $@ util/libutil.a parse-time-string/libparse-time-string.a
 
 notmuch.sym: $(srcdir)/$(dir)/notmuch.h $(libnotmuch_modules)
 	sh $(srcdir)/$(lib)/gen-version-script.sh $< $(libnotmuch_modules) > $@
-- 
1.7.10.4

* [PATCH v5 6/9] lib: add date range query support
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
                   ` (4 preceding siblings ...)
  2012-10-21 21:22 ` [PATCH v5 5/9] build: build parse-time-string as part of the notmuch lib and static cli Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  2012-10-23  4:52   ` Austin Clements
  2012-10-21 21:22 ` [PATCH v5 7/9] test: add tests for date:since..until range queries Jani Nikula
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

Add a custom value range processor to enable date and time searches of
the form date:since..until, where "since" and "until" are expressions
understood by the previously added date/time parser, to restrict the
results to messages within a particular time range (based on the Date:
header).
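As a rough model of what the processor added below receives: judging from operator() in parse-time-vrp.cc, the query parser splits a range at ".." and passes the two halves with the prefix still attached to the first one, which is why only begin is checked for "date:". The C sketch below imitates that split and the prefix removal in one place. `split_date_range` and `struct date_range` are hypothetical names; in notmuch the splitting is done by Xapian, not by code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DATE_PREFIX "date:"

/* Hypothetical container for the two halves of a range term. */
struct date_range {
    char since[64];
    char until[64];
};

/*
 * Split "date:<since>..<until>" into its halves. Returns false if the
 * term lacks the "date:" prefix, lacks "..", or a half does not fit.
 * Either half may come out empty, i.e. an open-ended range.
 */
bool
split_date_range (const char *term, struct date_range *r)
{
    const char *sep;
    size_t since_len, until_len;

    /* Require the prefix at the start of the term... */
    if (strncmp (term, DATE_PREFIX, sizeof (DATE_PREFIX) - 1) != 0)
	return false;
    /* ...and skip over it. */
    term += sizeof (DATE_PREFIX) - 1;

    sep = strstr (term, "..");
    if (!sep)
	return false;	/* date:yesterday is not a range at all */

    since_len = (size_t) (sep - term);
    until_len = strlen (sep + 2);
    if (since_len >= sizeof (r->since) || until_len >= sizeof (r->until))
	return false;

    memcpy (r->since, term, since_len);
    r->since[since_len] = '\0';
    memcpy (r->until, sep + 2, until_len + 1);	/* copies the NUL too */
    return true;
}
```

An empty half corresponds to an open-ended range; operator() below simply leaves such a half unparsed.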

If "since" or "until" describes date/time at an accuracy of days or
less, the values are rounded according to the accuracy, towards the
past for "since" and towards the future for "until". For example,
date:november..yesterday would match from the beginning of November
until the end of yesterday. Likewise, date:today..today matches from
the beginning of today until the end of today.
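For the simplest case, an expression with day accuracy evaluated in UTC, the rounding amounts to snapping to the first and the last second of that day. The C sketch below covers only that case and assumes non-negative timestamps; the actual parser rounds according to whichever fields the expression sets and also has to deal with time zones, so `round_day_down` and `round_day_up_inclusive` are illustrative helpers, not notmuch functions.

```c
#include <assert.h>
#include <time.h>

#define SECS_PER_DAY 86400

/* Round a non-negative UTC timestamp down to 00:00:00 of its day. */
time_t
round_day_down (time_t t)
{
    assert (t >= 0);	/* keeps the modulo arithmetic simple */
    return t - t % SECS_PER_DAY;
}

/* Round up to 23:59:59 of the same day, keeping the bound inclusive. */
time_t
round_day_up_inclusive (time_t t)
{
    return round_day_down (t) + SECS_PER_DAY - 1;
}
```

At the reference time used by the parse-time-string tests (1294744260, Tue Jan 11 11:11:00 UTC 2011), date:today..today would then span 1294704000 to 1294790399, in line with the "today ==_>" and "today ==^>" expectations there. The upper bound stops at :59 rather than the next midnight on the assumption that the end of a value range is inclusive, as the PARSE_TIME_ROUND_UP_INCLUSIVE name suggests.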

Open-ended ranges are supported (since Xapian 1.2.1), i.e. you can
specify date:..until or date:since.. to not limit the start or end
date, respectively.

CAVEATS:

Xapian does not support spaces in range expressions. You can replace
the spaces with '_', or (in most cases) '-', or (in some cases) leave
the spaces out altogether.

Entering date:expr without ".." (for example date:yesterday) will not
work as you might expect. You can achieve the expected result by
duplicating expr on both sides of ".." (for example
date:yesterday..yesterday).
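Programs that generate such queries can apply both workarounds mechanically. The C sketch below uses a made-up helper, `build_date_query`, that replaces spaces with '_' and repeats a lone expression on both sides of ".."; it does not validate the expressions themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst, replacing each space with '_'. */
static void
copy_underscored (char *dst, const char *src)
{
    for (; *src; src++, dst++)
	*dst = (*src == ' ') ? '_' : *src;
    *dst = '\0';
}

/*
 * Write "date:<since>..<until>" into buf. If until is NULL, since is
 * used for both ends. Returns 0 on success, -1 if a buffer is too small.
 */
int
build_date_query (char *buf, size_t size, const char *since, const char *until)
{
    char s[128], u[128];
    int n;

    if (!until)
	until = since;
    if (strlen (since) >= sizeof (s) || strlen (until) >= sizeof (u))
	return -1;

    copy_underscored (s, since);
    copy_underscored (u, until);

    n = snprintf (buf, size, "date:%s..%s", s, u);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For example, "last month" and "today" yield date:last_month..today, and "yesterday" alone yields date:yesterday..yesterday.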

Open-ended ranges won't work with pre-1.2.1 Xapian, but they don't
produce an error either.

Signed-off-by: Jani Nikula <jani@nikula.org>
---
 lib/Makefile.local     |    1 +
 lib/database-private.h |    1 +
 lib/database.cc        |    5 +++++
 lib/parse-time-vrp.cc  |   40 ++++++++++++++++++++++++++++++++++++++++
 lib/parse-time-vrp.h   |   19 +++++++++++++++++++
 5 files changed, 66 insertions(+)
 create mode 100644 lib/parse-time-vrp.cc
 create mode 100644 lib/parse-time-vrp.h

diff --git a/lib/Makefile.local b/lib/Makefile.local
index d1635cf..6c0f42f 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -58,6 +58,7 @@ libnotmuch_c_srcs =		\
 
 libnotmuch_cxx_srcs =		\
 	$(dir)/database.cc	\
+	$(dir)/parse-time-vrp.cc	\
 	$(dir)/directory.cc	\
 	$(dir)/index.cc		\
 	$(dir)/message.cc	\
diff --git a/lib/database-private.h b/lib/database-private.h
index 88532d5..d3e65fd 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -52,6 +52,7 @@ struct _notmuch_database {
     Xapian::QueryParser *query_parser;
     Xapian::TermGenerator *term_gen;
     Xapian::ValueRangeProcessor *value_range_processor;
+    Xapian::ValueRangeProcessor *date_range_processor;
 };
 
 /* Return the list of terms from the given iterator matching a prefix.
diff --git a/lib/database.cc b/lib/database.cc
index 761dc1a..4df3217 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -19,6 +19,7 @@
  */
 
 #include "database-private.h"
+#include "parse-time-vrp.h"
 
 #include <iostream>
 
@@ -710,12 +711,14 @@ notmuch_database_open (const char *path,
 	notmuch->term_gen = new Xapian::TermGenerator;
 	notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
 	notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
+	notmuch->date_range_processor = new ParseTimeValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
 
 	notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
 	notmuch->query_parser->set_database (*notmuch->xapian_db);
 	notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
 	notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
 	notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
+	notmuch->query_parser->add_valuerangeprocessor (notmuch->date_range_processor);
 
 	for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
 	    prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
@@ -778,6 +781,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
     notmuch->xapian_db = NULL;
     delete notmuch->value_range_processor;
     notmuch->value_range_processor = NULL;
+    delete notmuch->date_range_processor;
+    notmuch->date_range_processor = NULL;
 }
 
 void
diff --git a/lib/parse-time-vrp.cc b/lib/parse-time-vrp.cc
new file mode 100644
index 0000000..7e4eca4
--- /dev/null
+++ b/lib/parse-time-vrp.cc
@@ -0,0 +1,40 @@
+
+#include "database-private.h"
+#include "parse-time-vrp.h"
+#include "parse-time-string.h"
+
+#define PREFIX "date:"
+
+/* See *ValueRangeProcessor in xapian-core/api/valuerangeproc.cc */
+Xapian::valueno
+ParseTimeValueRangeProcessor::operator() (std::string &begin, std::string &end)
+{
+    time_t t, now;
+
+    /* Require date: prefix in start of the range... */
+    if (STRNCMP_LITERAL (begin.c_str (), PREFIX))
+	return Xapian::BAD_VALUENO;
+
+    /* ...and remove it. */
+    begin.erase (0, sizeof (PREFIX) - 1);
+
+    /* Use the same 'now' for begin and end. */
+    if (time (&now) == (time_t) -1)
+	return Xapian::BAD_VALUENO;
+
+    if (!begin.empty ()) {
+	if (parse_time_string (begin.c_str (), &t, &now, PARSE_TIME_ROUND_DOWN))
+	    return Xapian::BAD_VALUENO;
+
+	begin.assign (Xapian::sortable_serialise ((double) t));
+    }
+
+    if (!end.empty ()) {
+	if (parse_time_string (end.c_str (), &t, &now, PARSE_TIME_ROUND_UP_INCLUSIVE))
+	    return Xapian::BAD_VALUENO;
+
+	end.assign (Xapian::sortable_serialise ((double) t));
+    }
+
+    return valno;
+}
diff --git a/lib/parse-time-vrp.h b/lib/parse-time-vrp.h
new file mode 100644
index 0000000..526c217
--- /dev/null
+++ b/lib/parse-time-vrp.h
@@ -0,0 +1,19 @@
+
+#ifndef NOTMUCH_PARSE_TIME_VRP_H
+#define NOTMUCH_PARSE_TIME_VRP_H
+
+#include <xapian.h>
+
+/* see *ValueRangeProcessor in xapian-core/include/xapian/queryparser.h */
+class ParseTimeValueRangeProcessor : public Xapian::ValueRangeProcessor {
+protected:
+    Xapian::valueno valno;
+
+public:
+    ParseTimeValueRangeProcessor (Xapian::valueno slot_)
+	: valno(slot_) { }
+
+    Xapian::valueno operator() (std::string &begin, std::string &end);
+};
+
+#endif /* NOTMUCH_PARSE_TIME_VRP_H */
-- 
1.7.10.4

* [PATCH v5 7/9] test: add tests for date:since..until range queries
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
                   ` (5 preceding siblings ...)
  2012-10-21 21:22 ` [PATCH v5 6/9] lib: add date range query support Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  2012-10-21 21:22 ` [PATCH v5 8/9] man: document the " Jani Nikula
  2012-10-21 21:22 ` [PATCH v5 9/9] NEWS: date range search support Jani Nikula
  8 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

A brief initial test set.
---
 test/notmuch-test |    1 +
 test/search-date  |   21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)
 create mode 100755 test/search-date

diff --git a/test/notmuch-test b/test/notmuch-test
index 7eadfdf..9a1b375 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -61,6 +61,7 @@ TESTS="
   emacs-show
   missing-headers
   parse-time-string
+  search-date
 "
 TESTS=${NOTMUCH_TESTS:=$TESTS}
 
diff --git a/test/search-date b/test/search-date
new file mode 100755
index 0000000..70bcf34
--- /dev/null
+++ b/test/search-date
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+test_description="date:since..until queries"
+. ./test-lib.sh
+
+add_email_corpus
+
+test_begin_subtest "Absolute date range"
+output=$(notmuch search date:2010-12-16..12/16/2010 | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2010-12-16 [1/1] Olivier Berger; Essai accentué (inbox unread)"
+
+test_begin_subtest "Absolute time range with TZ"
+notmuch search date:18-Nov-2009_02:19:26-0800..2009-11-18_04:49:52-06:00 | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX   2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread)
+thread:XXX   2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
+thread:XXX   2009-11-18 [1/3] Carl Worth| Aron Griffis, Keith Packard; [notmuch] archive (inbox unread)
+thread:XXX   2009-11-18 [1/2] Carl Worth| Keith Packard; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_done
-- 
1.7.10.4

* [PATCH v5 8/9] man: document the date:since..until range queries
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
                   ` (6 preceding siblings ...)
  2012-10-21 21:22 ` [PATCH v5 7/9] test: add tests for date:since..until range queries Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  2012-10-24 21:08   ` Austin Clements
  2012-10-21 21:22 ` [PATCH v5 9/9] NEWS: date range search support Jani Nikula
  8 siblings, 1 reply; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

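The man page text below keeps the <initial-timestamp>..<final-timestamp> form, where each timestamp counts seconds since 1970-01-01 00:00:00 UTC. As a sketch of producing such values in C without date(1), the code below converts a UTC calendar date using the widely used days-from-civil arithmetic (after Howard Hinnant). `days_from_civil` and `utc_midnight` are illustrative names, not part of notmuch.

```c
#include <assert.h>

/*
 * Days since 1970-01-01 for a proleptic Gregorian date (days-from-civil
 * algorithm). m is 1..12, d is 1..31.
 */
long long
days_from_civil (int y, int m, int d)
{
    int era, yoe, mp, doy, doe;

    y -= m <= 2;	/* Jan and Feb count as months 10 and 11 of the previous year */
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;			/* year of era, 0..399 */
    mp = (m + 9) % 12;				/* March = 0, ..., February = 11 */
    doy = (153 * mp + 2) / 5 + d - 1;		/* day of year, counted from March 1 */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;	/* day of era */
    return (long long) era * 146097 + doe - 719468;
}

/* Seconds since the epoch at 00:00:00 UTC on the given date. */
long long
utc_midnight (int y, int m, int d)
{
    return days_from_civil (y, m, d) * 86400;
}
```

utc_midnight (2009, 10, 1) gives 1254355200, so a range starting at 1254355200 matches messages dated from the start of 2009-10-01 UTC onwards.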
---
 man/man7/notmuch-search-terms.7 |  147 +++++++++++++++++++++++++++++++++++----
 1 file changed, 135 insertions(+), 12 deletions(-)

diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index 17a109e..fbd3ee7 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -54,6 +54,8 @@ terms to match against specific portions of an email, (where
 
 	folder:<directory-path>
 
+	date:<since>..<until>
+
 The
 .B from:
 prefix is used to match the name or address of the sender of an email
@@ -104,6 +106,26 @@ contained within particular directories within the mail store. Only
 the directory components below the top-level mail database path are
 available to be searched.
 
+The
+.B date:
+prefix can be used to restrict the results to only messages within a
+particular time range (based on the Date: header) with a range syntax
+of:
+
+	date:<since>..<until>
+
+See \fBDATE AND TIME SEARCH\fR below for details on the range
+expression, and supported syntax for <since> and <until> date and time
+expressions.
+
+The time range can also be specified using timestamps with a syntax
+of:
+
+	<initial-timestamp>..<final-timestamp>
+
+Each timestamp is a number representing the number of seconds since
+1970\-01\-01 00:00:00 UTC.
+
 In addition to individual terms, multiple terms can be
 combined with Boolean operators (
 .BR and ", " or ", " not
@@ -117,20 +139,121 @@ operators, but will have to be protected from interpretation by the
 shell, (such as by putting quotation marks around any parenthesized
 expression).
 
-Finally, results can be restricted to only messages within a
-particular time range, (based on the Date: header) with a syntax of:
+.SH DATE AND TIME SEARCH
 
-	<initial-timestamp>..<final-timestamp>
+This is a non-exhaustive description of the date and time search with
+some pseudo notation. Most of the constructs can be mixed freely, and
+in any order, but the same absolute date or time can't be expressed
+twice.
 
-Each timestamp is a number representing the number of seconds since
-1970\-01\-01 00:00:00 UTC. This is not the most convenient means of
-expressing date ranges, but until notmuch is fixed to accept a more
-convenient form, one can use the date program to construct
-timestamps. For example, with the bash shell the following syntax would
-specify a date range to return messages from 2009\-10\-01 until the
-current time:
-
-	$(date +%s \-d 2009\-10\-01)..$(date +%s)
+.RS 4
+.TP 4
+.B The range expression
+
+date:<since>..<until>
+
+The above expression restricts the results to only messages from
+<since> to <until>, based on the Date: header.
+
+If <since> or <until> describes time at an accuracy of days or less,
+the date/time is rounded towards the past for <since> and towards the
+future for <until>, so that the range is inclusive. For example,
+date:january..february matches from the beginning of January until the end of
+February. Similarly, date:yesterday..yesterday matches from the
+beginning of yesterday until the end of yesterday.
+
+Open-ended ranges are supported (since Xapian 1.2.1), i.e. it's
+possible to specify date:..<until> or date:<since>.. to not limit the
+start or end time, respectively. Unfortunately, pre-1.2.1 Xapian does
+not report an error on open-ended ranges, but they do not work as
+expected either.
+
+Xapian does not support spaces in range expressions. You can replace
+the spaces with '_', or (in most cases) '-', or (in some cases) leave
+the spaces out altogether.
+
+Entering date:expr without ".." (for example date:yesterday) won't
+work, as it's not interpreted as a range expression at all. You can
+achieve the expected result by duplicating expr on both sides of ".."
+(for example date:yesterday..yesterday).
+.RE
+
+.RS 4
+.TP 4
+.B Relative date and time
+[N|number] (years|months|weeks|days|hours|hrs|minutes|mins|seconds|secs) [...]
+
+All refer to the past, can be repeated, and are accumulated.
+
+Units can be abbreviated to any length, with the otherwise ambiguous
+single m being m for minutes and M for months.
+
+The number multiplier can also be written out as one, two, ..., ten,
+dozen, hundred. As special cases, last means one ("last week") and
+this means zero ("this month").
+
+When combined with absolute date and time, the relative date and time
+specification is relative to the specified absolute date and
+time.
+
+Examples: 5M2d, two weeks
+.RE
+
+.RS 4
+.TP 4
+.B Supported time formats
+H[H]:MM[:SS] [(am|a.m.|pm|p.m.)]
+
+H[H] (am|a.m.|pm|p.m.)
+
+HHMMSS
+
+now
+
+noon
+
+midnight
+
+Examples: 17:05, 5pm
+.RE
+
+.RS 4
+.TP 4
+.B Supported date formats
+YYYY-MM[-DD]
+
+DD-MM[-[YY]YY]
+
+MM-YYYY
+
+M[M]/D[D][/[YY]YY]
+
+M[M]/YYYY
+
+D[D].M[M][.[YY]YY]
+
+D[D][(st|nd|rd|th)] Mon[thname] [YYYY]
+
+Mon[thname] D[D][(st|nd|rd|th)] [YYYY]
+
+Wee[kday]
+
+Month names can be abbreviated to three or more characters.
+
+Weekday names can be abbreviated to three or more characters.
+
+Examples: 2012-07-31, 31-07-2012, 7/31/2012, August 3
+.RE
+
+.RS 4
+.TP 4
+.B Time zones
+(+|-)HH:MM
+
+(+|-)HH[MM]
+
+Some time zone codes, e.g. UTC, EET.
+.RE
 
 .SH SEE ALSO
 
-- 
1.7.10.4
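The relative date and time rules in the man page above (units abbreviated to any length, a single m for minutes and a single M for months, repeated units accumulating) can be illustrated with a small C parser. It is a sketch of the documented rules, not of the keyword matching in parse-time-string.c: it only accepts digit multipliers, as in 5M2d or 2_weeks, and `struct rel`, `match_unit` and `parse_relative` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

enum unit { U_YEAR, U_MON, U_WEEK, U_DAY, U_HOUR, U_MIN, U_SEC, U_COUNT, U_BAD = -1 };

/* Accumulated relative amounts, indexed by enum unit. */
struct rel { int n[U_COUNT]; };

static const struct { const char *name; enum unit unit; } units[] = {
    { "years", U_YEAR }, { "months", U_MON }, { "weeks", U_WEEK },
    { "days", U_DAY }, { "hours", U_HOUR }, { "hrs", U_HOUR },
    { "minutes", U_MIN }, { "mins", U_MIN }, { "seconds", U_SEC },
    { "secs", U_SEC },
};

/* Match len characters of s as a possibly abbreviated unit name. */
static int
match_unit (const char *s, size_t len)
{
    int found = U_BAD;
    size_t i;

    /* The otherwise ambiguous single letter is case sensitive. */
    if (len == 1 && s[0] == 'm')
	return U_MIN;
    if (len == 1 && s[0] == 'M')
	return U_MON;

    for (i = 0; i < sizeof (units) / sizeof (units[0]); i++) {
	if (len <= strlen (units[i].name) && strncasecmp (s, units[i].name, len) == 0) {
	    if (found != U_BAD && found != (int) units[i].unit)
		return U_BAD;	/* prefix of two different units */
	    found = units[i].unit;
	}
    }
    return found;
}

/* Parse e.g. "5M2d" or "1 day 1 day"; repeated units add up. No overflow checks. */
bool
parse_relative (const char *s, struct rel *r)
{
    memset (r, 0, sizeof (*r));
    while (*s) {
	int n = 0, u;
	size_t len = 0;

	while (*s == ' ' || *s == '_')
	    s++;
	if (!*s)
	    break;
	if (!isdigit ((unsigned char) *s))
	    return false;
	while (isdigit ((unsigned char) *s))
	    n = n * 10 + (*s++ - '0');
	while (*s == ' ' || *s == '_')
	    s++;
	while (isalpha ((unsigned char) s[len]))
	    len++;
	if (len == 0)
	    return false;
	u = match_unit (s, len);
	if (u == U_BAD)
	    return false;
	r->n[u] += n;
	s += len;
    }
    return true;
}
```

Written-out multipliers such as two, dozen, last and this would need a keyword table of their own and are left out.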

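In the same spirit, the H[H]:MM[:SS] [(am|a.m.|pm|p.m.)] and H[H] (am|a.m.|pm|p.m.) time formats from the man page above can be sketched as follows. The 12-hour conversion uses the common convention that 12am is 00:00 and 12pm is noon, which is an assumption rather than something the man page states; `read_digits` and `parse_clock` are hypothetical, and HHMMSS, now, noon and midnight are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <strings.h>

/* Read up to max digits at *p, advancing *p; return the digit count. */
static int
read_digits (const char **p, int max, int *value)
{
    int n = 0, v = 0;

    while (n < max && isdigit ((unsigned char) (*p)[n])) {
	v = v * 10 + ((*p)[n] - '0');
	n++;
    }
    *p += n;
    *value = v;
    return n;
}

/* Parse "17:05", "9:15:30", "5pm", "12 a.m." etc. into 24-hour time. */
bool
parse_clock (const char *s, int *hh, int *mm, int *ss)
{
    const char *p = s;
    int h, m = 0, sec = 0;
    bool has_minutes = false, has_ampm = false, pm = false;

    if (!read_digits (&p, 2, &h))
	return false;
    if (*p == ':') {
	p++;
	if (read_digits (&p, 2, &m) != 2)
	    return false;
	has_minutes = true;
	if (*p == ':') {
	    p++;
	    if (read_digits (&p, 2, &sec) != 2)
		return false;
	}
    }

    while (*p == ' ')
	p++;
    if (*p) {
	if (!strcasecmp (p, "am") || !strcasecmp (p, "a.m."))
	    has_ampm = true;
	else if (!strcasecmp (p, "pm") || !strcasecmp (p, "p.m."))
	    has_ampm = pm = true;
	else
	    return false;
    }

    /* A bare number such as "5" is not a time of day. */
    if (!has_minutes && !has_ampm)
	return false;
    if (m > 59 || sec > 59)
	return false;

    if (has_ampm) {
	if (h < 1 || h > 12)
	    return false;
	h = h % 12 + (pm ? 12 : 0);	/* 12am -> 0, 12pm -> 12 */
    } else if (h > 23) {
	return false;
    }

    *hh = h;
    *mm = m;
    *ss = sec;
    return true;
}
```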

* [PATCH v5 9/9] NEWS: date range search support
  2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
                   ` (7 preceding siblings ...)
  2012-10-21 21:22 ` [PATCH v5 8/9] man: document the " Jani Nikula
@ 2012-10-21 21:22 ` Jani Nikula
  8 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-21 21:22 UTC (permalink / raw)
  To: notmuch

---
 NEWS |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/NEWS b/NEWS
index 2b50ba3..5f5b726 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,17 @@
+Notmuch 0.15 (YYYY-MM-DD)
+=========================
+
+Library changes
+---------------
+
+Date range search support
+
+  The `date:` prefix can now be used in queries to restrict the results to only
+  messages within a particular time range (based on the Date: header) with a
+  range syntax of `date:<since>..<until>`. Notmuch supports a wide variety of
+  expressions in `<since>` and `<until>`. Please refer to the
+  `notmuch-search-terms(7)` manual page for details.
+
 Notmuch 0.14 (2012-08-20)
 =========================
 
-- 
1.7.10.4

* Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
  2012-10-21 21:22 ` [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula
@ 2012-10-22  8:14   ` Austin Clements
  2012-10-25 18:58     ` Austin Clements
  2012-10-28 22:30     ` Jani Nikula
  0 siblings, 2 replies; 21+ messages in thread
From: Austin Clements @ 2012-10-22  8:14 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Overall this looks pretty good to me, and I must say, this parser is
amazingly flexible and copes well with a remarkably hostile grammar.

A lot of little comments below (sorry if any of this ground has
already been covered in the previous four versions).

I do have one broad comment.  While I'm all for ad hoc parsers for ad
hoc grammars like dates, there is one piece of the literature I think
this parser suffers for by ignoring: tokenizing.  I think it would
simplify a lot of this code if it did a tokenizing pass before the
parsing pass.  It doesn't have to be a serious tokenizer with
streaming and keywords and token types and junk; just something that
first splits the input into substrings, possibly just non-overlapping
matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
handling of postponed numbers because, with trivial lookahead in the
token stream, you wouldn't have to postpone them.  Likewise, it would
eliminate last_field.  It would simplify keyword matching because you
wouldn't have to worry about matching substrings (I spent a long time
staring at that code before I figured out what it would and wouldn't
accept).  Most important, I think it would make the parser more
predictable for users; for example, the parser currently accepts
things like "saturtoday" because it's aggressively single-pass.
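To make the tokenizing suggestion concrete, here is a C sketch of a splitter that produces the non-overlapping matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.] and treats every other character (spaces, commas) as a separator. The fixed-size token array and the names `struct tokens` and `tokenize` are illustrative choices, not code from this series.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS 32
#define MAX_TOKEN_LEN 32

struct tokens {
    int count;
    char tok[MAX_TOKENS][MAX_TOKEN_LEN];
};

/*
 * Split s into digit runs, letter runs and the single characters
 * - + : / . ; any other character only separates tokens. Returns the
 * number of tokens, or -1 if the fixed-size buffers overflow.
 */
int
tokenize (const char *s, struct tokens *t)
{
    t->count = 0;
    while (*s) {
	size_t len;
	unsigned char c = (unsigned char) *s;

	if (isdigit (c)) {
	    for (len = 1; isdigit ((unsigned char) s[len]); len++)
		;
	} else if (isalpha (c)) {
	    for (len = 1; isalpha ((unsigned char) s[len]); len++)
		;
	} else if (strchr ("-+:/.", c)) {
	    len = 1;
	} else {
	    s++;		/* separator, not part of any token */
	    continue;
	}

	if (t->count == MAX_TOKENS || len >= MAX_TOKEN_LEN)
	    return -1;
	memcpy (t->tok[t->count], s, len);
	t->tok[t->count][len] = '\0';
	t->count++;
	s += len;
    }
    return t->count;
}
```

Given whole tokens, "saturtoday" arrives as a single word that simply fails a keyword lookup, and "20 May" arrives as the pair 20, May, which a parser can inspect with one token of lookahead.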

Quoth Jani Nikula on Oct 22 at 12:22 am:
> Add a date/time parser to notmuch, to be used for adding date range
> query support for notmuch lib later on. Add the parser to a directory
> of its own to make it independent of the rest of the notmuch code
> base.
> 
> Signed-off-by: Jani Nikula <jani@nikula.org>
> ---
>  Makefile                              |    2 +-
>  parse-time-string/Makefile            |    5 +
>  parse-time-string/Makefile.local      |   12 +
>  parse-time-string/README              |    9 +
>  parse-time-string/parse-time-string.c | 1477 +++++++++++++++++++++++++++++++++
>  parse-time-string/parse-time-string.h |  102 +++
>  6 files changed, 1606 insertions(+), 1 deletion(-)
>  create mode 100644 parse-time-string/Makefile
>  create mode 100644 parse-time-string/Makefile.local
>  create mode 100644 parse-time-string/README
>  create mode 100644 parse-time-string/parse-time-string.c
>  create mode 100644 parse-time-string/parse-time-string.h
> 
> diff --git a/Makefile b/Makefile
> index e5e2e3a..bb9c316 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -3,7 +3,7 @@
>  all:
>  
>  # List all subdirectories here. Each contains its own Makefile.local
> -subdirs = compat completion emacs lib man util test
> +subdirs = compat completion emacs lib man parse-time-string util test
>  
>  # We make all targets depend on the Makefiles themselves.
>  global_deps = Makefile Makefile.config Makefile.local \
> diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile
> new file mode 100644
> index 0000000..fa25832
> --- /dev/null
> +++ b/parse-time-string/Makefile
> @@ -0,0 +1,5 @@
> +all:
> +	$(MAKE) -C .. all
> +
> +.DEFAULT:
> +	$(MAKE) -C .. $@
> diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local
> new file mode 100644
> index 0000000..53534f3
> --- /dev/null
> +++ b/parse-time-string/Makefile.local
> @@ -0,0 +1,12 @@
> +dir := parse-time-string
> +extra_cflags += -I$(srcdir)/$(dir)
> +
> +libparse-time-string_c_srcs := $(dir)/parse-time-string.c
> +
> +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o)
> +
> +$(dir)/libparse-time-string.a: $(libparse-time-string_modules)
> +	$(call quiet,AR) rcs $@ $^
> +
> +SRCS := $(SRCS) $(libparse-time-string_c_srcs)
> +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a
> diff --git a/parse-time-string/README b/parse-time-string/README
> new file mode 100644
> index 0000000..300ff1f
> --- /dev/null
> +++ b/parse-time-string/README
> @@ -0,0 +1,9 @@
> +PARSE TIME STRING
> +=================
> +
> +parse_time_string() is a date/time parser originally written for
> +notmuch by Jani Nikula <jani@nikula.org>. However, there is nothing
> +notmuch specific in it, and it should be kept reusable for other
> +projects, and ready to be packaged on its own as needed. Please do not
> +add dependencies on or references to anything notmuch specific. The
> +parser should only depend on the C library.
> diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c
> new file mode 100644
> index 0000000..942041a
> --- /dev/null
> +++ b/parse-time-string/parse-time-string.c
> @@ -0,0 +1,1477 @@
> +/*
> + * parse time string - user friendly date and time parser
> + * Copyright © 2012 Jani Nikula
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Jani Nikula <jani@nikula.org>
> + */
> +
> +#include <assert.h>
> +#include <ctype.h>
> +#include <errno.h>
> +#include <limits.h>
> +#include <stdio.h>
> +#include <stdarg.h>
> +#include <stdbool.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <strings.h>
> +#include <time.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +
> +#include "parse-time-string.h"
> +
> +/*
> + * IMPLEMENTATION DETAILS
> + *
> + * At a high level, the parsing is done in two phases: 1) actual
> + * parsing of the input string and storing the parsed data into
> + * 'struct state', and 2) processing of the data in 'struct state'
> + * according to current time (or provided reference time) and
> + * rounding. This is evident in the main entry point function
> + * parse_time_string().
> + *
> + * 1) The parsing phase - parse_input()
> + *
> + * Parsing is greedy and happens from left to right. The parsing is as
> + * unambiguous as possible; only unambiguous date/time formats are
> + * accepted. Redundant or contradictory absolute date/time in the
> + * input (e.g. date specified multiple times/ways) is not
> + * accepted. Relative date/time on the other hand just accumulates if
> + * present multiple times (e.g. "5 days 5 days" just turns into 10
> + * days).
> + *
> + * Parsing decisions are made on the input format, not value. For
> + * example, "20/5/2005" fails because the recognized format here is
> + * MM/D/YYYY, even though the values would suggest DD/M/YYYY.
> + *
> + * Parsing is mostly stateless in the sense that parsing decisions are
> + * not made based on the values of previously parsed data, or whether
> + * certain data is present in the first place. (There are a few
> + * exceptions to the latter part, though, such as parsing of time zone
> + * that would otherwise look like plain time.)
> + *
> + * When the parser encounters a number that is not greedily parsed as
> + * part of a format, the interpretation is postponed until the next
> + * token is parsed. The parser for the next token may consume the
> + * previously postponed number. For example, when parsing "20 May" the
> + * meaning of "20" is not known until "May" is parsed. If the parser
> + * for the next token does not consume the postponed number, the
> + * number is handled as a "lone" number before the parser for the next
> + * token finishes.
> + *
> + * 2) The processing phase - create_output()
> + *
> + * Once the parser in phase 1 has finished, 'struct state' contains
> + * all the information from the input string, and it's no longer
> + * needed. Since the parser does not even handle the concept of "now",
> + * the processing initializes the fields referring to the current
> + * date/time.
> + *
> + * If requested, the result is rounded towards past or future. The
> + * idea behind rounding is to support parsing date/time ranges in an
> + * obvious way. For example, for a range defined as two dates (without
> + * time), one would typically want to have an inclusive range from the
> + * beginning of start date to the end of the end date. The caller
> + * would use rounding towards past in the start date, and towards
> + * future in the end date.
> + *
> + * The absolute date and time is shifted by the relative date and
> + * time, and time zone adjustments are made. Daylight saving time
> + * (DST) is specifically *not* handled at all.
> + *
> + * Finally, the result is stored to time_t.
> + */
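The rule that parsing decisions follow the input format rather than the values can be illustrated with a reduced C sketch. It accepts only M[M]/D[D]/YYYY and D[D].M[M].YYYY with a four-digit year: the separator alone decides which number is the month, and the values are validated only afterwards, so 20/5/2005 is rejected instead of being reread as day/month. `parse_numeric_date` and its helpers are hypothetical and do not mirror how parse-time-string.c is structured.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

struct ymd { int year, mon, mday; };

/* Read up to max digits at *p, advancing *p; return the digit count. */
static int
read_digits (const char **p, int max, int *value)
{
    int n = 0, v = 0;

    while (n < max && isdigit ((unsigned char) (*p)[n])) {
	v = v * 10 + ((*p)[n] - '0');
	n++;
    }
    *p += n;
    *value = v;
    return n;
}

static int
days_in_month (int year, int mon)
{
    static const int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;

    return (mon == 2 && leap) ? 29 : days[mon - 1];
}

/* '/' means month first (M[M]/D[D]/YYYY), '.' means day first. */
bool
parse_numeric_date (const char *s, struct ymd *out)
{
    const char *p = s;
    int a, b, year;
    char sep;

    if (!read_digits (&p, 2, &a))
	return false;
    if (*p != '/' && *p != '.')
	return false;
    sep = *p++;
    if (!read_digits (&p, 2, &b))
	return false;
    if (*p != sep)
	return false;
    p++;
    if (read_digits (&p, 4, &year) != 4 || *p != '\0')
	return false;

    /* The format has fixed the field order; only now check the values. */
    out->year = year;
    out->mon = (sep == '/') ? a : b;
    out->mday = (sep == '/') ? b : a;
    if (out->mon < 1 || out->mon > 12)
	return false;
    if (out->mday < 1 || out->mday > days_in_month (year, out->mon))
	return false;
    return true;
}
```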
> +
> +#define unused(x) x __attribute__ ((unused))
> +
> +/* XXX: Redefine these to add i18n support. The keyword table uses
> + * N_() to mark strings to be translated; they are accessed
> + * dynamically using _(). */
> +#define _(s) (s)	/* i18n: define as gettext (s) */
> +#define N_(s) (s)	/* i18n: define as gettext_noop (s) */
> +
> +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))
> +
> +/*
> + * Field indices in the tm and set arrays of struct state.
> + *
> + * NOTE: There's some code that depends on the ordering of this enum.
> + */
> +enum field {
> +    /* Keep SEC...YEAR in this order. */
> +    TM_ABS_SEC,		/* seconds */
> +    TM_ABS_MIN,		/* minutes */
> +    TM_ABS_HOUR,	/* hours */
> +    TM_ABS_MDAY,	/* day of the month */
> +    TM_ABS_MON,		/* month */
> +    TM_ABS_YEAR,	/* year */
> +
> +    TM_ABS_WDAY,	/* day of the week. special: may be relative */

Given that this may be relative, should it really be called
TM_ABS_WDAY?

> +    TM_ABS_ISDST,	/* daylight saving time */
> +
> +    TM_AMPM,		/* am vs. pm */
> +    TM_TZ,		/* timezone in minutes */
> +
> +    /* Keep SEC...YEAR in this order. */
> +    TM_REL_SEC,		/* seconds relative to absolute or reference time */
> +    TM_REL_MIN,		/* minutes ... */
> +    TM_REL_HOUR,	/* hours ... */
> +    TM_REL_DAY,		/* days ... */
> +    TM_REL_MON,		/* months ... */
> +    TM_REL_YEAR,	/* years ... */
> +    TM_REL_WEEK,	/* weeks ... */
> +
> +    TM_NONE,		/* not a field */
> +
> +    TM_SIZE = TM_NONE,
> +    TM_FIRST_ABS = TM_ABS_SEC,
> +    TM_FIRST_REL = TM_REL_SEC,
> +};
> +
> +/* Values for the set array of struct state. */
> +enum field_set {
> +    FIELD_UNSET,	/* The field has not been touched by parser. */
> +    FIELD_SET,		/* The field has been set by parser. */
> +    FIELD_NOW,		/* The field will be set to reference time. */
> +};
> +
> +static enum field
> +next_abs_field (enum field field)
> +{
> +    /* NOTE: Depends on the enum ordering. */
> +    return field < TM_ABS_YEAR ? field + 1 : TM_NONE;
> +}
> +
> +static enum field
> +abs_to_rel_field (enum field field)
> +{
> +    assert (field <= TM_ABS_YEAR);
> +
> +    /* NOTE: Depends on the enum ordering. */
> +    return field + (TM_FIRST_REL - TM_FIRST_ABS);
> +}
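The "NOTE: Depends on the enum ordering" comments could also be backed by compile-time checks. The C11 sketch below repeats the enum from this patch and adds _Static_assert checks for the orderings that next_abs_field() and abs_to_rel_field() rely on; the REL_OFFSET and CHECK_PAIR macros are a suggestion, not part of the patch.

```c
#include <assert.h>

/* Copy of enum field from the patch, for illustration only. */
enum field {
    TM_ABS_SEC, TM_ABS_MIN, TM_ABS_HOUR, TM_ABS_MDAY, TM_ABS_MON, TM_ABS_YEAR,
    TM_ABS_WDAY, TM_ABS_ISDST,
    TM_AMPM, TM_TZ,
    TM_REL_SEC, TM_REL_MIN, TM_REL_HOUR, TM_REL_DAY, TM_REL_MON, TM_REL_YEAR,
    TM_REL_WEEK,
    TM_NONE,
    TM_SIZE = TM_NONE,
    TM_FIRST_ABS = TM_ABS_SEC,
    TM_FIRST_REL = TM_REL_SEC,
};

#define REL_OFFSET (TM_FIRST_REL - TM_FIRST_ABS)

/* Fail the build if an edit to the enum breaks abs_to_rel_field(). */
#define CHECK_PAIR(abs, rel) \
    _Static_assert ((rel) == (abs) + REL_OFFSET, #abs " must map to " #rel)

CHECK_PAIR (TM_ABS_SEC, TM_REL_SEC);
CHECK_PAIR (TM_ABS_MIN, TM_REL_MIN);
CHECK_PAIR (TM_ABS_HOUR, TM_REL_HOUR);
CHECK_PAIR (TM_ABS_MDAY, TM_REL_DAY);
CHECK_PAIR (TM_ABS_MON, TM_REL_MON);
CHECK_PAIR (TM_ABS_YEAR, TM_REL_YEAR);

/* next_abs_field() walks SEC...YEAR by incrementing. */
_Static_assert (TM_ABS_YEAR - TM_ABS_SEC == 5, "SEC...YEAR must be contiguous");

static enum field
next_abs_field (enum field field)
{
    return field < TM_ABS_YEAR ? field + 1 : TM_NONE;
}

static enum field
abs_to_rel_field (enum field field)
{
    assert (field <= TM_ABS_YEAR);
    return field + REL_OFFSET;
}
```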
> +
> +/* Get epoch value for field. */

Explain what an "epoch value" for a field is.

> +static int
> +field_epoch (enum field field)
> +{
> +    if (field == TM_ABS_MDAY || field == TM_ABS_MON)
> +	return 1;
> +    else if (field == TM_ABS_YEAR)
> +	return 1970;
> +    else
> +	return 0;
> +}
> +
> +/* The parsing state. */
> +struct state {
> +    int tm[TM_SIZE];			/* parsed date and time */
> +    enum field_set set[TM_SIZE];	/* set status of tm */
> +
> +    enum field last_field;	/* Previously set field. */
> +    char delim;
> +
> +    int postponed_length;	/* Number of digits in postponed value. */
> +    int postponed_value;
> +    char postponed_delim;	/* The delimiter preceding postponed number. */
> +};
> +
> +/*
> + * Helpers for postponed numbers.
> + *
> + * postponed_length is the number of digits in postponed value. 0
> + * means there is no postponed number. -1 means there is a postponed
> + * number, but it comes from a keyword, and it doesn't have digits.
> + */
> +static int
> +get_postponed_length (struct state *state)
> +{
> +    return state->postponed_length;
> +}
> +
> +/*
> + * Consume a previously postponed number. Return true if a number was
> + * in fact postponed, false otherwise. Store the postponed number's
> + * value in *v, length in the input string in *n (or -1 if the number
> + * was written out and parsed as a keyword), and the preceding
> + * delimiter to *d.

Mention that v, n, and d are unchanged if no number is postponed?  You
exploit this for default values elsewhere in the code.

> + */
> +static bool
> +get_postponed_number (struct state *state, int *v, int *n, char *d)

Maybe "consume_postponed_number" to emphasize that this function has
side-effects (and isn't simply a "getter")?

> +{
> +    if (!state->postponed_length)
> +	return false;
> +
> +    if (n)
> +	*n = state->postponed_length;
> +
> +    if (v)
> +	*v = state->postponed_value;
> +
> +    if (d)
> +	*d = state->postponed_delim;
> +
> +    state->postponed_length = 0;
> +    state->postponed_value = 0;
> +    state->postponed_delim = 0;
> +
> +    return true;
> +}
> +
> +static int parse_postponed_number (struct state *state, enum field next_field);
> +
> +/*
> + * Postpone a number to be handled later. If one exists already,
> + * handle it first. n may be -1 to indicate a keyword that has no
> + * number length.
> + */
> +static int
> +set_postponed_number (struct state *state, int v, int n)
> +{
> +    int r;
> +    char d = state->delim;
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, TM_NONE);
> +    if (r)
> +	return r;
> +
> +    state->postponed_length = n;
> +    state->postponed_value = v;
> +    state->postponed_delim = d;
> +
> +    return 0;
> +}
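The postponed-number mechanism amounts to a one-slot buffer. A number is parked until the next token decides what it means: "5" followed by "pm" becomes an hour, and "3" followed by "days" becomes a multiplier. A number that is still parked when another one arrives, or at the end of input, is interpreted on its own. Below is a minimal model with hypothetical names. It also shows how kw_set_rel relies on the output argument being left untouched when nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-slot buffer modelling postponed_length/value. */
struct pending {
    int length;		/* 0: empty, -1: from a keyword, else digit count */
    int value;
};

/* Park a number. The real set_postponed_number() first flushes any
 * older one through parse_postponed_number(). */
static void
pending_put (struct pending *p, int value, int length)
{
    p->value = value;
    p->length = length;
}

/* Consume the parked number. *v is left untouched when nothing is
 * pending, which lets callers preload a default. */
static bool
pending_take (struct pending *p, int *v)
{
    if (!p->length)
	return false;
    *v = p->value;
    p->length = 0;
    p->value = 0;
    return true;
}

/* Model of kw_set_rel: "3 days" multiplies, a bare "days" uses 1. */
static int
relative_multiplier (struct pending *p)
{
    int multiplier = 1;		/* survives if nothing is pending */

    pending_take (p, &multiplier);
    return multiplier;
}
```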
> +
> +static void
> +set_delim (struct state *state, char delim)
> +{
> +    state->delim = delim;
> +}
> +
> +static void
> +unset_delim (struct state *state)
> +{
> +    state->delim = 0;
> +}
> +
> +/*
> + * Field set/get/mod helpers.
> + */
> +
> +/* Return true if field has been set. */
> +static bool
> +is_field_set (struct state *state, enum field field)
> +{
> +    assert (field < ARRAY_SIZE (state->tm));
> +
> +    return field < ARRAY_SIZE (state->set) &&

state->tm and state->set are the same size, so this will always be
true given that the assert hasn't fired.  Is this just defensive
programming?

> +	   state->set[field] != FIELD_UNSET;
> +}
> +
> +static void
> +unset_field (struct state *state, enum field field)
> +{
> +    assert (field < ARRAY_SIZE (state->tm));
> +
> +    state->set[field] = FIELD_UNSET;
> +    state->tm[field] = 0;
> +}
> +
> +/*
> + * Set field to value. A field can only be set once to ensure the
> + * input does not contain redundant and potentially conflicting data.
> + */
> +static int
> +set_field (struct state *state, enum field field, int value)
> +{
> +    int r;
> +
> +    assert (field < ARRAY_SIZE (state->tm));
> +
> +    /* Fields can only be set once. */
> +    if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET)

Same comment about array sizes.  Also, this should probably call
is_field_set instead of open-coding it (which would make the array
size check even more redundant!)

> +	return -PARSE_TIME_ERR_ALREADYSET;
> +
> +    state->set[field] = FIELD_SET;
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, field);

I don't understand the big picture with postponed number handling yet,
but is it worth mentioning in this function's doc comment that it
processes postponed numbers?

> +    if (r)
> +	return r;
> +
> +    unset_delim (state);
> +
> +    state->tm[field] = value;
> +    state->last_field = field;
> +
> +    return 0;
> +}
> +
> +/*
> + * Mark n fields in fields to be set to the reference date/time in the
> + * specified time zone, or local timezone if not specified. The fields
> + * will be initialized after parsing is complete and timezone is
> + * known.
> + */
> +static int
> +set_fields_to_now (struct state *state, enum field *fields, size_t n)
> +{
> +    size_t i;
> +    int r;
> +
> +    for (i = 0; i < n; i++) {
> +	r = set_field (state, fields[i], 0);
> +	if (r)
> +	    return r;
> +	state->set[fields[i]] = FIELD_NOW;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Modify field by adding value to it. To be used on relative fields,
> + * which can be modified multiple times (to accumulate). */
> +static int
> +mod_field (struct state *state, enum field field, int value)

add_to_field?

> +{
> +    int r;
> +
> +    assert (field < ARRAY_SIZE (state->tm));   /* assert relative??? */
> +
> +    if (field < ARRAY_SIZE (state->set))

Another redundant check?

> +	state->set[field] = FIELD_SET;
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, field);

This postponed number stuff is getting really confusing...

> +    if (r)
> +	return r;
> +
> +    unset_delim (state);
> +
> +    state->tm[field] += value;
> +    state->last_field = field;
> +
> +    return 0;
> +}
> +
> +/*
> + * Get field value. Make sure the field is set before query. It's most
> + * likely an error to call this while parsing (for example fields set
> + * as FIELD_NOW will only be set to some value after parsing).
> + */
> +static int
> +get_field (struct state *state, enum field field)
> +{
> +    assert (field < ARRAY_SIZE (state->tm));

Assert that the field is set?

> +
> +    return state->tm[field];
> +}
> +
> +/*
> + * Validity checkers.
> + */
> +static bool is_valid_12hour (int h)
> +{
> +    return h >= 0 && h <= 12;

h >= 1?

> +}
> +
> +static bool is_valid_time (int h, int m, int s)
> +{
> +    /* Allow 24:00:00 to denote end of day. */
> +    if (h == 24 && m == 0 && s == 0)
> +	return true;
> +
> +    return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59;
> +}
> +
> +static bool is_valid_mday (int mday)
> +{
> +    return mday >= 1 && mday <= 31;
> +}
> +
> +static bool is_valid_mon (int mon)
> +{
> +    return mon >= 1 && mon <= 12;
> +}
> +
> +static bool is_valid_year (int year)
> +{
> +    return year >= 1970;
> +}
> +
> +static bool is_valid_date (int year, int mon, int mday)
> +{
> +    return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday);
> +}
> +
> +/* Unset indicator for time and date set helpers. */
> +#define UNSET -1
> +
> +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */
> +static int
> +set_abs_time (struct state *state, int hour, int min, int sec)
> +{
> +    int r;
> +
> +    if (hour != UNSET) {
> +	if ((r = set_field (state, TM_ABS_HOUR, hour)))
> +	    return r;
> +    }
> +
> +    if (min != UNSET) {
> +	if ((r = set_field (state, TM_ABS_MIN, min)))
> +	    return r;
> +    }
> +
> +    if (sec != UNSET) {
> +	if ((r = set_field (state, TM_ABS_SEC, sec)))
> +	    return r;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. */
> +static int
> +set_abs_date (struct state *state, int year, int mon, int mday)
> +{
> +    int r;
> +
> +    if (year != UNSET) {
> +	if ((r = set_field (state, TM_ABS_YEAR, year)))
> +	    return r;
> +    }
> +
> +    if (mon != UNSET) {
> +	if ((r = set_field (state, TM_ABS_MON, mon)))
> +	    return r;
> +    }
> +
> +    if (mday != UNSET) {
> +	if ((r = set_field (state, TM_ABS_MDAY, mday)))
> +	    return r;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Keyword parsing and handling.
> + */
> +struct keyword;
> +typedef int (*setter_t)(struct state *state, struct keyword *kw);
> +
> +struct keyword {
> +    const char *name;	/* keyword */
> +    enum field field;	/* field to set, or TM_NONE if N/A */
> +    int value;		/* value to set, or 0 if N/A */
> +    setter_t set;	/* function to use for setting, if non-NULL */
> +};
> +
> +/*
> + * Setter callback functions for keywords.
> + */
> +static int
> +kw_set_default (struct state *state, struct keyword *kw)

It took me a while to figure out what the name of this had to do with
the action it performs, then I realized that it's never used in the
table and only called when set is NULL.  Given that, I think it would
make more sense to just put the set_field call in place of the one
current call to kw_set_default.  Currently, this seems like one
indirection too much.

> +{
> +    return set_field (state, kw->field, kw->value);
> +}
> +
> +static int
> +kw_set_rel (struct state *state, struct keyword *kw)
> +{
> +    int multiplier = 1;
> +
> +    /* Get a previously set multiplier, if any. */
> +    get_postponed_number (state, &multiplier, NULL, NULL);
> +
> +    /* Accumulate relative field values. */
> +    return mod_field (state, kw->field, multiplier * kw->value);
> +}
> +
> +static int
> +kw_set_number (struct state *state, struct keyword *kw)
> +{
> +    /* -1 = no length, from keyword. */
> +    return set_postponed_number (state, kw->value, -1);
> +}
> +
> +static int
> +kw_set_month (struct state *state, struct keyword *kw)
> +{
> +    int n = get_postponed_length (state);
> +
> +    /* Consume postponed number if it could be mday. This handles "20
> +     * January". */
> +    if (n == 1 || n == 2) {

Should this be (n && is_valid_mday (state->postponed_value))?  It
seems a little odd that postponed numbers of three or more digits are
treated as independent, but two-digit numbers > 31 are an error.

> +	int r, v;
> +
> +	get_postponed_number (state, &v, NULL, NULL);
> +
> +	if (!is_valid_mday (v))
> +	    return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +	r = set_field (state, TM_ABS_MDAY, v);
> +	if (r)
> +	    return r;
> +    }
> +
> +    return set_field (state, kw->field, kw->value);
> +}
> +
> +static int
> +kw_set_ampm (struct state *state, struct keyword *kw)
> +{
> +    int n = get_postponed_length (state);
> +
> +    /* Consume postponed number if it could be hour. This handles
> +     * "5pm". */
> +    if (n == 1 || n == 2) {

Same comment as for kw_set_month.

> +	int r, v;
> +
> +	get_postponed_number (state, &v, NULL, NULL);
> +
> +	if (!is_valid_12hour (v))
> +	    return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +	r = set_abs_time (state, v, 0, 0);
> +	if (r)
> +	    return r;
> +    }
> +
> +    return set_field (state, kw->field, kw->value);
> +}
> +
> +static int
> +kw_set_timeofday (struct state *state, struct keyword *kw)
> +{
> +    return set_abs_time (state, kw->value, 0, 0);
> +}
> +
> +static int
> +kw_set_today (struct state *state, unused (struct keyword *kw))
> +{
> +    enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY };
> +
> +    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
> +}
> +
> +static int
> +kw_set_now (struct state *state, unused (struct keyword *kw))
> +{
> +    enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC };
> +
> +    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
> +}
> +
> +static int
> +kw_set_ordinal (struct state *state, struct keyword *kw)
> +{
> +    int n, v;
> +
> +    /* Require a postponed number. */
> +    if (!get_postponed_number (state, &v, &n, NULL))
> +	return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +    /* Ordinals are mday. */
> +    if (n != 1 && n != 2)

Is this redundant with your is_valid_mday test below?

> +	return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +    /* Be strict about st, nd, rd, and lax about th. */
> +    if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31)
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +    else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22)
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +    else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23)
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +    else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    return set_field (state, TM_ABS_MDAY, v);
> +}
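The strict/lax split follows English ordinal suffixes. "st" only fits 1, 21 and 31; "nd" fits 2 and 22; "rd" fits 3 and 23. Every other day in 1..31, including 11, 12 and 13, takes "th". The sketch below shows the acceptance rule that kw_set_ordinal applies, with lowercase suffixes only. Both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: English ordinal suffix for a day of month. */
static const char *
mday_suffix (int mday)
{
    if (mday < 1 || mday > 31)
	return NULL;

    switch (mday) {
    case 1: case 21: case 31:
	return "st";
    case 2: case 22:
	return "nd";
    case 3: case 23:
	return "rd";
    default:
	return "th";
    }
}

/* Mirror of kw_set_ordinal's rule: "st", "nd", "rd" must match
 * exactly, "th" is accepted for any valid day. */
static bool
ordinal_ok (int mday, const char *suffix)
{
    const char *expected = mday_suffix (mday);

    if (!expected)
	return false;
    if (strcmp (suffix, "th") == 0)
	return true;
    return strcmp (suffix, expected) == 0;
}
```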
> +
> +/*
> + * Accepted keywords.
> + *
> + * A keyword may optionally contain a '|' to indicate the minimum
> + * match length. Without one, full match is required. It's advisable
> + * to keep the minimum match parts unique across all keywords.
> + *
> + * If keyword begins with upper case letter, then the matching will be
> + * case sensitive. Otherwise the matching is case insensitive.
> + *
> + * If setter is NULL, kw_set_default will be used.
> + *
> + * Note: Order matters. Matching is greedy, longest match is used, but
> + * of equal length matches the first one is used, unless there's an
> + * equal length case sensitive match which trumps case insensitive
> + * matches.

If you do have a tokenizer (or disallow mashing keywords together),
then all of the complexity arising from longest match goes away because
the keyword token either will or won't match a keyword.  If you also
eliminate the rule for case sensitivity and put case-sensitive things
before conflicting case-insensitive things (so put "M" before
"m|inutes"), then you can simply use the first match.

> + */
> +static struct keyword keywords[] = {
> +    /* Weekdays. */
> +    { N_("sun|day"),	TM_ABS_WDAY,	0,	NULL },
> +    { N_("mon|day"),	TM_ABS_WDAY,	1,	NULL },
> +    { N_("tue|sday"),	TM_ABS_WDAY,	2,	NULL },
> +    { N_("wed|nesday"),	TM_ABS_WDAY,	3,	NULL },
> +    { N_("thu|rsday"),	TM_ABS_WDAY,	4,	NULL },
> +    { N_("fri|day"),	TM_ABS_WDAY,	5,	NULL },
> +    { N_("sat|urday"),	TM_ABS_WDAY,	6,	NULL },
> +
> +    /* Months. */
> +    { N_("jan|uary"),	TM_ABS_MON,	1,	kw_set_month },
> +    { N_("feb|ruary"),	TM_ABS_MON,	2,	kw_set_month },
> +    { N_("mar|ch"),	TM_ABS_MON,	3,	kw_set_month },
> +    { N_("apr|il"),	TM_ABS_MON,	4,	kw_set_month },
> +    { N_("may"),	TM_ABS_MON,	5,	kw_set_month },
> +    { N_("jun|e"),	TM_ABS_MON,	6,	kw_set_month },
> +    { N_("jul|y"),	TM_ABS_MON,	7,	kw_set_month },
> +    { N_("aug|ust"),	TM_ABS_MON,	8,	kw_set_month },
> +    { N_("sep|tember"),	TM_ABS_MON,	9,	kw_set_month },
> +    { N_("oct|ober"),	TM_ABS_MON,	10,	kw_set_month },
> +    { N_("nov|ember"),	TM_ABS_MON,	11,	kw_set_month },
> +    { N_("dec|ember"),	TM_ABS_MON,	12,	kw_set_month },
> +
> +    /* Durations. */
> +    { N_("y|ears"),	TM_REL_YEAR,	1,	kw_set_rel },
> +    { N_("w|eeks"),	TM_REL_WEEK,	1,	kw_set_rel },
> +    { N_("d|ays"),	TM_REL_DAY,	1,	kw_set_rel },
> +    { N_("h|ours"),	TM_REL_HOUR,	1,	kw_set_rel },
> +    { N_("hr|s"),	TM_REL_HOUR,	1,	kw_set_rel },
> +    { N_("m|inutes"),	TM_REL_MIN,	1,	kw_set_rel },
> +    /* M=months, m=minutes */
> +    { N_("M"),		TM_REL_MON,	1,	kw_set_rel },
> +    { N_("mins"),	TM_REL_MIN,	1,	kw_set_rel },
> +    { N_("mo|nths"),	TM_REL_MON,	1,	kw_set_rel },
> +    { N_("s|econds"),	TM_REL_SEC,	1,	kw_set_rel },
> +    { N_("secs"),	TM_REL_SEC,	1,	kw_set_rel },
> +
> +    /* Numbers. */
> +    { N_("one"),	TM_NONE,	1,	kw_set_number },
> +    { N_("two"),	TM_NONE,	2,	kw_set_number },
> +    { N_("three"),	TM_NONE,	3,	kw_set_number },
> +    { N_("four"),	TM_NONE,	4,	kw_set_number },
> +    { N_("five"),	TM_NONE,	5,	kw_set_number },
> +    { N_("six"),	TM_NONE,	6,	kw_set_number },
> +    { N_("seven"),	TM_NONE,	7,	kw_set_number },
> +    { N_("eight"),	TM_NONE,	8,	kw_set_number },
> +    { N_("nine"),	TM_NONE,	9,	kw_set_number },
> +    { N_("ten"),	TM_NONE,	10,	kw_set_number },
> +    { N_("dozen"),	TM_NONE,	12,	kw_set_number },
> +    { N_("hundred"),	TM_NONE,	100,	kw_set_number },
> +
> +    /* Special number forms. */
> +    { N_("this"),	TM_NONE,	0,	kw_set_number },
> +    { N_("last"),	TM_NONE,	1,	kw_set_number },
> +
> +    /* Other special keywords. */
> +    { N_("yesterday"),	TM_REL_DAY,	1,	kw_set_rel },
> +    { N_("today"),	TM_NONE,	0,	kw_set_today },
> +    { N_("now"),	TM_NONE,	0,	kw_set_now },
> +    { N_("noon"),	TM_NONE,	12,	kw_set_timeofday },
> +    { N_("midnight"),	TM_NONE,	0,	kw_set_timeofday },
> +    { N_("am"),		TM_AMPM,	0,	kw_set_ampm },
> +    { N_("a.m."),	TM_AMPM,	0,	kw_set_ampm },
> +    { N_("pm"),		TM_AMPM,	1,	kw_set_ampm },
> +    { N_("p.m."),	TM_AMPM,	1,	kw_set_ampm },
> +    { N_("st"),		TM_NONE,	0,	kw_set_ordinal },
> +    { N_("nd"),		TM_NONE,	0,	kw_set_ordinal },
> +    { N_("rd"),		TM_NONE,	0,	kw_set_ordinal },
> +    { N_("th"),		TM_NONE,	0,	kw_set_ordinal },
> +
> +    /* Timezone codes: offset in minutes. XXX: Add more codes. */
> +    { N_("pst"),	TM_TZ,		-8*60,	NULL },
> +    { N_("mst"),	TM_TZ,		-7*60,	NULL },
> +    { N_("cst"),	TM_TZ,		-6*60,	NULL },
> +    { N_("est"),	TM_TZ,		-5*60,	NULL },
> +    { N_("ast"),	TM_TZ,		-4*60,	NULL },
> +    { N_("nst"),	TM_TZ,		-(3*60+30),	NULL },
> +
> +    { N_("gmt"),	TM_TZ,		0,	NULL },
> +    { N_("utc"),	TM_TZ,		0,	NULL },
> +
> +    { N_("wet"),	TM_TZ,		0,	NULL },
> +    { N_("cet"),	TM_TZ,		1*60,	NULL },
> +    { N_("eet"),	TM_TZ,		2*60,	NULL },
> +    { N_("fet"),	TM_TZ,		3*60,	NULL },
> +
> +    { N_("wat"),	TM_TZ,		1*60,	NULL },
> +    { N_("cat"),	TM_TZ,		2*60,	NULL },
> +    { N_("eat"),	TM_TZ,		3*60,	NULL },
> +};
> +
> +/*
> + * Compare strings s and keyword. Return number of matching chars on
> + * match, 0 for no match. Match must be at least n chars, or all of
> + * keyword if n < 0, otherwise it's not a match. Use match_case for
> + * case sensitive matching.
> + */
> +static size_t
> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)
> +{
> +    ssize_t i;
> +
> +    if (!n)
> +	return 0;
> +
> +    for (i = 0; *s && *keyword; i++, s++, keyword++) {
> +	if (match_case) {
> +	    if (*s != *keyword)

The pointer arithmetic doesn't seem to buy anything here.  What about
just looping over i and using s[i] and keyword[i]?

> +		break;
> +	} else {
> +	    if (tolower ((unsigned char) *s) !=
> +		tolower ((unsigned char) *keyword))

I don't think the cast to unsigned char is necessary.

> +		break;
> +	}
> +    }
> +
> +    if (n > 0)
> +	return i < n ? 0 : i;
> +    else
> +	return *keyword ? 0 : i;
> +}
> +
> +/*
> + * Parse a keyword. Return < 0 on error, number of parsed chars on
> + * success.
> + */
> +static ssize_t
> +parse_keyword (struct state *state, const char *s)
> +{
> +    unsigned int i;
> +    size_t n, max_n = 0;
> +    struct keyword *kw = NULL;
> +    int r;
> +
> +    /* Match longest keyword */
> +    for (i = 0; i < ARRAY_SIZE (keywords); i++) {
> +	/* Match case if keyword begins with upper case letter. */
> +	bool mcase = isupper ((unsigned char) keywords[i].name[0]);

Same with this cast.

> +	ssize_t minlen = -1;
> +	char keyword[128];
> +	char *p;
> +
> +	strncpy (keyword, _(keywords[i].name), sizeof (keyword));
> +
> +	/* Truncate too long keywords. XXX: Make this dynamic? */
> +	keyword[sizeof (keyword) - 1] = '\0';
> +
> +	/* Minimum match length. */
> +	p = strchr (keyword, '|');
> +	if (p) {
> +	    minlen = p - keyword;
> +
> +	    /* Remove the minimum match length separator. */
> +	    memmove (p, p + 1, strlen (p + 1) + 1);
> +	}

Would it make more sense to make match_keyword aware of the |
character?  Then you wouldn't need this dance with copying the keyword
into a scratch buffer.  I'm thinking something like (untested)

static size_t
match_keyword (const char *s, const char *keyword, bool match_case)
{
    size_t i;
    bool prefix_matched = false;

    for (i = 0; *s && *keyword; i++, s++, keyword++) {
        if (*keyword == '|') {
            prefix_matched = true;
            ++keyword;
        }
        if (match_case ? *s != *keyword
                       : tolower (*s) != tolower (*keyword))
            break;
    }

    /* Input ended exactly at the minimum length, e.g. "mon". */
    if (*keyword == '|')
        prefix_matched = true;

    if (!*keyword || prefix_matched)
        return i;
    return 0;
}

> +
> +	n = match_keyword (s, keyword, minlen, mcase);
> +	if (n > max_n || (n == max_n && mcase)) {
> +	    max_n = n;
> +	    kw = &keywords[i];
> +	}
> +    }
> +
> +    if (!kw)
> +	return -PARSE_TIME_ERR_KEYWORD;
> +
> +    if (kw->set)
> +	r = kw->set (state, kw);
> +    else
> +	r = kw_set_default (state, kw);
> +
> +    if (r < 0)
> +	return r;
> +
> +    return max_n;
> +}
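For reference, the "|" convention in the keyword table works as follows. The characters before "|" form the minimum abbreviation, and the full keyword is the string with the "|" removed. Below is a standalone sketch of the split that parse_keyword performs on its scratch copy. The helper name is hypothetical, and bufsize must be non-zero.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: copy spec into buf without the '|' and return
 * the minimum match length, or -1 if the whole keyword must match. */
static ssize_t
split_keyword (const char *spec, char *buf, size_t bufsize)
{
    char *p;

    /* Copy and always terminate, truncating overly long keywords. */
    strncpy (buf, spec, bufsize);
    buf[bufsize - 1] = '\0';

    p = strchr (buf, '|');
    if (!p)
	return -1;

    /* Close the gap left by '|', including the terminating NUL. */
    memmove (p, p + 1, strlen (p + 1) + 1);
    return p - buf;
}
```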
> +
> +/*
> + * Non-keyword parsers and their helpers.
> + */
> +
> +static int
> +set_user_tz (struct state *state, char sign, int hour, int min)
> +{
> +    int tz = hour * 60 + min;
> +
> +    assert (sign == '+' || sign == '-');
> +
> +    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)

Good to see you're not forgetting our Kiribati notmuch user base.

> +	return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    if (sign == '-')
> +	tz = -tz;
> +
> +    return set_field (state, TM_TZ, tz);
> +}
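Time zone offsets are stored in minutes east of UTC, both here and in the keyword table (for example, -5*60 for "est"). Below is a sketch of the +/-HHMM path in parse_postponed_number, run through the same range checks as set_user_tz. The helper name and its bool-plus-out-parameter interface are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: convert a sign and HHMM digits (e.g. '+', 530)
 * to an offset in minutes, applying set_user_tz's checks. */
static bool
hhmm_to_offset (char sign, int hhmm, int *minutes)
{
    int hour = hhmm / 100;
    int min = hhmm % 100;

    /* Same limits as set_user_tz: up to 14 hours, quarter hours only. */
    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)
	return false;

    *minutes = hour * 60 + min;
    if (sign == '-')
	*minutes = -*minutes;
    return true;
}
```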
> +
> +/*
> + * Parse a previously postponed number if one exists. Independent
> + * parsing of a postponed number when it wasn't consumed during
> + * parsing of the following token.
> + */
> +static int
> +parse_postponed_number (struct state *state, unused (enum field next_field))
> +{
> +    int v, n;
> +    char d;
> +
> +    /* Bail out if there's no postponed number. */
> +    if (!get_postponed_number (state, &v, &n, &d))
> +	return 0;
> +
> +    if (n == 1 || n == 2) {
> +	/* Notable exception: Previous field affects parsing. This
> +	 * handles "January 20". */
> +	if (state->last_field == TM_ABS_MON) {
> +	    /* D[D] */
> +	    if (!is_valid_mday (v))
> +		return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +	    return set_field (state, TM_ABS_MDAY, v);
> +	} else if (n == 2) {
> +	    /* XXX: Only allow if last field is hour, min, or sec? */
> +	    if (d == '+' || d == '-') {
> +		/* +/-HH */
> +		return set_user_tz (state, d, v, 0);
> +	    }
> +	}
> +    } else if (n == 4) {
> +	/* Notable exception: Value affects parsing. Time zones are
> +	 * always at most 1400 and we don't understand years before
> +	 * 1970. */
> +	if (!is_valid_year (v)) {
> +	    if (d == '+' || d == '-') {
> +		/* +/-HHMM */
> +		return set_user_tz (state, d, v / 100, v % 100);
> +	    }
> +	} else {
> +	    /* YYYY */
> +	    return set_field (state, TM_ABS_YEAR, v);
> +	}
> +    } else if (n == 6) {
> +	/* HHMMSS */
> +	int hour = v / 10000;
> +	int min = (v / 100) % 100;
> +	int sec = v % 100;
> +
> +	if (!is_valid_time (hour, min, sec))
> +	    return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +	return set_abs_time (state, hour, min, sec);
> +    } else if (n == 8) {
> +	/* YYYYMMDD */
> +	int year = v / 10000;
> +	int mon = (v / 100) % 100;
> +	int mday = v % 100;
> +
> +	if (!is_valid_date (year, mon, mday))
> +	    return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +	return set_abs_date (state, year, mon, mday);
> +    } else {
> +	return -PARSE_TIME_ERR_FORMAT;

No need for the else block, given the return at the end.

> +    }
> +
> +    return -PARSE_TIME_ERR_FORMAT;
> +}
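In parse_postponed_number, the digit count of a lone number selects its interpretation. Six digits are read as HHMMSS and eight digits as YYYYMMDD; both are split with integer division and remainder. The sketch below shows only that arithmetic, using a hypothetical helper with no validation.

```c
/* Hypothetical helper: split a packed decimal such as 20121021 or
 * 123456 into three parts; hi takes whatever is left at the top. */
static void
split_packed (int v, int *hi, int *mid, int *lo)
{
    *hi = v / 10000;
    *mid = (v / 100) % 100;
    *lo = v % 100;
}
```

n is the length of the digit string in the input, not the magnitude of the value. As a result, "093000" still takes the HHMMSS branch and yields 09:30:00.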
> +
> +static int tm_get_field (const struct tm *tm, enum field field);
> +
> +static int
> +set_timestamp (struct state *state, time_t t)
> +{
> +    struct tm tm;
> +    enum field f;
> +    int r;
> +
> +    if (gmtime_r (&t, &tm) == NULL)
> +	return -PARSE_TIME_ERR_LIB;
> +
> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> +	r = set_field (state, f, tm_get_field (&tm, f));
> +	if (r)
> +	    return r;
> +    }
> +
> +    r = set_field (state, TM_TZ, 0);
> +    if (r)
> +	return r;
> +
> +    /* XXX: Prevent TM_AMPM with timestamp, e.g. "@123456 pm" */
> +
> +    return 0;
> +}
> +
> +/* Parse a single number. Typically postpone parsing until later. */
> +static int
> +parse_single_number (struct state *state, unsigned long v,
> +		     unsigned long n)
> +{
> +    assert (n);
> +
> +    if (state->delim == '@')
> +	return set_timestamp (state, (time_t) v);
> +
> +    if (v > INT_MAX)
> +	return -PARSE_TIME_ERR_FORMAT;
> +
> +    return set_postponed_number (state, v, n);
> +}
> +
> +static bool
> +is_time_sep (char c)
> +{
> +    return c == ':';
> +}
> +
> +static bool
> +is_date_sep (char c)
> +{
> +    return c == '/' || c == '-' || c == '.';
> +}
> +
> +static bool
> +is_sep (char c)
> +{
> +    return is_time_sep (c) || is_date_sep (c);
> +}
> +
> +/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep
> + * unset. */
> +static int
> +expand_year (unsigned long year, size_t n)
> +{
> +    if (n == 2) {
> +	return (year < 70 ? 2000 : 1900) + year;
> +    } else if (n == 4) {
> +	return year;
> +    } else {
> +	return UNSET;
> +    }
> +}
> +
> +/* Parse a date number triplet. */
> +static int
> +parse_date (struct state *state, char sep,
> +	    unsigned long v1, unsigned long v2, unsigned long v3,
> +	    size_t n1, size_t n2, size_t n3)
> +{
> +    int year = UNSET, mon = UNSET, mday = UNSET;
> +
> +    assert (is_date_sep (sep));
> +
> +    switch (sep) {
> +    case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */
> +	if (n1 != 1 && n1 != 2)
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +	if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) {
> +	    /* M[M]/D[D][/YY[YY]] */
> +	    year = expand_year (v3, n3);
> +	    mon = v1;
> +	    mday = v2;
> +	} else if (n2 == 4 && n3 == 0) {
> +	    /* M[M]/YYYY */
> +	    year = v2;
> +	    mon = v1;
> +	} else {
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +	}
> +	break;
> +
> +    case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */
> +	if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) {
> +	    /* YYYY-MM[-DD] */
> +	    year = v1;
> +	    mon = v2;
> +	    if (n3)
> +		mday = v3;
> +	} else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) {
> +	    /* DD-MM[-YY[YY]] */
> +	    year = expand_year (v3, n3);
> +	    mon = v2;
> +	    mday = v1;
> +	} else if (n1 == 2 && n2 == 4 && n3 == 0) {
> +	    /* MM-YYYY */
> +	    year = v2;
> +	    mon = v1;
> +	} else {
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +	}
> +	break;
> +
> +    case '.': /* Date: D[D].M[M][.[YY[YY]]] */
> +	if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) ||
> +	    (n3 != 0 && n3 != 2 && n3 != 4))
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +	year = expand_year (v3, n3);
> +	mon = v2;
> +	mday = v1;
> +	break;
> +    }
> +
> +    if (year != UNSET && !is_valid_year (year))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    if (mon != UNSET && !is_valid_mon (mon))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    if (mday != UNSET && !is_valid_mday (mday))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    return set_abs_date (state, year, mon, mday);
> +}
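The separator alone decides the field order, so the same digits can denote different dates. Below is a sketch for the case where all three parts have two digits. It is a hypothetical helper that reproduces only the relevant branches of parse_date, plus the 70 pivot from expand_year.

```c
/* Hypothetical result type; -1 marks "not set". */
struct ymd {
    int year, mon, mday;
};

/* Two-digit year window as in expand_year(): 00-69 -> 20xx, 70-99 -> 19xx. */
static int
expand_yy (int yy)
{
    return (yy < 70 ? 2000 : 1900) + yy;
}

/* Order "a<sep>b<sep>c" (all two digits) the way parse_date does. */
static struct ymd
order_by_sep (char sep, int a, int b, int c)
{
    struct ymd d = { -1, -1, -1 };

    switch (sep) {
    case '/':			/* M[M]/D[D]/YY */
	d.mon = a;
	d.mday = b;
	break;
    case '-':			/* DD-MM-YY */
    case '.':			/* D[D].M[M].YY */
	d.mday = a;
	d.mon = b;
	break;
    default:
	return d;
    }
    d.year = expand_yy (c);
    return d;
}
```

For example, "10/11/12" is October 11, 2012, while "10.11.12" and "10-11-12" are November 10, 2012.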
> +
> +/* Parse a time number triplet. */
> +static int
> +parse_time (struct state *state, char sep,
> +	    unsigned long v1, unsigned long v2, unsigned long v3,
> +	    size_t n1, size_t n2, size_t n3)
> +{
> +    assert (is_time_sep (sep));
> +
> +    if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2))
> +	return -PARSE_TIME_ERR_TIMEFORMAT;
> +
> +    /*
> +     * Notable exception: Previously set fields affect
> +     * parsing. Interpret (+|-)HH:MM as time zone only if hour and
> +     * minute have been set.
> +     *
> +     * XXX: This could be fixed by restricting the delimiters
> +     * preceding time. For '+' it would be justified, but for '-' it
> +     * might be inconvenient. However prefer to allow '-' as an
> +     * insignificant delimiter preceding time for convenience, and
> +     * handle '+' the same way for consistency between positive and
> +     * negative time zones.
> +     */
> +    if (is_field_set (state, TM_ABS_HOUR) &&
> +	is_field_set (state, TM_ABS_MIN) &&
> +	n1 == 2 && n2 == 2 && n3 == 0 &&
> +	(state->delim == '+' || state->delim == '-')) {
> +	return set_user_tz (state, state->delim, v1, v2);
> +    }
> +
> +    if (!is_valid_time (v1, v2, v3))
> +	return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    return set_abs_time (state, v1, v2, n3 ? v3 : 0);
> +}
> +
> +/* strtoul helper that assigns length. */
> +static unsigned long
> +strtoul_len (const char *s, const char **endp, size_t *len)
> +{
> +    unsigned long val = strtoul (s, (char **) endp, 10);

This could technically get confused by really large numbers, but I
don't know if that's worth worrying about.

> +
> +    *len = *endp - s;
> +    return val;
> +}
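On really large numbers: on overflow, strtoul() returns ULONG_MAX and sets errno to ERANGE, and *endp still points past all the digits. The length therefore remains correct but the value does not. Below is a sketch of a checked variant, with a hypothetical name, that reports the overflow instead.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical checked strtoul_len: same outputs as the original, but
 * returns false if the digits do not fit in an unsigned long. */
static bool
strtoul_len_checked (const char *s, const char **endp, size_t *len,
		     unsigned long *val)
{
    char *end;

    errno = 0;
    *val = strtoul (s, &end, 10);
    *endp = end;
    *len = end - s;

    return !(errno == ERANGE && *val == ULONG_MAX);
}
```

parse_single_number already rejects values above INT_MAX, so the saturated value mostly seems to matter for the "@" timestamp path, which is taken before that check.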
> +
> +/*
> + * Parse a (group of) number(s). Return < 0 on error, number of parsed
> + * chars on success.
> + */
> +static ssize_t
> +parse_number (struct state *state, const char *s)
> +{
> +    int r;
> +    unsigned long v1, v2, v3 = 0;
> +    size_t n1, n2, n3 = 0;
> +    const char *p = s;
> +    char sep;
> +
> +    v1 = strtoul_len (p, &p, &n1);
> +
> +    if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) {

Unnecessary cast?

> +	sep = *p;
> +	v2 = strtoul_len (p + 1, &p, &n2);
> +    } else {
> +	/* A single number. */
> +	r = parse_single_number (state, v1, n1);
> +	if (r)
> +	    return r;
> +
> +	return p - s;

I found the control flow here confusing.  You might want to flip the
two conditions so the single number return happens first and the rest
of the code flows straight through:

if (!is_sep (*p) || !isdigit (*(p + 1))) {
    ...
    return p - s;
}

sep = *p;
...

> +    }
> +
> +    /* A group of two or three numbers? */
> +    if (*p == sep && isdigit ((unsigned char) *(p + 1)))
> +	v3 = strtoul_len (p + 1, &p, &n3);
> +
> +    if (is_time_sep (sep))
> +	r = parse_time (state, sep, v1, v2, v3, n1, n2, n3);
> +    else
> +	r = parse_date (state, sep, v1, v2, v3, n1, n2, n3);
> +
> +    if (r)
> +	return r;
> +
> +    return p - s;
> +}
> +
> +/*
> + * Parse delimiter(s). Throw away all except the last one, which is
> + * stored for parsing the next non-delimiter. Return < 0 on error,
> + * number of parsed chars on success.
> + *
> + * XXX: We might want to be more strict here.
> + */
> +static ssize_t
> +parse_delim (struct state *state, const char *s)
> +{
> +    const char *p = s;
> +
> +    /*
> +     * Skip non-alpha and non-digit, and store the last for further
> +     * processing.
> +     */
> +    while (*p && !isalnum ((unsigned char) *p)) {
> +	set_delim (state, *p);
> +	p++;
> +    }
> +
> +    return p - s;
> +}
> +
> +/*
> + * Parse a date/time string. Return < 0 on error, number of parsed
> + * chars on success.
> + */
> +static ssize_t
> +parse_input (struct state *state, const char *s)
> +{
> +    const char *p = s;
> +    ssize_t n;
> +    int r;
> +
> +    while (*p) {
> +	if (isalpha ((unsigned char) *p)) {
> +	    n = parse_keyword (state, p);
> +	} else if (isdigit ((unsigned char) *p)) {
> +	    n = parse_number (state, p);
> +	} else {
> +	    n = parse_delim (state, p);
> +	}
> +
> +	if (n <= 0) {
> +	    if (n == 0)
> +		n = -PARSE_TIME_ERR;
> +
> +	    return n;
> +	}
> +
> +	p += n;
> +    }
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, TM_NONE);
> +    if (r < 0)
> +	return r;
> +
> +    return p - s;
> +}
> +
> +/*
> + * Processing the parsed input.
> + */
> +
> +/*
> + * Initialize reference time to tm. Use time zone in state if
> + * specified, otherwise local time. Use now for reference time if
> + * non-NULL, otherwise current time.
> + */
> +static int
> +initialize_now (struct state *state, struct tm *tm, const time_t *now)

Should tm be the last argument, since it's an out-argument?

Why is now a pointer?  Just so it can be NULL?

> +{
> +    time_t t;
> +
> +    if (now) {
> +	t = *now;
> +    } else {
> +	if (time (&t) == (time_t) -1)
> +	    return -PARSE_TIME_ERR_LIB;
> +    }
> +
> +    if (is_field_set (state, TM_TZ)) {
> +	/* Some other time zone. */
> +
> +	/* Adjust now according to the TZ. */
> +	t += get_field (state, TM_TZ) * 60;
> +
> +	/* It's not gm, but this doesn't mess with the TZ. */
> +	if (gmtime_r (&t, tm) == NULL)
> +	    return -PARSE_TIME_ERR_LIB;
> +    } else {
> +	/* Local time. */
> +	if (localtime_r (&t, tm) == NULL)
> +	    return -PARSE_TIME_ERR_LIB;
> +    }
> +
> +    return 0;
> +}
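The time zone branch shifts the timestamp by the offset and then breaks it down as if it were UTC. The resulting struct tm shows the wall-clock time in that zone without touching the TZ environment. Below is a sketch of the same trick. The helper is hypothetical, and the offset is in minutes east of UTC, as stored in TM_TZ.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: wall-clock breakdown of t in a fixed-offset zone. */
static bool
wall_clock_in_zone (time_t t, int offset_min, struct tm *out)
{
    t += (time_t) offset_min * 60;
    return gmtime_r (&t, out) != NULL;	/* "UTC" of the shifted time */
}
```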
> +
> +/*
> + * Normalize tm according to mktime(3). Both mktime(3) and

This comment could elaborate a bit on what it means to normalize a tm.

> + * localtime_r(3) use local time, but they cancel each other out here,
> + * making this function agnostic to time zone.
> + */
> +static int
> +normalize_tm (struct tm *tm)
> +{
> +    time_t t = mktime (tm);
> +
> +    if (t == (time_t) -1)
> +	return -PARSE_TIME_ERR_LIB;
> +
> +    if (!localtime_r (&t, tm))
> +	return -PARSE_TIME_ERR_LIB;

Do you actually need this call to localtime_r or can you just return
after the mktime modifies tm?  Does this have to do with timezones?
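Here, normalizing means giving mktime(3) a struct tm whose fields may be out of range (a tm_mday of 32, a negative tm_mon, a tm_sec of -1, and so on) and letting it fold them into their valid ranges. It carries the excess into larger units and recomputes tm_wday and tm_yday. POSIX and ISO C both specify that mktime() updates the structure in place this way. On a conforming C library, the localtime_r() round trip therefore looks redundant for ordinary inputs. That is an observation, not something the patch states. The small demonstration below is not code from the patch.

```c
#include <time.h>

/*
 * Demonstration: give mktime(3) a deliberately out-of-range date,
 * January 32nd 2011, and let it normalize the fields in place.
 * Returns 0 on success, -1 if mktime() fails.
 */
static int
normalize_example (struct tm *out)
{
    struct tm t = { 0 };

    t.tm_year = 2011 - 1900;
    t.tm_mon = 0;        /* January, 0-based */
    t.tm_mday = 32;      /* one past January 31st */
    t.tm_hour = 12;      /* noon, so a DST shift cannot change the date */
    t.tm_isdst = -1;     /* let the C library determine DST */

    /* mktime() rewrites t in place: fields folded into range, wday/yday set. */
    if (mktime (&t) == (time_t) -1)
	return -1;

    *out = t;
    return 0;
}
```

The result is February 1st, 2011, a Tuesday, with no localtime_r() call involved.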

> +
> +    return 0;
> +}
> +
> +/* Get field out of a struct tm. */
> +static int
> +tm_get_field (const struct tm *tm, enum field field)
> +{
> +    switch (field) {
> +    case TM_ABS_SEC:	return tm->tm_sec;
> +    case TM_ABS_MIN:	return tm->tm_min;
> +    case TM_ABS_HOUR:	return tm->tm_hour;
> +    case TM_ABS_MDAY:	return tm->tm_mday;
> +    case TM_ABS_MON:	return tm->tm_mon + 1; /* 0- to 1-based */
> +    case TM_ABS_YEAR:	return 1900 + tm->tm_year;
> +    case TM_ABS_WDAY:	return tm->tm_wday;
> +    case TM_ABS_ISDST:	return tm->tm_isdst;
> +    default:
> +	assert (false);
> +	break;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Modify hour according to am/pm setting. */
> +static int
> +fixup_ampm (struct state *state)
> +{
> +    int hour, hdiff = 0;
> +
> +    if (!is_field_set (state, TM_AMPM))
> +	return 0;
> +
> +    if (!is_field_set (state, TM_ABS_HOUR))
> +	return -PARSE_TIME_ERR_TIMEFORMAT;
> +
> +    hour = get_field (state, TM_ABS_HOUR);
> +    if (!is_valid_12hour (hour))
> +	return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    if (get_field (state, TM_AMPM)) {
> +	/* 12pm is noon. */
> +	if (hour != 12)
> +	    hdiff = 12;
> +    } else {
> +	/* 12am is midnight, beginning of day. */
> +	if (hour == 12)
> +	    hdiff = -12;
> +    }
> +
> +    mod_field (state, TM_REL_HOUR, -hdiff);
> +
> +    return 0;
> +}
> +
> +/* Combine absolute and relative fields, and round. */
> +static int
> +create_output (struct state *state, time_t *t_out, const time_t *ref,
> +	       int round)
> +{
> +    struct tm tm = { .tm_isdst = -1 };
> +    struct tm now;
> +    time_t t;
> +    enum field f;
> +    int r;
> +    int week_round = PARSE_TIME_NO_ROUND;
> +
> +    r = initialize_now (state, &now, ref);
> +    if (r)
> +	return r;
> +
> +    /* Initialize fields flagged as "now" to reference time. */
> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> +	if (state->set[f] == FIELD_NOW) {
> +	    state->tm[f] = tm_get_field (&now, f);
> +	    state->set[f] = FIELD_SET;
> +	}
> +    }
> +
> +    /*
> +     * If WDAY is set but MDAY is not, we consider WDAY relative
> +     *
> +     * XXX: This fails on stuff like "two months monday" because two
> +     * months ago wasn't the same day as today. Postpone until we know
> +     * date?
> +     */
> +    if (is_field_set (state, TM_ABS_WDAY) &&
> +	!is_field_set (state, TM_ABS_MDAY)) {
> +	int wday = get_field (state, TM_ABS_WDAY);
> +	int today = tm_get_field (&now, TM_ABS_WDAY);
> +	int rel_days;
> +
> +	if (today > wday)
> +	    rel_days = today - wday;
> +	else
> +	    rel_days = today + 7 - wday;
> +
> +	/* This also prevents special week rounding from happening. */
> +	mod_field (state, TM_REL_DAY, rel_days);
> +
> +	unset_field (state, TM_ABS_WDAY);
> +    }
> +
> +    r = fixup_ampm (state);
> +    if (r)
> +	return r;
> +
> +    /*
> +     * Iterate fields from most accurate to least accurate, and set
> +     * unset fields according to requested rounding.
> +     */
> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> +	if (round != PARSE_TIME_NO_ROUND) {
> +	    enum field r = abs_to_rel_field (f);
> +
> +	    if (is_field_set (state, f) || is_field_set (state, r)) {
> +		if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {
> +		    mod_field (state, r, -1);

Crazy.  This could use a comment.  It took me a while to figure out
why this was -1, though maybe that's just because it's late.
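The -1 is confusing because of the sign convention. create_output() computes each struct tm field as absolute minus relative, so mod_field (state, r, -1) actually adds one unit of the most accurate specified field. The patch skips that step at seconds accuracy. PARSE_TIME_ROUND_UP_INCLUSIVE then adds one to TM_REL_SEC, which subtracts a second. The standalone model below shows that arithmetic for day accuracy, in UTC. It uses timegm(3), a glibc/BSD extension that the patch also relies on. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE  /* for timegm() on glibc */
#include <time.h>

/*
 * Hypothetical model of create_output() for PARSE_TIME_ROUND_UP and
 * PARSE_TIME_ROUND_UP_INCLUSIVE at day accuracy, in UTC. As in the
 * patch, each struct tm field is computed as absolute - relative.
 */
static time_t
round_up_day_utc (int year, int mon, int mday, int inclusive)
{
    int rel_day = 0, rel_sec = 0;
    struct tm tm = { 0 };

    /* mod_field (state, TM_REL_DAY, -1): absolute - (-1) is the next day. */
    rel_day -= 1;

    /* mod_field (state, TM_REL_SEC, 1): step back one second. */
    if (inclusive)
	rel_sec += 1;

    /* Units more accurate than the day have been rounded to zero. */
    tm.tm_year = year - 1900;
    tm.tm_mon = mon - 1;
    tm.tm_mday = mday - rel_day;
    tm.tm_hour = 0;
    tm.tm_min = 0;
    tm.tm_sec = 0 - rel_sec;   /* may be -1; timegm() normalizes it */

    return timegm (&tm);
}
```

For 2011-01-11 this gives 2011-01-12 00:00:00 UTC without the inclusive adjustment, and 2011-01-11 23:59:59 UTC with it. The second result matches the "today ==^>" expectation in the patch 4 smoke tests.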

> +		    if (round == PARSE_TIME_ROUND_UP_INCLUSIVE)
> +			mod_field (state, TM_REL_SEC, 1);
> +		}
> +		round = PARSE_TIME_NO_ROUND; /* No more rounding. */
> +	    } else {
> +		if (f == TM_ABS_MDAY &&
> +		    is_field_set (state, TM_REL_WEEK)) {
> +		    /* Week is most accurate. */
> +		    week_round = round;
> +		    round = PARSE_TIME_NO_ROUND;
> +		} else {
> +		    set_field (state, f, field_epoch (f));
> +		}
> +	    }
> +	}
> +
> +	if (!is_field_set (state, f))
> +	    set_field (state, f, tm_get_field (&now, f));
> +    }
> +
> +    /* Special case: rounding with week accuracy. */
> +    if (week_round != PARSE_TIME_NO_ROUND) {
> +	/* Temporarily set more accurate fields to now. */
> +	set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC));
> +	set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN));
> +	set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR));
> +	set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY));
> +    }
> +
> +    /*
> +     * Set all fields. They may contain out of range values before
> +     * normalization by mktime(3).
> +     */
> +    tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC);
> +    tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN);
> +    tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR);
> +    tm.tm_mday = get_field (state, TM_ABS_MDAY) -
> +		 get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK);
> +    tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON);
> +    tm.tm_mon--; /* 1- to 0-based */
> +    tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900;
> +
> +    /*
> +     * It's always normal time.
> +     *
> +     * XXX: This is probably not a solution that universally
> +     * works. Just make sure DST is not taken into account. We don't
> +     * want rounding to be affected by DST.
> +     */
> +    tm.tm_isdst = -1;
> +
> +    /* Special case: rounding with week accuracy. */
> +    if (week_round != PARSE_TIME_NO_ROUND) {
> +	/* Normalize to get proper tm.wday. */
> +	r = normalize_tm (&tm);
> +	if (r < 0)
> +	    return r;
> +
> +	/* Set more accurate fields back to zero. */
> +	tm.tm_sec = 0;
> +	tm.tm_min = 0;
> +	tm.tm_hour = 0;
> +	tm.tm_isdst = -1;
> +
> +	/* Monday is the true 1st day of week, but this is easier. */
> +	if (week_round >= PARSE_TIME_ROUND_UP) {
> +	    tm.tm_mday += 7 - tm.tm_wday;
> +	    if (week_round == PARSE_TIME_ROUND_UP_INCLUSIVE)
> +		tm.tm_sec--;
> +	} else {
> +	    tm.tm_mday -= tm.tm_wday;
> +	}
> +    }
> +
> +    if (is_field_set (state, TM_TZ)) {
> +	/* tm is in specified TZ, convert to UTC for timegm(3). */
> +	tm.tm_min -= get_field (state, TM_TZ);
> +	t = timegm (&tm);
> +    } else {
> +	/* tm is in local time. */
> +	t = mktime (&tm);
> +    }
> +
> +    if (t == (time_t) -1)
> +	return -PARSE_TIME_ERR_LIB;
> +
> +    *t_out = t;
> +
> +    return 0;
> +}
> +
> +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. */
> +#define EXTERNAL_ERR(r) (-r)
> +
> +int
> +parse_time_string (const char *s, time_t *t, const time_t *ref, int round)
> +{
> +    struct state state = { .last_field = TM_NONE };
> +    int r;
> +
> +    if (!s || !t)
> +	return EXTERNAL_ERR (-PARSE_TIME_ERR);
> +
> +    r = parse_input (&state, s);
> +    if (r < 0)
> +	return EXTERNAL_ERR (r);
> +
> +    r = create_output (&state, t, ref, round);
> +    if (r < 0)
> +	return EXTERNAL_ERR (r);
> +
> +    return 0;
> +}
> diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h
> new file mode 100644
> index 0000000..bfa4ee3
> --- /dev/null
> +++ b/parse-time-string/parse-time-string.h
> @@ -0,0 +1,102 @@
> +/*
> + * parse time string - user friendly date and time parser
> + * Copyright © 2012 Jani Nikula
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Jani Nikula <jani@nikula.org>
> + */
> +
> +#ifndef PARSE_TIME_STRING_H
> +#define PARSE_TIME_STRING_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <time.h>
> +
> +/* return values for parse_time_string() */
> +enum {
> +    PARSE_TIME_OK = 0,
> +    PARSE_TIME_ERR,		/* unspecified error */
> +    PARSE_TIME_ERR_LIB,		/* library call failed */
> +    PARSE_TIME_ERR_ALREADYSET,	/* attempt to set unit twice */
> +    PARSE_TIME_ERR_FORMAT,	/* generic date/time format error */
> +    PARSE_TIME_ERR_DATEFORMAT,	/* date format error */
> +    PARSE_TIME_ERR_TIMEFORMAT,	/* time format error */
> +    PARSE_TIME_ERR_INVALIDDATE,	/* date value error */
> +    PARSE_TIME_ERR_INVALIDTIME,	/* time value error */
> +    PARSE_TIME_ERR_KEYWORD,	/* unknown keyword */
> +};
> +
> +/* round values for parse_time_string() */
> +enum {
> +    PARSE_TIME_ROUND_DOWN = -1,
> +    PARSE_TIME_NO_ROUND = 0,
> +    PARSE_TIME_ROUND_UP = 1,
> +    PARSE_TIME_ROUND_UP_INCLUSIVE = 2,
> +};
> +
> +/**
> + * parse_time_string() - user friendly date and time parser
> + * @s:		string to parse
> + * @t:		pointer to time_t to store parsed time in
> + * @ref:	pointer to time_t containing reference date/time, or NULL
> + * @round:	PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN,
> + *		PARSE_TIME_ROUND_UP, or PARSE_TIME_ROUND_UP_INCLUSIVE
> + *
> + * Parse a date/time string 's' and store the parsed date/time result
> + * in 't'.
> + *
> + * A reference date/time is used for determining the "date/time units"
> + * (roughly equivalent to struct tm members) not specified by 's'. If
> + * 'ref' is non-NULL, it must contain a pointer to a time_t to be used
> + * as reference date/time. Otherwise, the current time is used.
> + *
> + * If 's' does not specify a full date/time, the 'round' parameter
> + * specifies if and how the result should be rounded as follows:
> + *
> + *   PARSE_TIME_NO_ROUND: All date/time units that are not specified
> + *   by 's' are set to the corresponding unit derived from the
> + *   reference date/time.
> + *
> + *   PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate
> + *   than the most accurate unit specified by 's' are set to the
> + *   smallest valid value for that unit. Rest of the unspecified units
> + *   are set as in PARSE_TIME_NO_ROUND.
> + *
> + *   PARSE_TIME_ROUND_UP: All date/time units that are more accurate
> + *   than the most accurate unit specified by 's' are set to the
> + *   smallest valid value for that unit. The most accurate unit
> + *   specified by 's' is incremented by one (and this is rolled over
> + *   to the less accurate units as necessary), unless the most
> + *   accurate unit is seconds. Rest of the unspecified units are set
> + *   as in PARSE_TIME_NO_ROUND.
> + *
> + *   PARSE_TIME_ROUND_UP_INCLUSIVE: Same as PARSE_TIME_ROUND_UP, minus
> + *   one second, unless the most accurate unit specified by 's' is
> + *   seconds. This is useful for callers that require a value for
> + *   inclusive comparison of the result.
> + *
> + * Return 0 (PARSE_TIME_OK) for successfully parsed date/time, or one
> + * of PARSE_TIME_ERR_* on error. 't' is not modified on error.
> + */
> +int parse_time_string (const char *s, time_t *t, const time_t *ref, int round);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* PARSE_TIME_STRING_H */

Made it!
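The week special case at the end of create_output() is easier to follow in isolation. Once normalization has produced tm_wday (0 = Sunday), rounding down subtracts tm_wday days to reach the preceding Sunday. Rounding up adds 7 - tm_wday days to reach the next Sunday, minus one second when inclusive. The UTC-only model below reproduces the thisweek expectations in the patch 4 smoke tests: Sun Jan 09 00:00:00 and Sat Jan 15 23:59:59 for a Tue Jan 11 11:11:00 2011 reference. Its name is hypothetical, and it assumes timegm(3) and gmtime_r(3) are available.

```c
#define _DEFAULT_SOURCE  /* timegm() and gmtime_r() on glibc */
#include <time.h>

/*
 * Hypothetical UTC-only model of create_output()'s week rounding.
 * round_up: 0 = round down, 1 = round up; inclusive: subtract one second.
 */
static time_t
round_week_utc (time_t ref, int round_up, int inclusive)
{
    struct tm tm;

    if (!gmtime_r (&ref, &tm))     /* also fills tm_wday (0 = Sunday) */
	return (time_t) -1;

    /* More accurate fields go to zero, as in the patch. */
    tm.tm_sec = 0;
    tm.tm_min = 0;
    tm.tm_hour = 0;

    if (round_up) {
	tm.tm_mday += 7 - tm.tm_wday;  /* start of next week (Sunday) */
	if (inclusive)
	    tm.tm_sec--;               /* last second of this week */
    } else {
	tm.tm_mday -= tm.tm_wday;      /* start of this week (Sunday) */
    }

    return timegm (&tm);
}
```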


* Re: [PATCH v5 4/9] test: add smoke tests for the date/time parser module
  2012-10-21 21:22 ` [PATCH v5 4/9] test: add smoke tests for the date/time parser module Jani Nikula
@ 2012-10-23  4:23   ` Austin Clements
  2012-10-28 22:34     ` Jani Nikula
  0 siblings, 1 reply; 21+ messages in thread
From: Austin Clements @ 2012-10-23  4:23 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Quoth Jani Nikula on Oct 22 at 12:22 am:
> Test the date/time parser module directly, independent of notmuch,
> using the parse-time test tool.
> 
> Credits to Michal Sojka <sojkam1@fel.cvut.cz> for writing most of the
> tests.
> ---
>  test/notmuch-test      |    1 +
>  test/parse-time-string |   71 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+)
>  create mode 100755 test/parse-time-string
> 
> diff --git a/test/notmuch-test b/test/notmuch-test
> index cc732c3..7eadfdf 100755
> --- a/test/notmuch-test
> +++ b/test/notmuch-test
> @@ -60,6 +60,7 @@ TESTS="
>    emacs-hello
>    emacs-show
>    missing-headers
> +  parse-time-string
>  "
>  TESTS=${NOTMUCH_TESTS:=$TESTS}
>  
> diff --git a/test/parse-time-string b/test/parse-time-string
> new file mode 100755
> index 0000000..862e701
> --- /dev/null
> +++ b/test/parse-time-string
> @@ -0,0 +1,71 @@
> +#!/usr/bin/env bash
> +test_description="date/time parser module"
> +. ./test-lib.sh
> +
> +# Sanity/smoke tests for the date/time parser independent of notmuch
> +
> +_date ()
> +{
> +    date -d "$*" +%s
> +}
> +
> +_parse_time ()
> +{
> +    ${TEST_DIRECTORY}/parse-time --format=%s "$*"
> +}
> +
> +test_begin_subtest "date(1) default format without TZ code"
> +test_expect_equal "$(_parse_time Fri Aug 3 23:06:06 2012)" "$(_date Fri Aug 3 23:06:06 2012)"
> +
> +test_begin_subtest "date(1) --rfc-2822 format"
> +test_expect_equal "$(_parse_time Fri, 03 Aug 2012 23:07:46 +0100)" "$(_date Fri, 03 Aug 2012 23:07:46 +0100)"
> +
> +test_begin_subtest "date(1) --rfc-3339=seconds format"
> +test_expect_equal "$(_parse_time 2012-08-03 23:09:37+03:00)" "$(_date 2012-08-03 23:09:37+03:00)"
> +
> +test_begin_subtest "Date parser tests"
> +REFERENCE=$(_date Tue Jan 11 11:11:00 +0000 2011)
> +cat <<EOF > INPUT
> +now          ==> Tue Jan 11 11:11:00 +0000 2011
> +2010-1-1     ==> ERROR: 5

It would be nice if these errors were strings.  I have no idea if "5"
is the right error for this.
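Counting through the return-value enum in parse-time-string.h answers the question. ERROR: 5 is PARSE_TIME_ERR_DATEFORMAT, ERROR: 4 is PARSE_TIME_ERR_FORMAT, and ERROR: 1 is the unspecified PARSE_TIME_ERR. A strerror-style helper would let the test tool print names instead of numbers. The function below is hypothetical and not part of the patch. Its messages are taken from the header's comments, and the enum is repeated so the sketch compiles standalone.

```c
#include <stddef.h>

/* Copied from parse-time-string.h so this sketch is self-contained. */
enum {
    PARSE_TIME_OK = 0,
    PARSE_TIME_ERR,
    PARSE_TIME_ERR_LIB,
    PARSE_TIME_ERR_ALREADYSET,
    PARSE_TIME_ERR_FORMAT,
    PARSE_TIME_ERR_DATEFORMAT,
    PARSE_TIME_ERR_TIMEFORMAT,
    PARSE_TIME_ERR_INVALIDDATE,
    PARSE_TIME_ERR_INVALIDTIME,
    PARSE_TIME_ERR_KEYWORD,
};

/* Hypothetical helper: map a parse_time_string() return value to text. */
static const char *
parse_time_strerror (int err)
{
    static const char *const msgs[] = {
	[PARSE_TIME_OK]              = "success",
	[PARSE_TIME_ERR]             = "unspecified error",
	[PARSE_TIME_ERR_LIB]         = "library call failed",
	[PARSE_TIME_ERR_ALREADYSET]  = "attempt to set unit twice",
	[PARSE_TIME_ERR_FORMAT]      = "generic date/time format error",
	[PARSE_TIME_ERR_DATEFORMAT]  = "date format error",
	[PARSE_TIME_ERR_TIMEFORMAT]  = "time format error",
	[PARSE_TIME_ERR_INVALIDDATE] = "date value error",
	[PARSE_TIME_ERR_INVALIDTIME] = "time value error",
	[PARSE_TIME_ERR_KEYWORD]     = "unknown keyword",
    };

    if (err < 0 || (size_t) err >= sizeof (msgs) / sizeof (msgs[0]))
	return "unknown error";
    return msgs[err];
}
```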

> +Jan 2        ==> Sun Jan 02 11:11:00 +0000 2011
> +Mon          ==> Mon Jan 10 11:11:00 +0000 2011
> +last Friday  ==> ERROR: 4
> +2 hours ago  ==> ERROR: 1
> +last month   ==> Sat Dec 11 11:11:00 +0000 2010
> +month ago    ==> ERROR: 1
> +8am          ==> Tue Jan 11 08:00:00 +0000 2011
> +9:15         ==> Tue Jan 11 09:15:00 +0000 2011
> +12:34        ==> Tue Jan 11 12:34:00 +0000 2011
> +monday       ==> Mon Jan 10 11:11:00 +0000 2011
> +yesterday    ==> Mon Jan 10 11:11:00 +0000 2011
> +tomorrow     ==> ERROR: 1
> +             ==> Tue Jan 11 11:11:00 +0000 2011 # empty string is reference time
> +
> +Aug 3 23:06:06 2012             ==> Fri Aug 03 23:06:06 +0000 2012 # date(1) default format without TZ code
> +Fri, 03 Aug 2012 23:07:46 +0100 ==> Fri Aug 03 22:07:46 +0000 2012 # rfc-2822
> +2012-08-03 23:09:37+03:00       ==> Fri Aug 03 20:09:37 +0000 2012 # rfc-3339 seconds
> +
> +10s           ==> Tue Jan 11 11:10:50 +0000 2011
> +19701223s     ==> Fri May 28 10:37:17 +0000 2010
> +19701223      ==> Wed Dec 23 11:11:00 +0000 1970
> +
> +19701223 +0100 ==> Wed Dec 23 11:11:00 +0000 1970 # Timezone is ignored without an error
> +
> +today ==^> Tue Jan 11 23:59:59 +0000 2011
> +today ==_> Tue Jan 11 00:00:00 +0000 2011
> +
> +thisweek ==^> Sat Jan 15 23:59:59 +0000 2011
> +thisweek ==_> Sun Jan 09 00:00:00 +0000 2011
> +
> +two months ago==> ERROR: 1 # "ago" is not supported
> +two months ==> Thu Nov 11 11:11:00 +0000 2010
> +
> +@1348569850 ==> Tue Sep 25 10:44:10 +0000 2012
> +@10 ==> Thu Jan 01 00:00:10 +0000 1970

Very nice.  The only thing that jumps out at me is that there are no
==^^> tests, though it would be interesting to run a code coverage
tool to see how thorough these tests are.

> +EOF
> +
> +${TEST_DIRECTORY}/parse-time --ref=${REFERENCE} < INPUT > OUTPUT
> +test_expect_equal_file INPUT OUTPUT
> +
> +test_done


* Re: [PATCH v5 6/9] lib: add date range query support
  2012-10-21 21:22 ` [PATCH v5 6/9] lib: add date range query support Jani Nikula
@ 2012-10-23  4:52   ` Austin Clements
  2012-10-28 22:39     ` Jani Nikula
  0 siblings, 1 reply; 21+ messages in thread
From: Austin Clements @ 2012-10-23  4:52 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Quoth Jani Nikula on Oct 22 at 12:22 am:
> Add a custom value range processor to enable date and time searches of
> the form date:since..until, where "since" and "until" are expressions
> understood by the previously added date/time parser, to restrict the
> results to messages within a particular time range (based on the Date:
> header).
> 
> If "since" or "until" describes date/time at an accuracy of days or
> less, the values are rounded according to the accuracy, towards past
> for "since" and towards future for "until". For example,
> date:november..yesterday would match from the beginning of November
> until the end of yesterday. Expressions such as date:today..today
> mean since the beginning of today until the end of today.
> 
> Open-ended ranges are supported (since Xapian 1.2.1), i.e. you can
> specify date:..until or date:since.. to not limit the start or end
> date, respectively.
> 
> CAVEATS:
> 
> Xapian does not support spaces in range expressions. You can replace
> the spaces with '_', or (in most cases) '-', or (in some cases) leave
> the spaces out altogether.
> 
> Entering date:expr without ".." (for example date:yesterday) will not
> work as you might expect. You can achieve the expected result by
> duplicating the expr on both sides of ".." (for example
> date:yesterday..yesterday).
> 
> Open-ended ranges won't work with pre-1.2.1 Xapian, but they don't
> produce an error either.
> 
> Signed-off-by: Jani Nikula <jani@nikula.org>
> ---
>  lib/Makefile.local     |    1 +
>  lib/database-private.h |    1 +
>  lib/database.cc        |    5 +++++
>  lib/parse-time-vrp.cc  |   40 ++++++++++++++++++++++++++++++++++++++++
>  lib/parse-time-vrp.h   |   19 +++++++++++++++++++
>  5 files changed, 66 insertions(+)
>  create mode 100644 lib/parse-time-vrp.cc
>  create mode 100644 lib/parse-time-vrp.h
> 
> diff --git a/lib/Makefile.local b/lib/Makefile.local
> index d1635cf..6c0f42f 100644
> --- a/lib/Makefile.local
> +++ b/lib/Makefile.local
> @@ -58,6 +58,7 @@ libnotmuch_c_srcs =		\
>  
>  libnotmuch_cxx_srcs =		\
>  	$(dir)/database.cc	\
> +	$(dir)/parse-time-vrp.cc	\
>  	$(dir)/directory.cc	\
>  	$(dir)/index.cc		\
>  	$(dir)/message.cc	\
> diff --git a/lib/database-private.h b/lib/database-private.h
> index 88532d5..d3e65fd 100644
> --- a/lib/database-private.h
> +++ b/lib/database-private.h
> @@ -52,6 +52,7 @@ struct _notmuch_database {
>      Xapian::QueryParser *query_parser;
>      Xapian::TermGenerator *term_gen;
>      Xapian::ValueRangeProcessor *value_range_processor;
> +    Xapian::ValueRangeProcessor *date_range_processor;
>  };
>  
>  /* Return the list of terms from the given iterator matching a prefix.
> diff --git a/lib/database.cc b/lib/database.cc
> index 761dc1a..4df3217 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -19,6 +19,7 @@
>   */
>  
>  #include "database-private.h"
> +#include "parse-time-vrp.h"
>  
>  #include <iostream>
>  
> @@ -710,12 +711,14 @@ notmuch_database_open (const char *path,
>  	notmuch->term_gen = new Xapian::TermGenerator;
>  	notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
>  	notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
> +	notmuch->date_range_processor = new ParseTimeValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
>  
>  	notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
>  	notmuch->query_parser->set_database (*notmuch->xapian_db);
>  	notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
>  	notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
>  	notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
> +	notmuch->query_parser->add_valuerangeprocessor (notmuch->date_range_processor);
>  
>  	for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
>  	    prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
> @@ -778,6 +781,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
>      notmuch->xapian_db = NULL;
>      delete notmuch->value_range_processor;
>      notmuch->value_range_processor = NULL;
> +    delete notmuch->date_range_processor;
> +    notmuch->date_range_processor = NULL;
>  }
>  
>  void
> diff --git a/lib/parse-time-vrp.cc b/lib/parse-time-vrp.cc
> new file mode 100644
> index 0000000..7e4eca4
> --- /dev/null
> +++ b/lib/parse-time-vrp.cc
> @@ -0,0 +1,40 @@

Should this file have the usual preamble?

> +
> +#include "database-private.h"
> +#include "parse-time-vrp.h"
> +#include "parse-time-string.h"
> +
> +#define PREFIX "date:"
> +
> +/* See *ValueRangeProcessor in xapian-core/api/valuerangeproc.cc */
> +Xapian::valueno
> +ParseTimeValueRangeProcessor::operator() (std::string &begin, std::string &end)
> +{
> +    time_t t, now;
> +
> +    /* Require date: prefix in start of the range... */
> +    if (STRNCMP_LITERAL (begin.c_str (), PREFIX))

Could be
  if (begin.rfind (PREFIX, 0) == string::npos)
but that may not be clearer.
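For comparison, the same check-and-strip step in plain C is a single strncmp() against the literal's length. This mirrors what the STRNCMP_LITERAL/erase pair does. The helper name is hypothetical, and the macros are defined locally rather than taken from notmuch.

```c
#include <stddef.h>
#include <string.h>

#define DATE_PREFIX "date:"

/* sizeof on a string literal counts the terminating NUL, hence the - 1. */
#define DATE_PREFIX_LEN (sizeof (DATE_PREFIX) - 1)

/*
 * Hypothetical helper: return the text after "date:", or NULL if s does
 * not start with the prefix (the Xapian::BAD_VALUENO case in the patch).
 */
static const char *
strip_date_prefix (const char *s)
{
    if (strncmp (s, DATE_PREFIX, DATE_PREFIX_LEN) != 0)
	return NULL;
    return s + DATE_PREFIX_LEN;
}
```

An empty remainder (plain "date:") corresponds to the open-ended "since" case that the patch handles with begin.empty ().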

> +	return Xapian::BAD_VALUENO;
> +
> +    /* ...and remove it. */
> +    begin.erase (0, sizeof (PREFIX) - 1);
> +
> +    /* Use the same 'now' for begin and end. */
> +    if (time (&now) == (time_t) -1)
> +	return Xapian::BAD_VALUENO;
> +
> +    if (!begin.empty ()) {
> +	if (parse_time_string (begin.c_str (), &t, &now, PARSE_TIME_ROUND_DOWN))
> +	    return Xapian::BAD_VALUENO;
> +
> +	begin.assign (Xapian::sortable_serialise ((double) t));
> +    }
> +
> +    if (!end.empty ()) {
> +	if (parse_time_string (end.c_str (), &t, &now, PARSE_TIME_ROUND_UP_INCLUSIVE))
> +	    return Xapian::BAD_VALUENO;
> +
> +	end.assign (Xapian::sortable_serialise ((double) t));
> +    }
> +
> +    return valno;
> +}
> diff --git a/lib/parse-time-vrp.h b/lib/parse-time-vrp.h
> new file mode 100644
> index 0000000..526c217
> --- /dev/null
> +++ b/lib/parse-time-vrp.h
> @@ -0,0 +1,19 @@

Same thing about the preamble.

> +
> +#ifndef NOTMUCH_PARSE_TIME_VRP_H
> +#define NOTMUCH_PARSE_TIME_VRP_H
> +
> +#include <xapian.h>
> +
> +/* see *ValueRangeProcessor in xapian-core/include/xapian/queryparser.h */

Out of curiosity, why the Xapian source reference?
ValueRangeProcessor is documented along with the rest of Xapian.

> +class ParseTimeValueRangeProcessor : public Xapian::ValueRangeProcessor {
> +protected:
> +    Xapian::valueno valno;
> +
> +public:
> +    ParseTimeValueRangeProcessor (Xapian::valueno slot_)
> +	: valno(slot_) { }
> +
> +    Xapian::valueno operator() (std::string &begin, std::string &end);
> +};
> +
> +#endif /* NOTMUCH_PARSE_TIME_VRP_H */


* Re: [PATCH v5 8/9] man: document the date:since..until range queries
  2012-10-21 21:22 ` [PATCH v5 8/9] man: document the " Jani Nikula
@ 2012-10-24 21:08   ` Austin Clements
  2012-10-28 22:41     ` Jani Nikula
  0 siblings, 1 reply; 21+ messages in thread
From: Austin Clements @ 2012-10-24 21:08 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Quoth Jani Nikula on Oct 22 at 12:22 am:
> ---
>  man/man7/notmuch-search-terms.7 |  147 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 135 insertions(+), 12 deletions(-)
> 
> diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
> index 17a109e..fbd3ee7 100644
> --- a/man/man7/notmuch-search-terms.7
> +++ b/man/man7/notmuch-search-terms.7
> @@ -54,6 +54,8 @@ terms to match against specific portions of an email, (where
>  
>  	folder:<directory-path>
>  
> +	date:<since>..<until>
> +
>  The
>  .B from:
>  prefix is used to match the name or address of the sender of an email
> @@ -104,6 +106,26 @@ contained within particular directories within the mail store. Only
>  the directory components below the top-level mail database path are
>  available to be searched.
>  
> +The
> +.B date:
> +prefix can be used to restrict the results to only messages within a
> +particular time range (based on the Date: header) with a range syntax
> +of:
> +
> +	date:<since>..<until>
> +
> +See \fBDATE AND TIME SEARCH\fR below for details on the range
> +expression, and supported syntax for <since> and <until> date and time
> +expressions.
> +
> +The time range can also be specified using timestamps with a syntax
> +of:
> +
> +	<initial-timestamp>..<final-timestamp>
> +
> +Each timestamp is a number representing the number of seconds since
> +1970\-01\-01 00:00:00 UTC.
> +
>  In addition to individual terms, multiple terms can be
>  combined with Boolean operators (
>  .BR and ", " or ", " not
> @@ -117,20 +139,121 @@ operators, but will have to be protected from interpretation by the
>  shell, (such as by putting quotation marks around any parenthesized
>  expression).
>  
> -Finally, results can be restricted to only messages within a
> -particular time range, (based on the Date: header) with a syntax of:
> +.SH DATE AND TIME SEARCH
>  
> -	<initial-timestamp>..<final-timestamp>
> +This is a non-exhaustive description of the date and time search with
> +some pseudo notation. Most of the constructs can be mixed freely, and
> +in any order, but the same absolute date or time can't be expressed
> +twice.

I'm not sure what the end of this sentence means, though I assume it's
related to the restrictions on repeated absolute components.  It would
also be nice to give a broader view of the syntax here.  Maybe,

notmuch understands a variety of standard and natural ways of
expressing dates and times, both in absolute terms ("2012-10-24") and
in relative terms ("yesterday").  Any number of relative terms can be
combined ("1 hour 25 minutes") and an absolute date/time can be
combined with relative terms to further adjust it.  A non-exhaustive
description of the syntax supported for absolute and relative terms is
given below.
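The sketch below shows how accumulated relative terms resolve against a reference. It mirrors create_output(), which subtracts each relative unit from the corresponding struct tm field and lets normalization handle any underflow. The function name is hypothetical, and timegm(3), a common extension the patch also uses, keeps everything in UTC. With the smoke tests' Tue Jan 11 11:11:00 2011 reference, "two months" gives Nov 11 2010, and "5M2d" would give Aug 9 2010 under this model.

```c
#define _DEFAULT_SOURCE  /* timegm() and gmtime_r() on glibc */
#include <time.h>

/* Hypothetical: subtract relative months and days from a UTC reference. */
static time_t
apply_relative_utc (time_t ref, int rel_months, int rel_days)
{
    struct tm tm;

    if (!gmtime_r (&ref, &tm))
	return (time_t) -1;

    /* Relative terms refer to the past, so they are subtracted. */
    tm.tm_mon -= rel_months;   /* may go negative; timegm() fixes the year */
    tm.tm_mday -= rel_days;

    return timegm (&tm);
}
```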

>  
> -Each timestamp is a number representing the number of seconds since
> -1970\-01\-01 00:00:00 UTC. This is not the most convenient means of
> -expressing date ranges, but until notmuch is fixed to accept a more
> -convenient form, one can use the date program to construct
> -timestamps. For example, with the bash shell the following syntax would
> -specify a date range to return messages from 2009\-10\-01 until the
> -current time:
> -
> -	$(date +%s \-d 2009\-10\-01)..$(date +%s)
> +.RS 4
> +.TP 4
> +.B The range expression
> +
> +date:<since>..<until>
> +
> +The above expression restricts the results to only messages from
> +<since> to <until>, based on the Date: header.
> +
> +If <since> or <until> describes time at an accuracy of days or less,
> +the date/time is rounded, towards past for <since> and towards future
> +for <until>, to be inclusive. For example, date:january..february

The accuracy doesn't seem to have anything to do with days; if I
say "date:1hour..1hour" I get a span of an hour.  Describing it as
rounding also seems like it could be confusing to someone who hasn't
thought a lot about this (though, as someone who has thought a lot
about this, I could be wrong).  What about something like,

<since> and <until> can describe imprecise times, such as "yesterday".
In this case, <since> is taken as the earliest time it could describe
(the beginning of yesterday) and <until> is taken as the latest time
it could describe (the end of yesterday).  Similarly,
date:january..february matches from the beginning of January to the
end of February.
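Concretely, with a 2011 reference year, date:january..february would resolve as follows. <since> rounds down to the first second of January, and <until> rounds up to the last second of February. The UTC-only sketch below covers month accuracy. The names are hypothetical, and timegm(3) is assumed to be available.

```c
#define _DEFAULT_SOURCE  /* for timegm() on glibc */
#include <time.h>

/* Hypothetical: first second of the given month (1-12), in UTC. */
static time_t
month_start_utc (int year, int mon)
{
    struct tm tm = { 0 };

    tm.tm_year = year - 1900;
    tm.tm_mon = mon - 1;
    tm.tm_mday = 1;
    return timegm (&tm);       /* mon == 13 is fine: timegm() normalizes */
}

/* Hypothetical: last second of the given month, i.e. next month minus 1 s. */
static time_t
month_end_utc (int year, int mon)
{
    return month_start_utc (year, mon + 1) - 1;
}
```

For January through February 2011 this yields 2011-01-01 00:00:00 through 2011-02-28 23:59:59 UTC.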

> +matches from the beginning of January until the end of
> +February. Similarly, date:yesterday..yesterday matches from the
> +beginning of yesterday until the end of yesterday.
> +
> +Open-ended ranges are supported (since Xapian 1.2.1), i.e. it's
> +possible to specify date:..<until> or date:<since>.. to not limit the
> +start or end time, respectively. Unfortunately, pre-1.2.1 Xapian does

No need for the "Unfortunately".

> +not report an error on open-ended ranges, but it does not work as
> +expected either.
> +
> +Xapian does not support spaces in range expressions. You can replace

The man pages essentially don't reference Xapian and the fact that we
use Xapian is transparent to the uninterested user.  Maybe just
"Currently, we do not support spaces ..."?  Or "Due to technical
limitations, we do not currently support spaces ..." if you want to
convey that we feel the user's pain but it's actually hard to fix.

> +the spaces with '_', or (in most cases) '-', or (in some cases) leave
> +the spaces out altogether.

Maybe add "Examples in this man page use spaces for clarity."?  It's
unfortunate that this rather critical piece of information is buried
in the middle of a subsection of the man page.  I wonder if it should
at least go before the previous paragraph?  We are going to get so
many people asking why their date searches don't work...
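Until spaces are supported, a front end that builds queries from free-form input could do the '_' substitution itself. The helper below is hypothetical and not part of notmuch. It assembles date:<since>..<until> this way and leaves either side empty for an open-ended range.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical: write "date:<since>..<until>" into buf, replacing spaces
 * with '_'. NULL or "" for since/until leaves that end of the range open.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
build_date_query (char *buf, size_t size, const char *since, const char *until)
{
    int n;
    char *p;

    n = snprintf (buf, size, "date:%s..%s",
		  since ? since : "", until ? until : "");
    if (n < 0 || (size_t) n >= size)
	return -1;

    for (p = buf; *p; p++)
	if (*p == ' ')
	    *p = '_';

    return 0;
}
```

For example, "last week" and "yesterday" become date:last_week..yesterday.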

> +
> +Entering date:expr without ".." (for example date:yesterday) won't
> +work, as it's not interpreted as a range expression at all. You can
> +achieve the expected result by duplicating the expr on both sides of ".."
> +(for example date:yesterday..yesterday).
> +.RE
> +
> +.RS 4
> +.TP 4
> +.B Relative date and time
> +[N|number] (years|months|weeks|days|hours|hrs|minutes|mins|seconds|secs) [...]
> +
> +All refer to past, can be repeated and will be accumulated.
> +
> +Units can be abbreviated to any length, with the otherwise ambiguous
> +single m being m for minutes and M for months.
> +
> +Number multiplier can also be written out one, two, ..., ten, dozen,

This is the only use of "multiplier".  I think it would be fine to
just say "the number".
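The unit rules in the quoted text can be modeled with prefix matching: abbreviations of any length, the hrs/mins/secs aliases, and the single-letter m/M split. This is a hypothetical sketch. The enum, the function names, and case-insensitive matching for everything except the lone m/M are assumptions, and the real parser's keyword table may behave differently.

```c
#include <ctype.h>
#include <stddef.h>

enum unit { UNIT_NONE, UNIT_YEAR, UNIT_MONTH, UNIT_WEEK, UNIT_DAY,
	    UNIT_HOUR, UNIT_MINUTE, UNIT_SECOND };

/* Is s (length len > 0) a case-insensitive prefix of full? */
static int
is_prefix (const char *s, size_t len, const char *full)
{
    size_t i;

    for (i = 0; i < len; i++) {
	if (full[i] == '\0' ||
	    tolower ((unsigned char) s[i]) != tolower ((unsigned char) full[i]))
	    return 0;
    }
    return len > 0;
}

/* Hypothetical unit matcher following the man page rules. */
static enum unit
match_unit (const char *s, size_t len)
{
    static const struct { const char *name; enum unit unit; } units[] = {
	{ "years", UNIT_YEAR },     { "months", UNIT_MONTH },
	{ "weeks", UNIT_WEEK },     { "days", UNIT_DAY },
	{ "hours", UNIT_HOUR },     { "hrs", UNIT_HOUR },
	{ "minutes", UNIT_MINUTE }, { "mins", UNIT_MINUTE },
	{ "seconds", UNIT_SECOND }, { "secs", UNIT_SECOND },
    };
    size_t i;

    /* The otherwise ambiguous single m: m is minutes, M is months. */
    if (len == 1 && s[0] == 'm')
	return UNIT_MINUTE;
    if (len == 1 && s[0] == 'M')
	return UNIT_MONTH;

    for (i = 0; i < sizeof (units) / sizeof (units[0]); i++)
	if (is_prefix (s, len, units[i].name))
	    return units[i].unit;

    return UNIT_NONE;
}
```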

> +hundred. As special cases last means one ("last week") and this means
> +zero ("this month").

Maybe, "Additionally, the unit may be preceded by "last" or "this"
(e.g., "last week" or "this month")."?
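The rules for the number reduce to a small lookup table, with "last" standing for one and "this" for zero. The function below is an illustrative sketch only. Its name and the case-insensitive matching (via POSIX strcasecmp()) are assumptions, not taken from the patch.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp() */

/*
 * Hypothetical: value of a written-out number word, or -1 if unknown.
 * "last" counts as one and "this" as zero, per the man page text.
 */
static int
number_word_value (const char *word)
{
    static const struct { const char *word; int value; } words[] = {
	{ "this", 0 },  { "last", 1 },
	{ "one", 1 },   { "two", 2 },    { "three", 3 }, { "four", 4 },
	{ "five", 5 },  { "six", 6 },    { "seven", 7 }, { "eight", 8 },
	{ "nine", 9 },  { "ten", 10 },   { "dozen", 12 }, { "hundred", 100 },
    };
    size_t i;

    for (i = 0; i < sizeof (words) / sizeof (words[0]); i++)
	if (strcasecmp (word, words[i].word) == 0)
	    return words[i].value;

    return -1;
}
```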

> +
> +When combined with absolute date and time, the relative date and time
> +specification will be relative from the specified absolute date and
> +time.
> +
> +Examples: 5M2d, two weeks
> +.RE
> +
> +.RS 4
> +.TP 4
> +.B Supported time formats

Supported absolute time formats?
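Because the time formats include am/pm forms, the conversion that fixup_ampm() applies can be summarized on its own: 12am is midnight and 12pm is noon. The sketch below is hypothetical and assumes that is_valid_12hour() accepts 1 through 12.

```c
/*
 * Convert a 12-hour clock hour (1-12) plus an am/pm flag to 0-23,
 * following fixup_ampm() in the patch. Returns -1 for an hour outside 1-12.
 */
static int
hour_from_12h (int hour, int pm)
{
    if (hour < 1 || hour > 12)
	return -1;
    if (pm)
	return hour == 12 ? 12 : hour + 12;  /* 12pm is noon */
    return hour == 12 ? 0 : hour;            /* 12am is midnight */
}
```

So the man page example 5pm becomes 17:00.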

> +H[H]:MM[:SS] [(am|a.m.|pm|p.m.)]
> +
> +H[H] (am|a.m.|pm|p.m.)
> +
> +HHMMSS
> +
> +now
> +
> +noon
> +
> +midnight
> +
> +Examples: 17:05, 5pm
> +.RE
> +
> +.RS 4
> +.TP 4
> +.B Supported date formats

Supported absolute date formats?
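Matching month and weekday names abbreviated to three or more characters could look like the hypothetical sketch below. It scans the alphabetic run at the start of the input and rejects anything shorter than three letters. It then returns the index of the first name that run is a prefix of. The case-insensitive matching is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical: return the index in names[] that the leading alphabetic
 * run of s abbreviates (at least three letters), or -1 if none matches.
 * names[] must be lowercase.
 */
static int
match_abbrev (const char *s, const char *const *names, int count)
{
    size_t len = 0;
    int i;

    while (isalpha ((unsigned char) s[len]))
	len++;
    if (len < 3)
	return -1;

    for (i = 0; i < count; i++) {
	size_t j;

	for (j = 0; j < len; j++)
	    if (names[i][j] == '\0' ||
		tolower ((unsigned char) s[j]) != names[i][j])
		break;
	if (j == len)
	    return i;
    }
    return -1;
}

static const char *const weekdays[] = {
    "sunday", "monday", "tuesday", "wednesday", "thursday", "friday",
    "saturday",
};

static const char *const months[] = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
};
```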

> +YYYY-MM[-DD]
> +
> +DD-MM[-[YY]YY]
> +
> +MM-YYYY
> +
> +M[M]/D[D][/[YY]YY]
> +
> +M[M]/YYYY
> +
> +D[D].M[M][.[YY]YY]
> +
> +D[D][(st|nd|rd|th)] Mon[thname] [YYYY]
> +
> +Mon[thname] D[D][(st|nd|rd|th)] [YYYY]
> +
> +Wee[kday]
> +
> +Month names can be abbreviated to three or more characters.
> +
> +Weekday names can be abbreviated to three or more characters.
> +
> +Examples: 2012-07-31, 31-07-2012, 7/31/2012, August 3
> +.RE
> +
> +.RS 4
> +.TP 4
> +.B Time zones
> +(+|-)HH:MM
> +
> +(+|-)HH[MM]
> +
> +Some time zone codes, e.g. UTC, EET.
> +.RE
>  
>  .SH SEE ALSO
>  
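The two numeric time zone forms can be parsed into a signed offset in minutes east of UTC. That is the unit create_output() subtracts from tm_min before calling timegm(3). The sketch below is hypothetical: the function name and the 23/59 range limits are assumptions, and named zones such as UTC or EET are not handled. For example, +03:00 yields 180, so the smoke test's 23:09:37+03:00 maps to 20:09:37 UTC.

```c
#include <ctype.h>

/*
 * Hypothetical: parse "(+|-)HH:MM" or "(+|-)HH[MM]" into minutes east
 * of UTC. Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int
parse_tz_offset (const char *s, int *minutes)
{
    int sign, hh, mm = 0;

    if (*s != '+' && *s != '-')
	return -1;
    sign = (*s == '-') ? -1 : 1;
    s++;

    /* Exactly two hour digits are required. */
    if (!isdigit ((unsigned char) s[0]) || !isdigit ((unsigned char) s[1]))
	return -1;
    hh = (s[0] - '0') * 10 + (s[1] - '0');
    s += 2;

    /* In the HH:MM form, the colon must be followed by two digits. */
    if (*s == ':') {
	s++;
	if (!isdigit ((unsigned char) s[0]) || !isdigit ((unsigned char) s[1]))
	    return -1;
    }

    /* Optional minutes (mandatory after a colon, checked above). */
    if (isdigit ((unsigned char) s[0])) {
	if (!isdigit ((unsigned char) s[1]))
	    return -1;
	mm = (s[0] - '0') * 10 + (s[1] - '0');
	s += 2;
    }

    if (*s != '\0' || hh > 23 || mm > 59)
	return -1;

    *minutes = sign * (hh * 60 + mm);
    return 0;
}
```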


* Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
  2012-10-22  8:14   ` Austin Clements
@ 2012-10-25 18:58     ` Austin Clements
  2012-10-27 20:38       ` Tomi Ollila
  2012-10-28 22:30     ` Jani Nikula
  1 sibling, 1 reply; 21+ messages in thread
From: Austin Clements @ 2012-10-25 18:58 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Quoth myself on Oct 22 at  4:14 am:
> Overall this looks pretty good to me, and I must say, this parser is
> amazingly flexible and copes well with a remarkably hostile grammar.
> 
> A lot of little comments below (sorry if any of this ground has
> already been covered in the previous four versions).
> 
> I do have one broad comment.  While I'm all for ad hoc parsers for ad
> hoc grammars like dates, there is one piece of the literature I think
> this parser suffers for by ignoring: tokenizing.  I think it would
> simplify a lot of this code if it did a tokenizing pass before the
> parsing pass.  It doesn't have to be a serious tokenizer with
> streaming and keywords and token types and junk; just something that
> first splits the input into substrings, possibly just non-overlapping
> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
> handling of postponed numbers because, with trivial lookahead in the
> token stream, you wouldn't have to postpone them.  Likewise, it would
> eliminate last_field.  It would simplify keyword matching because you
> wouldn't have to worry about matching substrings (I spent a long time
> staring at that code before I figured out what it would and wouldn't
> accept).  Most important, I think it would make the parser more
> predictable for users; for example, the parser currently accepts
> things like "saturtoday" because it's aggressively single-pass.

I should add that I am not at all opposed to this patch as it is
currently designed.  We need a date parser.  My comment about
separating tokenization is just a way that this code could probably be
simplified if someone were so inclined or if simplifying the code
would help it pass any hurdles.
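Here is a minimal sketch of such a tokenizing pass. It splits the input into runs of digits, runs of letters, and the single delimiters - + : / . and skips everything else. The names struct token and tokenize() are hypothetical; neither is part of the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token: a substring of the input, not NUL-terminated. */
struct token {
    const char *start;
    size_t len;
};

/*
 * Split s into non-overlapping matches of
 * [[:digit:]]+|[[:alpha:]]+|[-+:/.], skipping whitespace and any other
 * characters. Stores at most max tokens and returns how many were stored.
 */
static size_t
tokenize (const char *s, struct token *toks, size_t max)
{
    size_t n = 0;

    while (*s && n < max) {
	const char *start = s;

	if (isdigit ((unsigned char) *s)) {
	    while (isdigit ((unsigned char) *s))
		s++;
	} else if (isalpha ((unsigned char) *s)) {
	    while (isalpha ((unsigned char) *s))
		s++;
	} else if (strchr ("-+:/.", *s)) {
	    s++;		/* single-character delimiter token */
	} else {
	    s++;		/* skip whitespace and anything else */
	    continue;
	}

	toks[n].start = start;
	toks[n].len = (size_t) (s - start);
	n++;
    }

    return n;
}
```

With this pass, "saturtoday" becomes a single alphabetic token that matches no keyword. "20 May" becomes two tokens, so the month keyword can look back at the preceding number directly instead of relying on a postponed value.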

* Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
  2012-10-25 18:58     ` Austin Clements
@ 2012-10-27 20:38       ` Tomi Ollila
  0 siblings, 0 replies; 21+ messages in thread
From: Tomi Ollila @ 2012-10-27 20:38 UTC (permalink / raw)
  To: Austin Clements, Jani Nikula; +Cc: notmuch

On Thu, Oct 25 2012, Austin Clements wrote:

> Quoth myself on Oct 22 at  4:14 am:
>> Overall this looks pretty good to me, and I must say, this parser is
>> amazingly flexible and copes well with a remarkably hostile grammar.
>> 
>> A lot of little comments below (sorry if any of this ground has
>> already been covered in the previous four versions).
>> 
>> I do have one broad comment.  While I'm all for ad hoc parsers for ad
>> hoc grammars like dates, there is one piece of the literature I think
>> this parser suffers for ignoring: tokenizing.  I think it would
>> simplify a lot of this code if it did a tokenizing pass before the
>> parsing pass.  It doesn't have to be a serious tokenizer with
>> streaming and keywords and token types and junk; just something that
>> first splits the input into substrings, possibly just non-overlapping
>> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
>> handling of postponed numbers because, with trivial lookahead in the
>> token stream, you wouldn't have to postpone them.  Likewise, it would
>> eliminate last_field.  It would simplify keyword matching because you
>> wouldn't have to worry about matching substrings (I spent a long time
>> staring at that code before I figured out what it would and wouldn't
>> accept).  Most important, I think it would make the parser more
>> predictable for users; for example, the parser currently accepts
>> things like "saturtoday" because it's aggressively single-pass.
>
> I should add that I am not at all opposed to this patch as it is
> currently designed.  We need a date parser.  My comment about
> separating tokenization is just a way that this code could probably be
> simplified if someone were so inclined or if simplifying the code
> would help it pass any hurdles.

What if the current patch set, i.e. messages

$ grep Message-Id: ~/patch | sed 's/Message-Id: /id:/; y/<>/""/'
id:"e684cadbb5a01b6079ef344b0d6f97541847914a.1350854171.git.jani@nikula.org"
id:"a90d3b687895a26f765539d6c0420038a74ee42f.1350854171.git.jani@nikula.org"
id:"75a8f129d5e0d824b3e04ddfc1816c45fa0ec70d.1350854171.git.jani@nikula.org"
id:"606a94d565e6b21abfc59d6ba9676a807d669127.1350854171.git.jani@nikula.org"
id:"cbd383bfc4bf844bb0366f13f675d48956137c52.1350854171.git.jani@nikula.org"
id:"f21b8702728457c087478b26700e9448bc16c61d.1350854171.git.jani@nikula.org"
id:"37026480956679b12e82e4975f1837e93ef1c531.1350854171.git.jani@nikula.org"
id:"cff9c1dd87b8bc11326dca0b3589c81656500f5e.1350854171.git.jani@nikula.org"

(patches 1-8 of 9; the NEWS patch is stale) were just pushed? There are
only a few trivial things left to tune, plus rebasing NEWS, which I think
Jani will gladly do. It is so much easier for him to continue from there,
and for the rest of us to review the new diffs instead of these whole
patches again and again. I, at least, volunteer to track these remaining
issues (tokenizer not included :).

Tomi

* Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
  2012-10-22  8:14   ` Austin Clements
  2012-10-25 18:58     ` Austin Clements
@ 2012-10-28 22:30     ` Jani Nikula
  2012-10-28 22:52       ` Austin Clements
  1 sibling, 1 reply; 21+ messages in thread
From: Jani Nikula @ 2012-10-28 22:30 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Mon, 22 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> Overall this looks pretty good to me, and I must say, this parser is
> amazingly flexible and copes well with a remarkably hostile grammar.
>
> A lot of little comments below (sorry if any of this ground has
> already been covered in the previous four versions).

Nope, apart from "postponed numbers are confusing".

> I do have one broad comment.  While I'm all for ad hoc parsers for ad
> hoc grammars like dates, there is one piece of the literature I think
> this parser suffers for ignoring: tokenizing.  I think it would
> simplify a lot of this code if it did a tokenizing pass before the
> parsing pass.  It doesn't have to be a serious tokenizer with
> streaming and keywords and token types and junk; just something that
> first splits the input into substrings, possibly just non-overlapping
> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
> handling of postponed numbers because, with trivial lookahead in the
> token stream, you wouldn't have to postpone them.  Likewise, it would
> eliminate last_field.  It would simplify keyword matching because you
> wouldn't have to worry about matching substrings (I spent a long time
> staring at that code before I figured out what it would and wouldn't
> accept).  Most important, I think it would make the parser more
> predictable for users; for example, the parser currently accepts
> things like "saturtoday" because it's aggressively single-pass.

I'll fix this to require a non-keyword character between keywords, but
since you're not adamant about it, I'll pass on adding the tokenizer, at
least for now.
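Here is a minimal sketch of that kind of check. The whole run of letters at the current position must match the keyword, and the '|' minimum-length marker from the keyword table is honoured. As a result, a keyword match can no longer end in the middle of a word. The helper match_keyword_word() is hypothetical and is not in the patch. It also ignores the table's case-sensitive entries such as "M".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the alphabetic word at the start of s if the
 * whole word is an acceptable abbreviation of kw ("sat|urday" accepts
 * "sat" through "saturday"), or 0 otherwise. Case-insensitive.
 */
static size_t
match_keyword_word (const char *s, const char *kw)
{
    const char *bar = strchr (kw, '|');
    size_t minlen = bar ? (size_t) (bar - kw) : strlen (kw);
    size_t run = 0, i, k;

    /* Length of the run of letters at the start of s. */
    while (isalpha ((unsigned char) s[run]))
	run++;

    /* Too short to be an abbreviation of kw. */
    if (run < minlen)
	return 0;

    /* Every letter of the run must match kw, skipping the '|' marker. */
    for (i = 0, k = 0; i < run; i++, k++) {
	if (kw[k] == '|')
	    k++;
	if (kw[k] == '\0' ||
	    tolower ((unsigned char) s[i]) != tolower ((unsigned char) kw[k]))
	    return 0;
    }

    return run;
}
```

Under this rule "saturtoday" is rejected instead of being read as "satur" plus "today". "mins" also fails against "m|inutes", which is why a separate "mins" entry would still be needed.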

> Quoth Jani Nikula on Oct 22 at 12:22 am:
>> Add a date/time parser to notmuch, to be used for adding date range
>> query support for notmuch lib later on. Add the parser to a directory
>> of its own to make it independent of the rest of the notmuch code
>> base.
>> 
>> Signed-off-by: Jani Nikula <jani@nikula.org>
>> ---
>>  Makefile                              |    2 +-
>>  parse-time-string/Makefile            |    5 +
>>  parse-time-string/Makefile.local      |   12 +
>>  parse-time-string/README              |    9 +
>>  parse-time-string/parse-time-string.c | 1477 +++++++++++++++++++++++++++++++++
>>  parse-time-string/parse-time-string.h |  102 +++
>>  6 files changed, 1606 insertions(+), 1 deletion(-)
>>  create mode 100644 parse-time-string/Makefile
>>  create mode 100644 parse-time-string/Makefile.local
>>  create mode 100644 parse-time-string/README
>>  create mode 100644 parse-time-string/parse-time-string.c
>>  create mode 100644 parse-time-string/parse-time-string.h
>> 
>> diff --git a/Makefile b/Makefile
>> index e5e2e3a..bb9c316 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -3,7 +3,7 @@
>>  all:
>>  
>>  # List all subdirectories here. Each contains its own Makefile.local
>> -subdirs = compat completion emacs lib man util test
>> +subdirs = compat completion emacs lib man parse-time-string util test
>>  
>>  # We make all targets depend on the Makefiles themselves.
>>  global_deps = Makefile Makefile.config Makefile.local \
>> diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile
>> new file mode 100644
>> index 0000000..fa25832
>> --- /dev/null
>> +++ b/parse-time-string/Makefile
>> @@ -0,0 +1,5 @@
>> +all:
>> +	$(MAKE) -C .. all
>> +
>> +.DEFAULT:
>> +	$(MAKE) -C .. $@
>> diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local
>> new file mode 100644
>> index 0000000..53534f3
>> --- /dev/null
>> +++ b/parse-time-string/Makefile.local
>> @@ -0,0 +1,12 @@
>> +dir := parse-time-string
>> +extra_cflags += -I$(srcdir)/$(dir)
>> +
>> +libparse-time-string_c_srcs := $(dir)/parse-time-string.c
>> +
>> +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o)
>> +
>> +$(dir)/libparse-time-string.a: $(libparse-time-string_modules)
>> +	$(call quiet,AR) rcs $@ $^
>> +
>> +SRCS := $(SRCS) $(libparse-time-string_c_srcs)
>> +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a
>> diff --git a/parse-time-string/README b/parse-time-string/README
>> new file mode 100644
>> index 0000000..300ff1f
>> --- /dev/null
>> +++ b/parse-time-string/README
>> @@ -0,0 +1,9 @@
>> +PARSE TIME STRING
>> +=================
>> +
>> +parse_time_string() is a date/time parser originally written for
>> +notmuch by Jani Nikula <jani@nikula.org>. However, there is nothing
>> +notmuch specific in it, and it should be kept reusable for other
>> +projects, and ready to be packaged on its own as needed. Please do not
>> +add dependencies on or references to anything notmuch specific. The
>> +parser should only depend on the C library.
>> diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c
>> new file mode 100644
>> index 0000000..942041a
>> --- /dev/null
>> +++ b/parse-time-string/parse-time-string.c
>> @@ -0,0 +1,1477 @@
>> +/*
>> + * parse time string - user friendly date and time parser
>> + * Copyright © 2012 Jani Nikula
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Jani Nikula <jani@nikula.org>
>> + */
>> +
>> +#include <assert.h>
>> +#include <ctype.h>
>> +#include <errno.h>
>> +#include <limits.h>
>> +#include <stdio.h>
>> +#include <stdarg.h>
>> +#include <stdbool.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <strings.h>
>> +#include <time.h>
>> +#include <sys/time.h>
>> +#include <sys/types.h>
>> +
>> +#include "parse-time-string.h"
>> +
>> +/*
>> + * IMPLEMENTATION DETAILS
>> + *
>> + * At a high level, the parsing is done in two phases: 1) actual
>> + * parsing of the input string and storing the parsed data into
>> + * 'struct state', and 2) processing of the data in 'struct state'
>> + * according to current time (or provided reference time) and
>> + * rounding. This is evident in the main entry point function
>> + * parse_time_string().
>> + *
>> + * 1) The parsing phase - parse_input()
>> + *
>> + * Parsing is greedy and happens from left to right. The parsing is as
>> + * unambiguous as possible; only unambiguous date/time formats are
>> + * accepted. Redundant or contradictory absolute date/time in the
>> + * input (e.g. date specified multiple times/ways) is not
>> + * accepted. Relative date/time on the other hand just accumulates if
>> + * present multiple times (e.g. "5 days 5 days" just turns into 10
>> + * days).
>> + *
>> + * Parsing decisions are made on the input format, not value. For
>> + * example, "20/5/2005" fails because the recognized format here is
>> + * MM/D/YYYY, even though the values would suggest DD/M/YYYY.
>> + *
>> + * Parsing is mostly stateless in the sense that parsing decisions are
>> + * not made based on the values of previously parsed data, or whether
>> + * certain data is present in the first place. (There are a few
>> + * exceptions to the latter part, though, such as parsing of time zone
>> + * that would otherwise look like plain time.)
>> + *
>> + * When the parser encounters a number that is not greedily parsed as
>> + * part of a format, the interpretation is postponed until the next
>> + * token is parsed. The parser for the next token may consume the
>> + * previously postponed number. For example, when parsing "20 May" the
>> + * meaning of "20" is not known until "May" is parsed. If the parser
>> + * for the next token does not consume the postponed number, the
>> + * number is handled as a "lone" number before the parser for the next
>> + * token finishes.
>> + *
>> + * 2) The processing phase - create_output()
>> + *
>> + * Once the parser in phase 1 has finished, 'struct state' contains
>> + * all the information from the input string, and it's no longer
>> + * needed. Since the parser does not even handle the concept of "now",
>> + * the processing initializes the fields referring to the current
>> + * date/time.
>> + *
>> + * If requested, the result is rounded towards past or future. The
>> + * idea behind rounding is to support parsing date/time ranges in an
>> + * obvious way. For example, for a range defined as two dates (without
>> + * time), one would typically want to have an inclusive range from the
>> + * beginning of start date to the end of the end date. The caller
>> + * would use rounding towards past in the start date, and towards
>> + * future in the end date.
>> + *
>> + * The absolute date and time is shifted by the relative date and
>> + * time, and time zone adjustments are made. Daylight saving time
>> + * (DST) is specifically *not* handled at all.
>> + *
>> + * Finally, the result is stored to time_t.
>> + */
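The rounding paragraph above can be made concrete with a simplified sketch. It covers only the time of day and assumes three things: one-second resolution, a date already resolved into a struct tm, and a flag saying whether the input specified a time. The helper round_time_of_day() is hypothetical, not code from the patch. The patch itself presumably also rounds coarser inputs such as a bare month, which this sketch does not attempt.

```c
#include <stdbool.h>
#include <time.h>

/*
 * If the input gave no time of day, round towards the past (first
 * second of the day) or towards the future (last second of the day),
 * so that a start date and an end date form an inclusive range.
 */
static void
round_time_of_day (struct tm *tm, bool has_time, bool round_up)
{
    if (has_time)
	return;		/* an explicit time is never rounded */

    tm->tm_hour = round_up ? 23 : 0;
    tm->tm_min = round_up ? 59 : 0;
    tm->tm_sec = round_up ? 59 : 0;
}
```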
>> +
>> +#define unused(x) x __attribute__ ((unused))
>> +
>> +/* XXX: Redefine these to add i18n support. The keyword table uses
>> + * N_() to mark strings to be translated; they are accessed
>> + * dynamically using _(). */
>> +#define _(s) (s)	/* i18n: define as gettext (s) */
>> +#define N_(s) (s)	/* i18n: define as gettext_noop (s) */
>> +
>> +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))
>> +
>> +/*
>> + * Field indices in the tm and set arrays of struct state.
>> + *
>> + * NOTE: There's some code that depends on the ordering of this enum.
>> + */
>> +enum field {
>> +    /* Keep SEC...YEAR in this order. */
>> +    TM_ABS_SEC,		/* seconds */
>> +    TM_ABS_MIN,		/* minutes */
>> +    TM_ABS_HOUR,	/* hours */
>> +    TM_ABS_MDAY,	/* day of the month */
>> +    TM_ABS_MON,		/* month */
>> +    TM_ABS_YEAR,	/* year */
>> +
>> +    TM_ABS_WDAY,	/* day of the week. special: may be relative */
>
> Given that this may be relative, should it really be called
> TM_ABS_WDAY?

Will change to TM_WDAY.

>> +    TM_ABS_ISDST,	/* daylight saving time */
>> +
>> +    TM_AMPM,		/* am vs. pm */
>> +    TM_TZ,		/* timezone in minutes */
>> +
>> +    /* Keep SEC...YEAR in this order. */
>> +    TM_REL_SEC,		/* seconds relative to absolute or reference time */
>> +    TM_REL_MIN,		/* minutes ... */
>> +    TM_REL_HOUR,	/* hours ... */
>> +    TM_REL_DAY,		/* days ... */
>> +    TM_REL_MON,		/* months ... */
>> +    TM_REL_YEAR,	/* years ... */
>> +    TM_REL_WEEK,	/* weeks ... */
>> +
>> +    TM_NONE,		/* not a field */
>> +
>> +    TM_SIZE = TM_NONE,
>> +    TM_FIRST_ABS = TM_ABS_SEC,
>> +    TM_FIRST_REL = TM_REL_SEC,
>> +};
>> +
>> +/* Values for the set array of struct state. */
>> +enum field_set {
>> +    FIELD_UNSET,	/* The field has not been touched by parser. */
>> +    FIELD_SET,		/* The field has been set by parser. */
>> +    FIELD_NOW,		/* The field will be set to reference time. */
>> +};
>> +
>> +static enum field
>> +next_abs_field (enum field field)
>> +{
>> +    /* NOTE: Depends on the enum ordering. */
>> +    return field < TM_ABS_YEAR ? field + 1 : TM_NONE;
>> +}
>> +
>> +static enum field
>> +abs_to_rel_field (enum field field)
>> +{
>> +    assert (field <= TM_ABS_YEAR);
>> +
>> +    /* NOTE: Depends on the enum ordering. */
>> +    return field + (TM_FIRST_REL - TM_FIRST_ABS);
>> +}
>> +
>> +/* Get epoch value for field. */
>
> Explain what an "epoch value" for a field is.

Will do.
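One possible wording for that explanation: the epoch value of a field is the value the field has at the Unix epoch, 1970-01-01 00:00:00 UTC, expressed in the parser's own conventions (full year, 1-based month and day). That matches what field_epoch() returns. The hypothetical helper below derives those values from gmtime() as a cross-check. Note that struct tm itself uses years since 1900 and a 0-based month.

```c
#include <time.h>

/*
 * Fill in the calendar fields of the Unix epoch using the parser's
 * conventions: full year and 1-based month, unlike struct tm's
 * tm_year (years since 1900) and tm_mon (0-11).
 * Returns 0 on success, -1 if gmtime() fails.
 */
static int
unix_epoch_fields (int *year, int *mon, int *mday,
		   int *hour, int *min, int *sec)
{
    time_t zero = 0;
    struct tm *tm = gmtime (&zero);

    if (!tm)
	return -1;

    *year = tm->tm_year + 1900;
    *mon = tm->tm_mon + 1;
    *mday = tm->tm_mday;
    *hour = tm->tm_hour;
    *min = tm->tm_min;
    *sec = tm->tm_sec;

    return 0;
}
```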

>> +static int
>> +field_epoch (enum field field)
>> +{
>> +    if (field == TM_ABS_MDAY || field == TM_ABS_MON)
>> +	return 1;
>> +    else if (field == TM_ABS_YEAR)
>> +	return 1970;
>> +    else
>> +	return 0;
>> +}
>> +
>> +/* The parsing state. */
>> +struct state {
>> +    int tm[TM_SIZE];			/* parsed date and time */
>> +    enum field_set set[TM_SIZE];	/* set status of tm */
>> +
>> +    enum field last_field;	/* Previously set field. */
>> +    char delim;
>> +
>> +    int postponed_length;	/* Number of digits in postponed value. */
>> +    int postponed_value;
>> +    char postponed_delim;	/* The delimiter preceding postponed number. */
>> +};
>> +
>> +/*
>> + * Helpers for postponed numbers.
>> + *
>> + * postponed_length is the number of digits in postponed value. 0
>> + * means there is no postponed number. -1 means there is a postponed
>> + * number, but it comes from a keyword, and it doesn't have digits.
>> + */
>> +static int
>> +get_postponed_length (struct state *state)
>> +{
>> +    return state->postponed_length;
>> +}
>> +
>> +/*
>> + * Consume a previously postponed number. Return true if a number was
>> + * in fact postponed, false otherwise. Store the postponed number's
>> + * value in *v, length in the input string in *n (or -1 if the number
>> + * was written out and parsed as a keyword), and the preceding
>> + * delimiter to *d.
>
> Mention that v, n, and d are unchanged if no number is postponed?  You
> exploit this for default values elsewhere in the code.

Will do.

>> + */
>> +static bool
>> +get_postponed_number (struct state *state, int *v, int *n, char *d)
>
> Maybe "consume_postponed_number" to emphasize that this function has
> side-effects (and isn't simply a "getter")?

Agreed.
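A sketch of the renamed function with the extended doc comment. It works on a cut-down, hypothetical struct postponed holding only the postponed-number members of struct state, so that it compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the postponed-number members of the patch's struct state. */
struct postponed {
    int length;		/* digits in value; 0 = none; -1 = from a keyword */
    int value;
    char delim;		/* delimiter that preceded the number */
};

/*
 * Consume a previously postponed number. Return true if a number was
 * postponed, false otherwise. On success store the value in *v, the
 * length in *n and the preceding delimiter in *d; any of the pointers
 * may be NULL. If no number was postponed, *v, *n and *d are left
 * unchanged, so callers may preload them with default values.
 */
static bool
consume_postponed_number (struct postponed *p, int *v, int *n, char *d)
{
    if (!p->length)
	return false;

    if (n)
	*n = p->length;
    if (v)
	*v = p->value;
    if (d)
	*d = p->delim;

    p->length = 0;
    p->value = 0;
    p->delim = 0;

    return true;
}
```

The "unchanged on failure" guarantee is what kw_set_rel() relies on when it initializes its multiplier to 1 before the call.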

>> +{
>> +    if (!state->postponed_length)
>> +	return false;
>> +
>> +    if (n)
>> +	*n = state->postponed_length;
>> +
>> +    if (v)
>> +	*v = state->postponed_value;
>> +
>> +    if (d)
>> +	*d = state->postponed_delim;
>> +
>> +    state->postponed_length = 0;
>> +    state->postponed_value = 0;
>> +    state->postponed_delim = 0;
>> +
>> +    return true;
>> +}
>> +
>> +static int parse_postponed_number (struct state *state, enum field next_field);
>> +
>> +/*
>> + * Postpone a number to be handled later. If one exists already,
>> + * handle it first. n may be -1 to indicate a keyword that has no
>> + * number length.
>> + */
>> +static int
>> +set_postponed_number (struct state *state, int v, int n)
>> +{
>> +    int r;
>> +    char d = state->delim;
>> +
>> +    /* Parse a previously postponed number, if any. */
>> +    r = parse_postponed_number (state, TM_NONE);
>> +    if (r)
>> +	return r;
>> +
>> +    state->postponed_length = n;
>> +    state->postponed_value = v;
>> +    state->postponed_delim = d;
>> +
>> +    return 0;
>> +}
>> +
>> +static void
>> +set_delim (struct state *state, char delim)
>> +{
>> +    state->delim = delim;
>> +}
>> +
>> +static void
>> +unset_delim (struct state *state)
>> +{
>> +    state->delim = 0;
>> +}
>> +
>> +/*
>> + * Field set/get/mod helpers.
>> + */
>> +
>> +/* Return true if field has been set. */
>> +static bool
>> +is_field_set (struct state *state, enum field field)
>> +{
>> +    assert (field < ARRAY_SIZE (state->tm));
>> +
>> +    return field < ARRAY_SIZE (state->set) &&
>
> state->tm and state->set are the same size, so this will always be
> true given that the assert hasn't fired.  Is this just defensive
> programming?

This is leftover from when state->tm and state->set weren't the same
size. Will clean up the asserts and checks.

>> +	   state->set[field] != FIELD_UNSET;
>> +}
>> +
>> +static void
>> +unset_field (struct state *state, enum field field)
>> +{
>> +    assert (field < ARRAY_SIZE (state->tm));
>> +
>> +    state->set[field] = FIELD_UNSET;
>> +    state->tm[field] = 0;
>> +}
>> +
>> +/*
>> + * Set field to value. A field can only be set once to ensure the
>> + * input does not contain redundant and potentially conflicting data.
>> + */
>> +static int
>> +set_field (struct state *state, enum field field, int value)
>> +{
>> +    int r;
>> +
>> +    assert (field < ARRAY_SIZE (state->tm));
>> +
>> +    /* Fields can only be set once. */
>> +    if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET)
>
> Same comment about array sizes.  Also, this should probably call
> is_field_set instead of open-coding it (which would make the array
> size check even more redundant!)

Agreed.
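A sketch of the cleaned-up pair. It uses a single bounds assert, and set_field() is built on is_field_set(). The three-field state and the -1 error value are hypothetical stand-ins for the patch's enum field and -PARSE_TIME_ERR_ALREADYSET. Postponed-number handling is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))

enum { F_SEC, F_MIN, F_HOUR, F_SIZE };	/* hypothetical subset of fields */
enum field_set { FIELD_UNSET, FIELD_SET, FIELD_NOW };

struct state {
    int tm[F_SIZE];
    enum field_set set[F_SIZE];
};

static bool
is_field_set (const struct state *state, int field)
{
    /* tm and set have the same size, so one bounds check covers both. */
    assert (field >= 0 && (size_t) field < ARRAY_SIZE (state->set));

    return state->set[field] != FIELD_UNSET;
}

/* Set a field once; a second attempt is an error. */
static int
set_field (struct state *state, int field, int value)
{
    if (is_field_set (state, field))
	return -1;	/* stands in for -PARSE_TIME_ERR_ALREADYSET */

    state->set[field] = FIELD_SET;
    state->tm[field] = value;

    return 0;
}
```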

>> +	return -PARSE_TIME_ERR_ALREADYSET;
>> +
>> +    state->set[field] = FIELD_SET;
>> +
>> +    /* Parse a previously postponed number, if any. */
>> +    r = parse_postponed_number (state, field);
>
> I don't understand the big picture with postponed number handling yet,
> but is it worth mentioning in this function's doc comment that it
> processes postponed numbers?
>
>> +    if (r)
>> +	return r;
>> +
>> +    unset_delim (state);
>> +
>> +    state->tm[field] = value;
>> +    state->last_field = field;
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Mark n fields in fields to be set to the reference date/time in the
>> + * specified time zone, or local timezone if not specified. The fields
>> + * will be initialized after parsing is complete and timezone is
>> + * known.
>> + */
>> +static int
>> +set_fields_to_now (struct state *state, enum field *fields, size_t n)
>> +{
>> +    size_t i;
>> +    int r;
>> +
>> +    for (i = 0; i < n; i++) {
>> +	r = set_field (state, fields[i], 0);
>> +	if (r)
>> +	    return r;
>> +	state->set[fields[i]] = FIELD_NOW;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Modify field by adding value to it. To be used on relative fields,
>> + * which can be modified multiple times (to accumulate). */
>> +static int
>> +mod_field (struct state *state, enum field field, int value)
>
> add_to_field?

Agreed.
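To contrast add_to_field() with set_field(), here is a small sketch of the accumulation described in the implementation notes ("5 days 5 days" turns into 10 days). The relative-field enum and struct rel_state are hypothetical reductions of the patch's state, and postponed-number handling is left out.

```c
#include <stdbool.h>

/* Hypothetical relative fields, in the same order as the patch's TM_REL_*. */
enum rel_field {
    REL_SEC, REL_MIN, REL_HOUR, REL_DAY, REL_MON, REL_YEAR, REL_WEEK,
    REL_SIZE
};

struct rel_state {
    int tm[REL_SIZE];
    bool set[REL_SIZE];
};

/*
 * Accumulate value into a relative field. Unlike set_field(), this
 * may be called any number of times for the same field.
 */
static void
add_to_field (struct rel_state *st, enum rel_field field, int value)
{
    st->set[field] = true;
    st->tm[field] += value;
}
```

kw_set_rel() passes multiplier * kw->value, so "two weeks" would add 2 * 1 to the week field.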

>> +{
>> +    int r;
>> +
>> +    assert (field < ARRAY_SIZE (state->tm));   /* assert relative??? */
>> +
>> +    if (field < ARRAY_SIZE (state->set))
>
> Another redundant check?

Yes.

>> +	state->set[field] = FIELD_SET;
>> +
>> +    /* Parse a previously postponed number, if any. */
>> +    r = parse_postponed_number (state, field);
>
> This postponed number stuff is getting really confusing...

Ouch...

>> +    if (r)
>> +	return r;
>> +
>> +    unset_delim (state);
>> +
>> +    state->tm[field] += value;
>> +    state->last_field = field;
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Get field value. Make sure the field is set before query. It's most
>> + * likely an error to call this while parsing (for example fields set
>> + * as FIELD_NOW will only be set to some value after parsing).
>> + */
>> +static int
>> +get_field (struct state *state, enum field field)
>> +{
>> +    assert (field < ARRAY_SIZE (state->tm));
>
> Assert that the field is set?

During create_output(), the relative fields might not be set, but they
have a value of 0 by default.

>> +
>> +    return state->tm[field];
>> +}
>> +
>> +/*
>> + * Validity checkers.
>> + */
>> +static bool is_valid_12hour (int h)
>> +{
>> +    return h >= 0 && h <= 12;
>
> h >= 1?

Will fix.
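With the lower bound fixed, "0pm" is rejected. The sketch below pairs the corrected check with the usual 12-hour to 24-hour mapping: 12 a.m. is midnight and 12 p.m. is noon. The patch's own am/pm handling is not in the quoted part, so to_24hour() is a hypothetical helper, not the patch's code.

```c
#include <stdbool.h>

static bool
is_valid_12hour (int h)
{
    return h >= 1 && h <= 12;
}

/* Convert a 12-hour clock hour to 0-23; return -1 if h is out of range. */
static int
to_24hour (int h, bool pm)
{
    if (!is_valid_12hour (h))
	return -1;

    /* 12 wraps to 0, then p.m. adds 12: 12am -> 0, 12pm -> 12, 5pm -> 17. */
    return (h % 12) + (pm ? 12 : 0);
}
```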

>> +}
>> +
>> +static bool is_valid_time (int h, int m, int s)
>> +{
>> +    /* Allow 24:00:00 to denote end of day. */
>> +    if (h == 24 && m == 0 && s == 0)
>> +	return true;
>> +
>> +    return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59;
>> +}
>> +
>> +static bool is_valid_mday (int mday)
>> +{
>> +    return mday >= 1 && mday <= 31;
>> +}
>> +
>> +static bool is_valid_mon (int mon)
>> +{
>> +    return mon >= 1 && mon <= 12;
>> +}
>> +
>> +static bool is_valid_year (int year)
>> +{
>> +    return year >= 1970;
>> +}
>> +
>> +static bool is_valid_date (int year, int mon, int mday)
>> +{
>> +    return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday);
>> +}
>> +
>> +/* Unset indicator for time and date set helpers. */
>> +#define UNSET -1
>> +
>> +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */
>> +static int
>> +set_abs_time (struct state *state, int hour, int min, int sec)
>> +{
>> +    int r;
>> +
>> +    if (hour != UNSET) {
>> +	if ((r = set_field (state, TM_ABS_HOUR, hour)))
>> +	    return r;
>> +    }
>> +
>> +    if (min != UNSET) {
>> +	if ((r = set_field (state, TM_ABS_MIN, min)))
>> +	    return r;
>> +    }
>> +
>> +    if (sec != UNSET) {
>> +	if ((r = set_field (state, TM_ABS_SEC, sec)))
>> +	    return r;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. */
>> +static int
>> +set_abs_date (struct state *state, int year, int mon, int mday)
>> +{
>> +    int r;
>> +
>> +    if (year != UNSET) {
>> +	if ((r = set_field (state, TM_ABS_YEAR, year)))
>> +	    return r;
>> +    }
>> +
>> +    if (mon != UNSET) {
>> +	if ((r = set_field (state, TM_ABS_MON, mon)))
>> +	    return r;
>> +    }
>> +
>> +    if (mday != UNSET) {
>> +	if ((r = set_field (state, TM_ABS_MDAY, mday)))
>> +	    return r;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Keyword parsing and handling.
>> + */
>> +struct keyword;
>> +typedef int (*setter_t)(struct state *state, struct keyword *kw);
>> +
>> +struct keyword {
>> +    const char *name;	/* keyword */
>> +    enum field field;	/* field to set, or TM_NONE if N/A */
>> +    int value;		/* value to set, or 0 if N/A */
>> +    setter_t set;	/* function to use for setting, if non-NULL */
>> +};
>> +
>> +/*
>> + * Setter callback functions for keywords.
>> + */
>> +static int
>> +kw_set_default (struct state *state, struct keyword *kw)
>
> It took me a while to figure out what the name of this had to do with
> the action it performs, then I realized that it's never used in the
> table and only called when set is NULL.  Given that, I think it would
> make more sense to just put the set_field call in place of the one
> current call to kw_set_default.  Currently, this seems like one
> indirection too much.

Agreed.
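A sketch of the keyword dispatch with kw_set_default() inlined. A NULL setter now simply means "call set_field() with the table's field and value". The tiny struct state, the stand-in set_field() and the caller name handle_keyword() are all hypothetical; the real caller is not in the quoted hunks.

```c
#include <stddef.h>

struct state {
    int tm[4];		/* hypothetical, much smaller than the patch's state */
};

struct keyword;
typedef int (*setter_t) (struct state *state, struct keyword *kw);

struct keyword {
    const char *name;
    int field;
    int value;
    setter_t set;	/* NULL means: set kw->field to kw->value */
};

/* Stand-in for the patch's set_field(); no set-once or postponed logic. */
static int
set_field (struct state *state, int field, int value)
{
    state->tm[field] = value;
    return 0;
}

static int
handle_keyword (struct state *state, struct keyword *kw)
{
    if (kw->set)
	return kw->set (state, kw);

    /* The former kw_set_default(), written out in place. */
    return set_field (state, kw->field, kw->value);
}
```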

>> +{
>> +    return set_field (state, kw->field, kw->value);
>> +}
>> +
>> +static int
>> +kw_set_rel (struct state *state, struct keyword *kw)
>> +{
>> +    int multiplier = 1;
>> +
>> +    /* Get a previously set multiplier, if any. */
>> +    get_postponed_number (state, &multiplier, NULL, NULL);
>> +
>> +    /* Accumulate relative field values. */
>> +    return mod_field (state, kw->field, multiplier * kw->value);
>> +}
>> +
>> +static int
>> +kw_set_number (struct state *state, struct keyword *kw)
>> +{
>> +    /* -1 = no length, from keyword. */
>> +    return set_postponed_number (state, kw->value, -1);
>> +}
>> +
>> +static int
>> +kw_set_month (struct state *state, struct keyword *kw)
>> +{
>> +    int n = get_postponed_length (state);
>> +
>> +    /* Consume postponed number if it could be mday. This handles "20
>> +     * January". */
>> +    if (n == 1 || n == 2) {
>
> Should this be (n && is_valid_mday (state->postponed_value))?  It
> seems a little odd that postponed numbers of three or more digits are
> treated as independent, but two-digit numbers > 31 are an error.

I'm inclined to treating any one- or two-digit number preceding a month
name as day of the month, and not letting the value affect that
decision.
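A sketch of that rule. The decision whether to consume the number uses only its digit count, and the value is checked afterwards, so an out-of-range day is an error rather than a reason to reinterpret the number. The function month_takes_postponed() and its result enum are hypothetical names for illustration. A length of -1 stands for a written-out number such as "five", which kw_set_month() does not consume either.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch only. */
enum month_result { NOT_CONSUMED, MDAY_SET, INVALID_MDAY };

/*
 * len is the number of digits in the postponed number (-1 if it came
 * from a keyword), value its value. Any one- or two-digit number before
 * a month name is taken as the day of the month, whatever its value.
 */
static enum month_result
month_takes_postponed (int len, int value, int *mday)
{
    if (len != 1 && len != 2)
	return NOT_CONSUMED;	/* e.g. "2012 May": handled as a lone number */

    if (value < 1 || value > 31)
	return INVALID_MDAY;	/* e.g. "45 May" is an error */

    *mday = value;
    return MDAY_SET;
}
```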

>> +	int r, v;
>> +
>> +	get_postponed_number (state, &v, NULL, NULL);
>> +
>> +	if (!is_valid_mday (v))
>> +	    return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +	r = set_field (state, TM_ABS_MDAY, v);
>> +	if (r)
>> +	    return r;
>> +    }
>> +
>> +    return set_field (state, kw->field, kw->value);
>> +}
>> +
>> +static int
>> +kw_set_ampm (struct state *state, struct keyword *kw)
>> +{
>> +    int n = get_postponed_length (state);
>> +
>> +    /* Consume postponed number if it could be hour. This handles
>> +     * "5pm". */
>> +    if (n == 1 || n == 2) {
>
> Same comment as for kw_set_month.

Same as above.

>> +	int r, v;
>> +
>> +	get_postponed_number (state, &v, NULL, NULL);
>> +
>> +	if (!is_valid_12hour (v))
>> +	    return -PARSE_TIME_ERR_INVALIDTIME;
>> +
>> +	r = set_abs_time (state, v, 0, 0);
>> +	if (r)
>> +	    return r;
>> +    }
>> +
>> +    return set_field (state, kw->field, kw->value);
>> +}
>> +
>> +static int
>> +kw_set_timeofday (struct state *state, struct keyword *kw)
>> +{
>> +    return set_abs_time (state, kw->value, 0, 0);
>> +}
>> +
>> +static int
>> +kw_set_today (struct state *state, unused (struct keyword *kw))
>> +{
>> +    enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY };
>> +
>> +    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
>> +}
>> +
>> +static int
>> +kw_set_now (struct state *state, unused (struct keyword *kw))
>> +{
>> +    enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC };
>> +
>> +    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
>> +}
>> +
>> +static int
>> +kw_set_ordinal (struct state *state, struct keyword *kw)
>> +{
>> +    int n, v;
>> +
>> +    /* Require a postponed number. */
>> +    if (!get_postponed_number (state, &v, &n, NULL))
>> +	return -PARSE_TIME_ERR_DATEFORMAT;
>> +
>> +    /* Ordinals are mday. */
>> +    if (n != 1 && n != 2)
>
> Is this redundant with your is_valid_mday test below?

No, this rejects stuff like "005th" and "five th".
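The two checks are independent, as this standalone sketch of the ordinal rules in kw_set_ordinal() shows. The number must be written with one or two digits. After that, "st", "nd" and "rd" must agree with the value, while "th" only needs a valid day of the month. The function ordinal_ok() is hypothetical and only handles these four suffixes.

```c
#include <stdbool.h>
#include <strings.h>

/*
 * suffix is the ordinal keyword, value the preceding number and
 * ndigits its length in the input (-1 if it was written out).
 */
static bool
ordinal_ok (const char *suffix, int value, int ndigits)
{
    if (ndigits != 1 && ndigits != 2)
	return false;		/* rejects "005th" and "five th" */

    if (strcasecmp (suffix, "st") == 0)
	return value == 1 || value == 21 || value == 31;
    if (strcasecmp (suffix, "nd") == 0)
	return value == 2 || value == 22;
    if (strcasecmp (suffix, "rd") == 0)
	return value == 3 || value == 23;
    if (strcasecmp (suffix, "th") == 0)
	return value >= 1 && value <= 31;	/* lax: "2th" is accepted */

    return false;
}
```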

>> +	return -PARSE_TIME_ERR_DATEFORMAT;
>> +
>> +    /* Be strict about st, nd, rd, and lax about th. */
>> +    if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31)
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +    else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22)
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +    else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23)
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +    else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v))
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +    return set_field (state, TM_ABS_MDAY, v);
>> +}
>> +
>> +/*
>> + * Accepted keywords.
>> + *
>> + * A keyword may optionally contain a '|' to indicate the minimum
>> + * match length. Without one, full match is required. It's advisable
>> + * to keep the minimum match parts unique across all keywords.
>> + *
>> + * If keyword begins with upper case letter, then the matching will be
>> + * case sensitive. Otherwise the matching is case insensitive.
>> + *
>> + * If setter is NULL, set_default will be used.
>> + *
>> + * Note: Order matters. Matching is greedy, longest match is used, but
>> + * of equal length matches the first one is used, unless there's an
>> + * equal length case sensitive match which trumps case insensitive
>> + * matches.
>
> If you do have a tokenizer (or disallow mashing keywords together),
> then all of complexity arising from longest match goes away because
> the keyword token either will or won't match a keyword.  If you also
> eliminate the rule for case sensitivity and put case-sensitive things
> before conflicting case-insensitive things (so put "M" before
> "m|inutes"), then you can simply use the first match.

At least one reason for going through the whole table is that if this
ever gets i18n support, the conflicting keywords might be different in
other languages. While order matters in principle, the table should be
written so that in practice it doesn't.
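A sketch of a full-table scan under the rules in the comment above. The longest match wins, a case-sensitive match beats a case-insensitive one of equal length, and otherwise the first entry wins. It assumes that a keyword must cover the whole run of letters. The four-entry table, enum unit, word_match() and lookup_unit() are all hypothetical, and i18n is left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for TM_REL_MIN and TM_REL_MON. */
enum unit { UNIT_NONE, UNIT_MIN, UNIT_MON };

struct unit_kw {
    const char *name;
    enum unit unit;
};

static const struct unit_kw kw_table[] = {
    { "m|inutes", UNIT_MIN },
    { "M", UNIT_MON },		/* upper case: matched case-sensitively */
    { "mins", UNIT_MIN },
    { "mo|nths", UNIT_MON },
};

/* Length of the alphabetic word at s if it matches kw, else 0. */
static size_t
word_match (const char *s, const char *kw, bool case_sensitive)
{
    const char *bar = strchr (kw, '|');
    size_t minlen = bar ? (size_t) (bar - kw) : strlen (kw);
    size_t run = 0, i, k;

    while (isalpha ((unsigned char) s[run]))
	run++;
    if (run < minlen)
	return 0;

    for (i = 0, k = 0; i < run; i++, k++) {
	int a, b;

	if (kw[k] == '|')
	    k++;
	if (kw[k] == '\0')
	    return 0;
	a = (unsigned char) s[i];
	b = (unsigned char) kw[k];
	if (!case_sensitive) {
	    a = tolower (a);
	    b = tolower (b);
	}
	if (a != b)
	    return 0;
    }
    return run;
}

static enum unit
lookup_unit (const char *s)
{
    size_t i, best_len = 0;
    bool best_cs = false;
    enum unit best = UNIT_NONE;

    for (i = 0; i < sizeof (kw_table) / sizeof (kw_table[0]); i++) {
	bool cs = isupper ((unsigned char) kw_table[i].name[0]);
	size_t len = word_match (s, kw_table[i].name, cs);

	if (len == 0)
	    continue;
	/* Longer wins; on a tie, case-sensitive beats case-insensitive. */
	if (len > best_len || (len == best_len && cs && !best_cs)) {
	    best_len = len;
	    best_cs = cs;
	    best = kw_table[i].unit;
	}
    }
    return best;
}
```

Because every entry is examined, reordering this table does not change any of the results checked above. That is the property a translated table would need.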

>
>> + */
>> +static struct keyword keywords[] = {
>> +    /* Weekdays. */
>> +    { N_("sun|day"),	TM_ABS_WDAY,	0,	NULL },
>> +    { N_("mon|day"),	TM_ABS_WDAY,	1,	NULL },
>> +    { N_("tue|sday"),	TM_ABS_WDAY,	2,	NULL },
>> +    { N_("wed|nesday"),	TM_ABS_WDAY,	3,	NULL },
>> +    { N_("thu|rsday"),	TM_ABS_WDAY,	4,	NULL },
>> +    { N_("fri|day"),	TM_ABS_WDAY,	5,	NULL },
>> +    { N_("sat|urday"),	TM_ABS_WDAY,	6,	NULL },
>> +
>> +    /* Months. */
>> +    { N_("jan|uary"),	TM_ABS_MON,	1,	kw_set_month },
>> +    { N_("feb|ruary"),	TM_ABS_MON,	2,	kw_set_month },
>> +    { N_("mar|ch"),	TM_ABS_MON,	3,	kw_set_month },
>> +    { N_("apr|il"),	TM_ABS_MON,	4,	kw_set_month },
>> +    { N_("may"),	TM_ABS_MON,	5,	kw_set_month },
>> +    { N_("jun|e"),	TM_ABS_MON,	6,	kw_set_month },
>> +    { N_("jul|y"),	TM_ABS_MON,	7,	kw_set_month },
>> +    { N_("aug|ust"),	TM_ABS_MON,	8,	kw_set_month },
>> +    { N_("sep|tember"),	TM_ABS_MON,	9,	kw_set_month },
>> +    { N_("oct|ober"),	TM_ABS_MON,	10,	kw_set_month },
>> +    { N_("nov|ember"),	TM_ABS_MON,	11,	kw_set_month },
>> +    { N_("dec|ember"),	TM_ABS_MON,	12,	kw_set_month },
>> +
>> +    /* Durations. */
>> +    { N_("y|ears"),	TM_REL_YEAR,	1,	kw_set_rel },
>> +    { N_("w|eeks"),	TM_REL_WEEK,	1,	kw_set_rel },
>> +    { N_("d|ays"),	TM_REL_DAY,	1,	kw_set_rel },
>> +    { N_("h|ours"),	TM_REL_HOUR,	1,	kw_set_rel },
>> +    { N_("hr|s"),	TM_REL_HOUR,	1,	kw_set_rel },
>> +    { N_("m|inutes"),	TM_REL_MIN,	1,	kw_set_rel },
>> +    /* M=months, m=minutes */
>> +    { N_("M"),		TM_REL_MON,	1,	kw_set_rel },
>> +    { N_("mins"),	TM_REL_MIN,	1,	kw_set_rel },
>> +    { N_("mo|nths"),	TM_REL_MON,	1,	kw_set_rel },
>> +    { N_("s|econds"),	TM_REL_SEC,	1,	kw_set_rel },
>> +    { N_("secs"),	TM_REL_SEC,	1,	kw_set_rel },
>> +
>> +    /* Numbers. */
>> +    { N_("one"),	TM_NONE,	1,	kw_set_number },
>> +    { N_("two"),	TM_NONE,	2,	kw_set_number },
>> +    { N_("three"),	TM_NONE,	3,	kw_set_number },
>> +    { N_("four"),	TM_NONE,	4,	kw_set_number },
>> +    { N_("five"),	TM_NONE,	5,	kw_set_number },
>> +    { N_("six"),	TM_NONE,	6,	kw_set_number },
>> +    { N_("seven"),	TM_NONE,	7,	kw_set_number },
>> +    { N_("eight"),	TM_NONE,	8,	kw_set_number },
>> +    { N_("nine"),	TM_NONE,	9,	kw_set_number },
>> +    { N_("ten"),	TM_NONE,	10,	kw_set_number },
>> +    { N_("dozen"),	TM_NONE,	12,	kw_set_number },
>> +    { N_("hundred"),	TM_NONE,	100,	kw_set_number },
>> +
>> +    /* Special number forms. */
>> +    { N_("this"),	TM_NONE,	0,	kw_set_number },
>> +    { N_("last"),	TM_NONE,	1,	kw_set_number },
>> +
>> +    /* Other special keywords. */
>> +    { N_("yesterday"),	TM_REL_DAY,	1,	kw_set_rel },
>> +    { N_("today"),	TM_NONE,	0,	kw_set_today },
>> +    { N_("now"),	TM_NONE,	0,	kw_set_now },
>> +    { N_("noon"),	TM_NONE,	12,	kw_set_timeofday },
>> +    { N_("midnight"),	TM_NONE,	0,	kw_set_timeofday },
>> +    { N_("am"),		TM_AMPM,	0,	kw_set_ampm },
>> +    { N_("a.m."),	TM_AMPM,	0,	kw_set_ampm },
>> +    { N_("pm"),		TM_AMPM,	1,	kw_set_ampm },
>> +    { N_("p.m."),	TM_AMPM,	1,	kw_set_ampm },
>> +    { N_("st"),		TM_NONE,	0,	kw_set_ordinal },
>> +    { N_("nd"),		TM_NONE,	0,	kw_set_ordinal },
>> +    { N_("rd"),		TM_NONE,	0,	kw_set_ordinal },
>> +    { N_("th"),		TM_NONE,	0,	kw_set_ordinal },
>> +
>> +    /* Timezone codes: offset in minutes. XXX: Add more codes. */
>> +    { N_("pst"),	TM_TZ,		-8*60,	NULL },
>> +    { N_("mst"),	TM_TZ,		-7*60,	NULL },
>> +    { N_("cst"),	TM_TZ,		-6*60,	NULL },
>> +    { N_("est"),	TM_TZ,		-5*60,	NULL },
>> +    { N_("ast"),	TM_TZ,		-4*60,	NULL },
>> +    { N_("nst"),	TM_TZ,		-(3*60+30),	NULL },
>> +
>> +    { N_("gmt"),	TM_TZ,		0,	NULL },
>> +    { N_("utc"),	TM_TZ,		0,	NULL },
>> +
>> +    { N_("wet"),	TM_TZ,		0,	NULL },
>> +    { N_("cet"),	TM_TZ,		1*60,	NULL },
>> +    { N_("eet"),	TM_TZ,		2*60,	NULL },
>> +    { N_("fet"),	TM_TZ,		3*60,	NULL },
>> +
>> +    { N_("wat"),	TM_TZ,		1*60,	NULL },
>> +    { N_("cat"),	TM_TZ,		2*60,	NULL },
>> +    { N_("eat"),	TM_TZ,		3*60,	NULL },
>> +};
>> +
>> +/*
>> + * Compare strings s and keyword. Return number of matching chars on
>> + * match, 0 for no match. Match must be at least n chars, or all of
>> + * keyword if n < 0, otherwise it's not a match. Use match_case for
>> + * case sensitive matching.
>> + */
>> +static size_t
>> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)
>> +{
>> +    ssize_t i;
>> +
>> +    if (!n)
>> +	return 0;
>> +
>> +    for (i = 0; *s && *keyword; i++, s++, keyword++) {
>> +	if (match_case) {
>> +	    if (*s != *keyword)
>
> The pointer arithmetic doesn't seem to buy anything here.  What about
> just looping over i and using s[i] and keyword[i]?

The pointer arithmetic will be useful when I implement your other
suggestion of handling '|' here. ;) Otherwise, I'd need two index
variables.

>
>> +		break;
>> +	} else {
>> +	    if (tolower ((unsigned char) *s) !=
>> +		tolower ((unsigned char) *keyword))
>
> I don't think the cast to unsigned char is necessary.

As discussed on IRC, pedantically it is necessary, as ctype.h functions
accept an int that must have the value of an unsigned char or EOF, and
char might be signed.
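To spell out the pedantic point: the <ctype.h> functions take an int whose value must be representable as unsigned char or equal EOF, so with a signed plain char a byte such as 0xE4 (ä in Latin-1) would arrive as a negative value. A sketch of wrappers that do the conversion in one place (ctype_tolower() and ctype_isdigit() are hypothetical names):

```c
#include <ctype.h>
#include <stdbool.h>

/* Convert through unsigned char so that bytes >= 0x80 stay in the range
 * the <ctype.h> functions accept, even where plain char is signed. */
static int
ctype_tolower (char c)
{
    return tolower ((unsigned char) c);
}

static bool
ctype_isdigit (char c)
{
    return isdigit ((unsigned char) c) != 0;
}
```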

>> +		break;
>> +	}
>> +    }
>> +
>> +    if (n > 0)
>> +	return i < n ? 0 : i;
>> +    else
>> +	return *keyword ? 0 : i;
>> +}
>> +
>> +/*
>> + * Parse a keyword. Return < 0 on error, number of parsed chars on
>> + * success.
>> + */
>> +static ssize_t
>> +parse_keyword (struct state *state, const char *s)
>> +{
>> +    unsigned int i;
>> +    size_t n, max_n = 0;
>> +    struct keyword *kw = NULL;
>> +    int r;
>> +
>> +    /* Match longest keyword */
>> +    for (i = 0; i < ARRAY_SIZE (keywords); i++) {
>> +	/* Match case if keyword begins with upper case letter. */
>> +	bool mcase = isupper ((unsigned char) keywords[i].name[0]);
>
> Same with this cast.
>
>> +	ssize_t minlen = -1;
>> +	char keyword[128];
>> +	char *p;
>> +
>> +	strncpy (keyword, _(keywords[i].name), sizeof (keyword));
>> +
>> +	/* Truncate too long keywords. XXX: Make this dynamic? */
>> +	keyword[sizeof (keyword) - 1] = '\0';
>> +
>> +	/* Minimum match length. */
>> +	p = strchr (keyword, '|');
>> +	if (p) {
>> +	    minlen = p - keyword;
>> +
>> +	    /* Remove the minimum match length separator. */
>> +	    memmove (p, p + 1, strlen (p + 1) + 1);
>> +	}
>
> Would it make more sense to make match_keyword aware of the |
> character?  Then you wouldn't need this dance with copying the keyword
> into a scratch buffer.  I'm thinking something like (untested)

Agreed.

> static size_t
> match_keyword (const char *s, const char *keyword, bool match_case)
> {
>     size_t i;
>     bool prefix_matched = false;
>
>     for (i = 0; *s && *keyword; i++, s++, keyword++) {
>         if (*keyword == '|') {
>             prefix_matched = true;
>             ++keyword;
>         }
>         if (match_case && *s != *keyword)
>             return 0;
>         else if (tolower (*s) != tolower (*keyword))
>             return 0;
>     }
>
>     if (!*keyword || prefix_matched)
>         return i;
>     return 0;
> }
>
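The suggestion is marked untested, and as written it has a few problems: it returns 0 at the first mismatching character, so "mon 5" would not match "mon|day" even though the mandatory prefix did; when the input ends exactly at the '|' (plain "mon"), the loop exits before prefix_matched is set; and tolower() gets a plain char. A possible corrected sketch, keeping the suggested signature:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the number of leading characters of s that match keyword, or
 * 0 for no match. A '|' in keyword marks the end of the mandatory
 * prefix; without one, the whole keyword must match.
 */
static size_t
match_keyword (const char *s, const char *keyword, bool match_case)
{
    size_t i = 0;
    bool prefix_matched = false;

    for (;;) {
        if (*keyword == '|') {
            /* Everything before the '|' has matched. */
            prefix_matched = true;
            keyword++;
        }

        if (!*s || !*keyword)
            break;

        unsigned char a = (unsigned char) *s;
        unsigned char b = (unsigned char) *keyword;

        /* Stop at the first mismatch, but keep what matched so far. */
        if (match_case ? a != b : tolower (a) != tolower (b))
            break;

        i++, s++, keyword++;
    }

    return (!*keyword || prefix_matched) ? i : 0;
}
```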
>> +
>> +	n = match_keyword (s, keyword, minlen, mcase);
>> +	if (n > 0 && (n > max_n || (n == max_n && mcase))) {
>> +	    max_n = n;
>> +	    kw = &keywords[i];
>> +	}
>> +    }
>> +
>> +    if (!kw)
>> +	return -PARSE_TIME_ERR_KEYWORD;
>> +
>> +    if (kw->set)
>> +	r = kw->set (state, kw);
>> +    else
>> +	r = kw_set_default (state, kw);
>> +
>> +    if (r < 0)
>> +	return r;
>> +
>> +    return max_n;
>> +}
>> +
>> +/*
>> + * Non-keyword parsers and their helpers.
>> + */
>> +
>> +static int
>> +set_user_tz (struct state *state, char sign, int hour, int min)
>> +{
>> +    int tz = hour * 60 + min;
>> +
>> +    assert (sign == '+' || sign == '-');
>> +
>> +    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)
>
> Good to see you're not forgetting our Kiribati notmuch user base.

:)

>> +	return -PARSE_TIME_ERR_INVALIDTIME;
>> +
>> +    if (sign == '-')
>> +	tz = -tz;
>> +
>> +    return set_field (state, TM_TZ, tz);
>> +}
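As a worked example of the checks in set_user_tz() combined with the v / 100, v % 100 split that parse_postponed_number() uses for +/-HHMM, a hypothetical standalone helper (tz_offset_minutes() is not part of the patch) might be:

```c
#include <stdbool.h>

/* Convert a sign and a packed HHMM value into a UTC offset in minutes.
 * Mirrors set_user_tz(): hours up to 14 (UTC+14 in Kiribati is the
 * largest offset in use), minutes a multiple of 15. */
static bool
tz_offset_minutes (char sign, int hhmm, int *out)
{
    int hour, min;

    if ((sign != '+' && sign != '-') || hhmm < 0)
        return false;

    hour = hhmm / 100;
    min = hhmm % 100;

    if (hour > 14 || min > 59 || min % 15)
        return false;

    *out = (sign == '-' ? -1 : 1) * (hour * 60 + min);
    return true;
}
```

With this, "+0530" gives 330 minutes and "-0330" gives -210, matching the "nst" entry in the keyword table.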
>> +
>> +/*
>> + * Parse a previously postponed number, if one exists. This handles
>> + * a postponed number independently when it wasn't consumed while
>> + * parsing the following token.
>> + */
>> +static int
>> +parse_postponed_number (struct state *state, unused (enum field next_field))
>> +{
>> +    int v, n;
>> +    char d;
>> +
>> +    /* Bail out if there's no postponed number. */
>> +    if (!get_postponed_number (state, &v, &n, &d))
>> +	return 0;
>> +
>> +    if (n == 1 || n == 2) {
>> +	/* Notable exception: Previous field affects parsing. This
>> +	 * handles "January 20". */
>> +	if (state->last_field == TM_ABS_MON) {
>> +	    /* D[D] */
>> +	    if (!is_valid_mday (v))
>> +		return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +	    return set_field (state, TM_ABS_MDAY, v);
>> +	} else if (n == 2) {
>> +	    /* XXX: Only allow if last field is hour, min, or sec? */
>> +	    if (d == '+' || d == '-') {
>> +		/* +/-HH */
>> +		return set_user_tz (state, d, v, 0);
>> +	    }
>> +	}
>> +    } else if (n == 4) {
>> +	/* Notable exception: Value affects parsing. Time zones are
>> +	 * always at most 1400 and we don't understand years before
>> +	 * 1970. */
>> +	if (!is_valid_year (v)) {
>> +	    if (d == '+' || d == '-') {
>> +		/* +/-HHMM */
>> +		return set_user_tz (state, d, v / 100, v % 100);
>> +	    }
>> +	} else {
>> +	    /* YYYY */
>> +	    return set_field (state, TM_ABS_YEAR, v);
>> +	}
>> +    } else if (n == 6) {
>> +	/* HHMMSS */
>> +	int hour = v / 10000;
>> +	int min = (v / 100) % 100;
>> +	int sec = v % 100;
>> +
>> +	if (!is_valid_time (hour, min, sec))
>> +	    return -PARSE_TIME_ERR_INVALIDTIME;
>> +
>> +	return set_abs_time (state, hour, min, sec);
>> +    } else if (n == 8) {
>> +	/* YYYYMMDD */
>> +	int year = v / 10000;
>> +	int mon = (v / 100) % 100;
>> +	int mday = v % 100;
>> +
>> +	if (!is_valid_date (year, mon, mday))
>> +	    return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +	return set_abs_date (state, year, mon, mday);
>> +    } else {
>> +	return -PARSE_TIME_ERR_FORMAT;
>
> No need for the else block, given the return at the end.

Will fix.

>> +    }
>> +
>> +    return -PARSE_TIME_ERR_FORMAT;
>> +}
>> +
>> +static int tm_get_field (const struct tm *tm, enum field field);
>> +
>> +static int
>> +set_timestamp (struct state *state, time_t t)
>> +{
>> +    struct tm tm;
>> +    enum field f;
>> +    int r;
>> +
>> +    if (gmtime_r (&t, &tm) == NULL)
>> +	return -PARSE_TIME_ERR_LIB;
>> +
>> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
>> +	r = set_field (state, f, tm_get_field (&tm, f));
>> +	if (r)
>> +	    return r;
>> +    }
>> +
>> +    r = set_field (state, TM_TZ, 0);
>> +    if (r)
>> +	return r;
>> +
>> +    /* XXX: Prevent TM_AMPM with timestamp, e.g. "@123456 pm" */
>> +
>> +    return 0;
>> +}
>> +
>> +/* Parse a single number. Typically postpone parsing until later. */
>> +static int
>> +parse_single_number (struct state *state, unsigned long v,
>> +		     unsigned long n)
>> +{
>> +    assert (n);
>> +
>> +    if (state->delim == '@')
>> +	return set_timestamp (state, (time_t) v);
>> +
>> +    if (v > INT_MAX)
>> +	return -PARSE_TIME_ERR_FORMAT;
>> +
>> +    return set_postponed_number (state, v, n);
>> +}
>> +
>> +static bool
>> +is_time_sep (char c)
>> +{
>> +    return c == ':';
>> +}
>> +
>> +static bool
>> +is_date_sep (char c)
>> +{
>> +    return c == '/' || c == '-' || c == '.';
>> +}
>> +
>> +static bool
>> +is_sep (char c)
>> +{
>> +    return is_time_sep (c) || is_date_sep (c);
>> +}
>> +
>> +/* Two-digit year: 00...69 is 2000s, 70...99 1900s. Four-digit year
>> + * is used as is. Any other length (e.g. n == 0) leaves year unset. */
>> +static int
>> +expand_year (unsigned long year, size_t n)
>> +{
>> +    if (n == 2) {
>> +	return (year < 70 ? 2000 : 1900) + year;
>> +    } else if (n == 4) {
>> +	return year;
>> +    } else {
>> +	return UNSET;
>> +    }
>> +}
>> +
>> +/* Parse a date number triplet. */
>> +static int
>> +parse_date (struct state *state, char sep,
>> +	    unsigned long v1, unsigned long v2, unsigned long v3,
>> +	    size_t n1, size_t n2, size_t n3)
>> +{
>> +    int year = UNSET, mon = UNSET, mday = UNSET;
>> +
>> +    assert (is_date_sep (sep));
>> +
>> +    switch (sep) {
>> +    case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */
>> +	if (n1 != 1 && n1 != 2)
>> +	    return -PARSE_TIME_ERR_DATEFORMAT;
>> +
>> +	if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) {
>> +	    /* M[M]/D[D][/YY[YY]] */
>> +	    year = expand_year (v3, n3);
>> +	    mon = v1;
>> +	    mday = v2;
>> +	} else if (n2 == 4 && n3 == 0) {
>> +	    /* M[M]/YYYY */
>> +	    year = v2;
>> +	    mon = v1;
>> +	} else {
>> +	    return -PARSE_TIME_ERR_DATEFORMAT;
>> +	}
>> +	break;
>> +
>> +    case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */
>> +	if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) {
>> +	    /* YYYY-MM[-DD] */
>> +	    year = v1;
>> +	    mon = v2;
>> +	    if (n3)
>> +		mday = v3;
>> +	} else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) {
>> +	    /* DD-MM[-YY[YY]] */
>> +	    year = expand_year (v3, n3);
>> +	    mon = v2;
>> +	    mday = v1;
>> +	} else if (n1 == 2 && n2 == 4 && n3 == 0) {
>> +	    /* MM-YYYY */
>> +	    year = v2;
>> +	    mon = v1;
>> +	} else {
>> +	    return -PARSE_TIME_ERR_DATEFORMAT;
>> +	}
>> +	break;
>> +
>> +    case '.': /* Date: D[D].M[M][.[YY[YY]]] */
>> +	if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) ||
>> +	    (n3 != 0 && n3 != 2 && n3 != 4))
>> +	    return -PARSE_TIME_ERR_DATEFORMAT;
>> +
>> +	year = expand_year (v3, n3);
>> +	mon = v2;
>> +	mday = v1;
>> +	break;
>> +    }
>> +
>> +    if (year != UNSET && !is_valid_year (year))
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +    if (mon != UNSET && !is_valid_mon (mon))
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +    if (mday != UNSET && !is_valid_mday (mday))
>> +	return -PARSE_TIME_ERR_INVALIDDATE;
>> +
>> +    return set_abs_date (state, year, mon, mday);
>> +}
>> +
>> +/* Parse a time number triplet. */
>> +static int
>> +parse_time (struct state *state, char sep,
>> +	    unsigned long v1, unsigned long v2, unsigned long v3,
>> +	    size_t n1, size_t n2, size_t n3)
>> +{
>> +    assert (is_time_sep (sep));
>> +
>> +    if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2))
>> +	return -PARSE_TIME_ERR_TIMEFORMAT;
>> +
>> +    /*
>> +     * Notable exception: Previously set fields affect
>> +     * parsing. Interpret (+|-)HH:MM as time zone only if hour and
>> +     * minute have been set.
>> +     *
>> +     * XXX: This could be fixed by restricting the delimiters
>> +     * preceding time. For '+' it would be justified, but for '-' it
>> +     * might be inconvenient. However prefer to allow '-' as an
>> +     * insignificant delimiter preceding time for convenience, and
>> +     * handle '+' the same way for consistency between positive and
>> +     * negative time zones.
>> +     */
>> +    if (is_field_set (state, TM_ABS_HOUR) &&
>> +	is_field_set (state, TM_ABS_MIN) &&
>> +	n1 == 2 && n2 == 2 && n3 == 0 &&
>> +	(state->delim == '+' || state->delim == '-')) {
>> +	return set_user_tz (state, state->delim, v1, v2);
>> +    }
>> +
>> +    if (!is_valid_time (v1, v2, v3))
>> +	return -PARSE_TIME_ERR_INVALIDTIME;
>> +
>> +    return set_abs_time (state, v1, v2, n3 ? v3 : 0);
>> +}
>> +
>> +/* strtoul helper that assigns length. */
>> +static unsigned long
>> +strtoul_len (const char *s, const char **endp, size_t *len)
>> +{
>> +    unsigned long val = strtoul (s, (char **) endp, 10);
>
> This could technically get confused by really large numbers, but I
> don't know if that's worth worrying about.

I think I'll just ignore that for now.
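On the large numbers: if the value overflows, strtoul() returns ULONG_MAX and sets errno to ERANGE, while the end pointer still moves past all the digits, so the length stays right but the value saturates. A sketch of a checking variant (strtoul_len_checked() is a hypothetical name):

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Like strtoul_len(), but report overflow. errno must be cleared first,
 * because strtoul() never resets it; on overflow *val is ULONG_MAX. */
static bool
strtoul_len_checked (const char *s, const char **endp, size_t *len,
                     unsigned long *val)
{
    char *end;

    errno = 0;
    *val = strtoul (s, &end, 10);
    *endp = end;
    *len = (size_t) (end - s);

    return errno != ERANGE;
}
```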

>> +
>> +    *len = *endp - s;
>> +    return val;
>> +}
>> +
>> +/*
>> + * Parse a (group of) number(s). Return < 0 on error, number of parsed
>> + * chars on success.
>> + */
>> +static ssize_t
>> +parse_number (struct state *state, const char *s)
>> +{
>> +    int r;
>> +    unsigned long v1, v2, v3 = 0;
>> +    size_t n1, n2, n3 = 0;
>> +    const char *p = s;
>> +    char sep;
>> +
>> +    v1 = strtoul_len (p, &p, &n1);
>> +
>> +    if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) {
>
> Unnecessary cast?
>
>> +	sep = *p;
>> +	v2 = strtoul_len (p + 1, &p, &n2);
>> +    } else {
>> +	/* A single number. */
>> +	r = parse_single_number (state, v1, n1);
>> +	if (r)
>> +	    return r;
>> +
>> +	return p - s;
>
> I found the control flow here confusing.  You might want to flip the
> two conditions so the single number return happens first and the rest
> of the code flows straight through:

Agreed.

> if (!is_sep (*p) || !isdigit (*(p + 1))) {
>     ...
>     return p - s;
> }
>
> sep = *p;
> ...
>
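The reordering can be illustrated with a state-free sketch of just the scanning part of parse_number(): the single-number case returns first, and the two- and three-number cases flow straight through. The names struct num_group, scan_num() and scan_number_group() are hypothetical; the separator set "/-.:" is taken from is_date_sep() and is_time_sep(), and the date/time interpretation itself is left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Result of scanning a group of one to three numbers. */
struct num_group {
    unsigned long v[3];   /* values */
    size_t n[3];          /* digit counts */
    char sep;             /* separator, or '\0' for a single number */
    int count;            /* how many numbers were found */
};

/* Scan one decimal number at s; return the number of digits consumed. */
static size_t
scan_num (const char *s, unsigned long *v, size_t *n)
{
    char *end;

    *v = strtoul (s, &end, 10);
    *n = (size_t) (end - s);
    return *n;
}

/* Return the number of characters consumed from s, which must start
 * with a digit. */
static size_t
scan_number_group (const char *s, struct num_group *g)
{
    const char *p = s;

    memset (g, 0, sizeof (*g));
    p += scan_num (p, &g->v[0], &g->n[0]);
    g->count = 1;

    /* Single number: handle it first and return early. */
    if (*p == '\0' || !strchr ("/-.:", *p) || !isdigit ((unsigned char) p[1]))
        return (size_t) (p - s);

    /* Two or three numbers: the rest flows straight through. */
    g->sep = *p;
    p += 1 + scan_num (p + 1, &g->v[1], &g->n[1]);
    g->count = 2;

    if (*p == g->sep && isdigit ((unsigned char) p[1])) {
        p += 1 + scan_num (p + 1, &g->v[2], &g->n[2]);
        g->count = 3;
    }

    return (size_t) (p - s);
}
```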
>> +    }
>> +
>> +    /* A group of two or three numbers? */
>> +    if (*p == sep && isdigit ((unsigned char) *(p + 1)))
>> +	v3 = strtoul_len (p + 1, &p, &n3);
>> +
>> +    if (is_time_sep (sep))
>> +	r = parse_time (state, sep, v1, v2, v3, n1, n2, n3);
>> +    else
>> +	r = parse_date (state, sep, v1, v2, v3, n1, n2, n3);
>> +
>> +    if (r)
>> +	return r;
>> +
>> +    return p - s;
>> +}
>> +
>> +/*
>> + * Parse delimiter(s). Throw away all except the last one, which is
>> + * stored for parsing the next non-delimiter. Return < 0 on error,
>> + * number of parsed chars on success.
>> + *
>> + * XXX: We might want to be more strict here.
>> + */
>> +static ssize_t
>> +parse_delim (struct state *state, const char *s)
>> +{
>> +    const char *p = s;
>> +
>> +    /*
>> +     * Skip non-alpha and non-digit, and store the last for further
>> +     * processing.
>> +     */
>> +    while (*p && !isalnum ((unsigned char) *p)) {
>> +	set_delim (state, *p);
>> +	p++;
>> +    }
>> +
>> +    return p - s;
>> +}
>> +
>> +/*
>> + * Parse a date/time string. Return < 0 on error, number of parsed
>> + * chars on success.
>> + */
>> +static ssize_t
>> +parse_input (struct state *state, const char *s)
>> +{
>> +    const char *p = s;
>> +    ssize_t n;
>> +    int r;
>> +
>> +    while (*p) {
>> +	if (isalpha ((unsigned char) *p)) {
>> +	    n = parse_keyword (state, p);
>> +	} else if (isdigit ((unsigned char) *p)) {
>> +	    n = parse_number (state, p);
>> +	} else {
>> +	    n = parse_delim (state, p);
>> +	}
>> +
>> +	if (n <= 0) {
>> +	    if (n == 0)
>> +		n = -PARSE_TIME_ERR;
>> +
>> +	    return n;
>> +	}
>> +
>> +	p += n;
>> +    }
>> +
>> +    /* Parse a previously postponed number, if any. */
>> +    r = parse_postponed_number (state, TM_NONE);
>> +    if (r < 0)
>> +	return r;
>> +
>> +    return p - s;
>> +}
>> +
>> +/*
>> + * Processing the parsed input.
>> + */
>> +
>> +/*
>> + * Initialize reference time to tm. Use time zone in state if
>> + * specified, otherwise local time. Use now for reference time if
>> + * non-NULL, otherwise current time.
>> + */
>> +static int
>> +initialize_now (struct state *state, struct tm *tm, const time_t *now)
>
> Should tm be the last argument, since it's an out-argument?

Will change.

> Why is now a pointer?  Just so it can be NULL?

Yes, coming all the way from the API.
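With tm moved last and now kept as an optional pointer, the function might look like the following sketch. init_reference_tm() is a hypothetical name, and the tz_set/tz_minutes parameters stand in for the TM_TZ field of the parser state; gmtime_r() and localtime_r() are POSIX, hence the feature test macro.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Break down the reference time (now, or the current time if now is
 * NULL) into *tm, either in a fixed UTC offset or in local time. The
 * out-argument comes last. Returns 0 on success, -1 on failure. */
static int
init_reference_tm (const time_t *now, bool tz_set, int tz_minutes,
                   struct tm *tm)
{
    time_t t;

    if (now)
        t = *now;
    else if (time (&t) == (time_t) -1)
        return -1;

    if (tz_set) {
        /* Shift into the requested zone, then break down without
         * applying the local time zone. */
        t += (time_t) tz_minutes * 60;
        return gmtime_r (&t, tm) ? 0 : -1;
    }

    return localtime_r (&t, tm) ? 0 : -1;
}
```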

>> +{
>> +    time_t t;
>> +
>> +    if (now) {
>> +	t = *now;
>> +    } else {
>> +	if (time (&t) == (time_t) -1)
>> +	    return -PARSE_TIME_ERR_LIB;
>> +    }
>> +
>> +    if (is_field_set (state, TM_TZ)) {
>> +	/* Some other time zone. */
>> +
>> +	/* Adjust now according to the TZ. */
>> +	t += get_field (state, TM_TZ) * 60;
>> +
>> +	/* It's not gm, but this doesn't mess with the TZ. */
>> +	if (gmtime_r (&t, tm) == NULL)
>> +	    return -PARSE_TIME_ERR_LIB;
>> +    } else {
>> +	/* Local time. */
>> +	if (localtime_r (&t, tm) == NULL)
>> +	    return -PARSE_TIME_ERR_LIB;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>> + * Normalize tm according to mktime(3). Both mktime(3) and
>
> This comment could elaborate a bit on what it means to normalize a tm.

Agreed.

>> + * localtime_r(3) use local time, but they cancel each other out here,
>> + * making this function agnostic to time zone.
>> + */
>> +static int
>> +normalize_tm (struct tm *tm)
>> +{
>> +    time_t t = mktime (tm);
>> +
>> +    if (t == (time_t) -1)
>> +	return -PARSE_TIME_ERR_LIB;
>> +
>> +    if (!localtime_r (&t, tm))
>> +	return -PARSE_TIME_ERR_LIB;
>
> Do you actually need this call to localtime_r or can you just return
> after the mktime modifies tm?  Does this have to do with timezones?

Hmm, I'm not sure. I think I'll just keep it like this, because that's
the way it has worked for months...
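On what normalizing means here: mktime() takes a struct tm whose fields may be out of range (for example tm_mday = 32 in January), folds them into range, and sets tm_wday and tm_yday; per the C standard it writes these values back into the structure it was given. A minimal sketch relying on mktime() alone (normalize_tm_in_place() is a hypothetical name); whether the extra localtime_r() call makes a difference around DST changes is not something this sketch settles.

```c
#include <time.h>

/* Normalize tm in place: mktime() interprets tm as local time, folds
 * out-of-range fields into range and fills in tm_wday and tm_yday.
 * Returns 0 on success, -1 if the time cannot be represented. */
static int
normalize_tm_in_place (struct tm *tm)
{
    return mktime (tm) == (time_t) -1 ? -1 : 0;
}
```

For example, January 32, 2012 becomes Wednesday, February 1, 2012.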

>> +
>> +    return 0;
>> +}
>> +
>> +/* Get field out of a struct tm. */
>> +static int
>> +tm_get_field (const struct tm *tm, enum field field)
>> +{
>> +    switch (field) {
>> +    case TM_ABS_SEC:	return tm->tm_sec;
>> +    case TM_ABS_MIN:	return tm->tm_min;
>> +    case TM_ABS_HOUR:	return tm->tm_hour;
>> +    case TM_ABS_MDAY:	return tm->tm_mday;
>> +    case TM_ABS_MON:	return tm->tm_mon + 1; /* 0- to 1-based */
>> +    case TM_ABS_YEAR:	return 1900 + tm->tm_year;
>> +    case TM_ABS_WDAY:	return tm->tm_wday;
>> +    case TM_ABS_ISDST:	return tm->tm_isdst;
>> +    default:
>> +	assert (false);
>> +	break;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Modify hour according to am/pm setting. */
>> +static int
>> +fixup_ampm (struct state *state)
>> +{
>> +    int hour, hdiff = 0;
>> +
>> +    if (!is_field_set (state, TM_AMPM))
>> +	return 0;
>> +
>> +    if (!is_field_set (state, TM_ABS_HOUR))
>> +	return -PARSE_TIME_ERR_TIMEFORMAT;
>> +
>> +    hour = get_field (state, TM_ABS_HOUR);
>> +    if (!is_valid_12hour (hour))
>> +	return -PARSE_TIME_ERR_INVALIDTIME;
>> +
>> +    if (get_field (state, TM_AMPM)) {
>> +	/* 12pm is noon. */
>> +	if (hour != 12)
>> +	    hdiff = 12;
>> +    } else {
>> +	/* 12am is midnight, beginning of day. */
>> +	if (hour == 12)
>> +	    hdiff = -12;
>> +    }
>> +
>> +    mod_field (state, TM_REL_HOUR, -hdiff);
>> +
>> +    return 0;
>> +}
>> +
>> +/* Combine absolute and relative fields, and round. */
>> +static int
>> +create_output (struct state *state, time_t *t_out, const time_t *ref,
>> +	       int round)
>> +{
>> +    struct tm tm = { .tm_isdst = -1 };
>> +    struct tm now;
>> +    time_t t;
>> +    enum field f;
>> +    int r;
>> +    int week_round = PARSE_TIME_NO_ROUND;
>> +
>> +    r = initialize_now (state, &now, ref);
>> +    if (r)
>> +	return r;
>> +
>> +    /* Initialize fields flagged as "now" to reference time. */
>> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
>> +	if (state->set[f] == FIELD_NOW) {
>> +	    state->tm[f] = tm_get_field (&now, f);
>> +	    state->set[f] = FIELD_SET;
>> +	}
>> +    }
>> +
>> +    /*
>> +     * If WDAY is set but MDAY is not, we consider WDAY relative.
>> +     *
>> +     * XXX: This fails on stuff like "two months monday" because two
>> +     * months ago wasn't the same day as today. Postpone until we know
>> +     * date?
>> +     */
>> +    if (is_field_set (state, TM_ABS_WDAY) &&
>> +	!is_field_set (state, TM_ABS_MDAY)) {
>> +	int wday = get_field (state, TM_ABS_WDAY);
>> +	int today = tm_get_field (&now, TM_ABS_WDAY);
>> +	int rel_days;
>> +
>> +	if (today > wday)
>> +	    rel_days = today - wday;
>> +	else
>> +	    rel_days = today + 7 - wday;
>> +
>> +	/* This also prevents special week rounding from happening. */
>> +	mod_field (state, TM_REL_DAY, rel_days);
>> +
>> +	unset_field (state, TM_ABS_WDAY);
>> +    }
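The weekday arithmetic above picks the most recent matching weekday strictly before today, so naming today's own weekday means a week ago. As a standalone sketch (days_back_to_wday() is a hypothetical name; weekdays are numbered 0 = Sunday as in tm_wday):

```c
/* Days back from today to the most recent given weekday strictly
 * before today; both arguments are 0 (Sunday) through 6 (Saturday). */
static int
days_back_to_wday (int today, int wday)
{
    return today > wday ? today - wday : today + 7 - wday;
}
```

So on a Wednesday, "monday" is two days back and "wednesday" seven.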
>> +
>> +    r = fixup_ampm (state);
>> +    if (r)
>> +	return r;
>> +
>> +    /*
>> +     * Iterate fields from most accurate to least accurate, and set
>> +     * unset fields according to requested rounding.
>> +     */
>> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
>> +	if (round != PARSE_TIME_NO_ROUND) {
>> +	    enum field r = abs_to_rel_field (f);
>> +
>> +	    if (is_field_set (state, f) || is_field_set (state, r)) {
>> +		if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {
>> +		    mod_field (state, r, -1);
>
> Crazy.  This could use a comment.  It took me a while to figure out
> why this was -1, though maybe that's just because it's late.

Will do.

/* You're not expected to understand this */ ;)
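For the record, the -1 follows from how create_output() builds struct tm: every field is computed as absolute minus relative (tm_mon from TM_ABS_MON - TM_REL_MON and so on), so decrementing the relative field by one adds one to the most accurate specified unit, and ROUND_UP_INCLUSIVE then adds one relative second, i.e. subtracts a second. A hypothetical month-only illustration (round_up_month() is not part of the patch): rounding "2012-10" up gives 2012-11-01 00:00:00, or 2012-10-31 23:59:59 inclusively.

```c
#include <time.h>

/* Round a month-accurate date up the way create_output() does: fields
 * are "absolute - relative", and mktime() normalizes the result.
 * Returns 0 on success, -1 on failure. */
static int
round_up_month (int year, int mon, int inclusive, struct tm *out)
{
    int rel_mon = 0, rel_sec = 0;   /* relative fields, "units ago" */

    rel_mon -= 1;                   /* cf. mod_field (state, r, -1) */
    if (inclusive)
        rel_sec += 1;               /* cf. mod_field (state, TM_REL_SEC, 1) */

    struct tm tm = { .tm_isdst = -1 };
    tm.tm_year = year - 1900;
    tm.tm_mon = (mon - rel_mon) - 1;  /* 1- to 0-based */
    tm.tm_mday = 1;                   /* more accurate units rounded */
    tm.tm_hour = 0;
    tm.tm_min = 0;
    tm.tm_sec = 0 - rel_sec;

    if (mktime (&tm) == (time_t) -1)
        return -1;

    *out = tm;
    return 0;
}
```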

>> +		    if (round == PARSE_TIME_ROUND_UP_INCLUSIVE)
>> +			mod_field (state, TM_REL_SEC, 1);
>> +		}
>> +		round = PARSE_TIME_NO_ROUND; /* No more rounding. */
>> +	    } else {
>> +		if (f == TM_ABS_MDAY &&
>> +		    is_field_set (state, TM_REL_WEEK)) {
>> +		    /* Week is most accurate. */
>> +		    week_round = round;
>> +		    round = PARSE_TIME_NO_ROUND;
>> +		} else {
>> +		    set_field (state, f, field_epoch (f));
>> +		}
>> +	    }
>> +	}
>> +
>> +	if (!is_field_set (state, f))
>> +	    set_field (state, f, tm_get_field (&now, f));
>> +    }
>> +
>> +    /* Special case: rounding with week accuracy. */
>> +    if (week_round != PARSE_TIME_NO_ROUND) {
>> +	/* Temporarily set more accurate fields to now. */
>> +	set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC));
>> +	set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN));
>> +	set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR));
>> +	set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY));
>> +    }
>> +
>> +    /*
>> +     * Set all fields. They may contain out of range values before
>> +     * normalization by mktime(3).
>> +     */
>> +    tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC);
>> +    tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN);
>> +    tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR);
>> +    tm.tm_mday = get_field (state, TM_ABS_MDAY) -
>> +		 get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK);
>> +    tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON);
>> +    tm.tm_mon--; /* 1- to 0-based */
>> +    tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900;
>> +
>> +    /*
>> +     * It's always normal time.
>> +     *
>> +     * XXX: This is probably not a solution that universally
>> +     * works. Just make sure DST is not taken into account. We don't
>> +     * want rounding to be affected by DST.
>> +     */
>> +    tm.tm_isdst = -1;
>> +
>> +    /* Special case: rounding with week accuracy. */
>> +    if (week_round != PARSE_TIME_NO_ROUND) {
>> +	/* Normalize to get proper tm.wday. */
>> +	r = normalize_tm (&tm);
>> +	if (r < 0)
>> +	    return r;
>> +
>> +	/* Set more accurate fields back to zero. */
>> +	tm.tm_sec = 0;
>> +	tm.tm_min = 0;
>> +	tm.tm_hour = 0;
>> +	tm.tm_isdst = -1;
>> +
>> +	/* Monday is the true 1st day of week, but this is easier. */
>> +	if (week_round >= PARSE_TIME_ROUND_UP) {
>> +	    tm.tm_mday += 7 - tm.tm_wday;
>> +	    if (week_round == PARSE_TIME_ROUND_UP_INCLUSIVE)
>> +		tm.tm_sec--;
>> +	} else {
>> +	    tm.tm_mday -= tm.tm_wday;
>> +	}
>> +    }
>> +
>> +    if (is_field_set (state, TM_TZ)) {
>> +	/* tm is in specified TZ, convert to UTC for timegm(3). */
>> +	tm.tm_min -= get_field (state, TM_TZ);
>> +	t = timegm (&tm);
>> +    } else {
>> +	/* tm is in local time. */
>> +	t = mktime (&tm);
>> +    }
>> +
>> +    if (t == (time_t) -1)
>> +	return -PARSE_TIME_ERR_LIB;
>> +
>> +    *t_out = t;
>> +
>> +    return 0;
>> +}
>> +
>> +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. */
>> +#define EXTERNAL_ERR(r) (-r)
>> +
>> +int
>> +parse_time_string (const char *s, time_t *t, const time_t *ref, int round)
>> +{
>> +    struct state state = { .last_field = TM_NONE };
>> +    int r;
>> +
>> +    if (!s || !t)
>> +	return EXTERNAL_ERR (-PARSE_TIME_ERR);
>> +
>> +    r = parse_input (&state, s);
>> +    if (r < 0)
>> +	return EXTERNAL_ERR (r);
>> +
>> +    r = create_output (&state, t, ref, round);
>> +    if (r < 0)
>> +	return EXTERNAL_ERR (r);
>> +
>> +    return 0;
>> +}
>> diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h
>> new file mode 100644
>> index 0000000..bfa4ee3
>> --- /dev/null
>> +++ b/parse-time-string/parse-time-string.h
>> @@ -0,0 +1,102 @@
>> +/*
>> + * parse time string - user friendly date and time parser
>> + * Copyright © 2012 Jani Nikula
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + * Author: Jani Nikula <jani@nikula.org>
>> + */
>> +
>> +#ifndef PARSE_TIME_STRING_H
>> +#define PARSE_TIME_STRING_H
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +#include <time.h>
>> +
>> +/* return values for parse_time_string() */
>> +enum {
>> +    PARSE_TIME_OK = 0,
>> +    PARSE_TIME_ERR,		/* unspecified error */
>> +    PARSE_TIME_ERR_LIB,		/* library call failed */
>> +    PARSE_TIME_ERR_ALREADYSET,	/* attempt to set unit twice */
>> +    PARSE_TIME_ERR_FORMAT,	/* generic date/time format error */
>> +    PARSE_TIME_ERR_DATEFORMAT,	/* date format error */
>> +    PARSE_TIME_ERR_TIMEFORMAT,	/* time format error */
>> +    PARSE_TIME_ERR_INVALIDDATE,	/* date value error */
>> +    PARSE_TIME_ERR_INVALIDTIME,	/* time value error */
>> +    PARSE_TIME_ERR_KEYWORD,	/* unknown keyword */
>> +};
>> +
>> +/* round values for parse_time_string() */
>> +enum {
>> +    PARSE_TIME_ROUND_DOWN = -1,
>> +    PARSE_TIME_NO_ROUND = 0,
>> +    PARSE_TIME_ROUND_UP = 1,
>> +    PARSE_TIME_ROUND_UP_INCLUSIVE = 2,
>> +};
>> +
>> +/**
>> + * parse_time_string() - user friendly date and time parser
>> + * @s:		string to parse
>> + * @t:		pointer to time_t to store parsed time in
>> + * @ref:	pointer to time_t containing reference date/time, or NULL
>> + * @round:	PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN,
>> + *		PARSE_TIME_ROUND_UP, or PARSE_TIME_ROUND_UP_INCLUSIVE
>> + *
>> + * Parse a date/time string 's' and store the parsed date/time result
>> + * in 't'.
>> + *
>> + * A reference date/time is used for determining the "date/time units"
>> + * (roughly equivalent to struct tm members) not specified by 's'. If
>> + * 'ref' is non-NULL, it must contain a pointer to a time_t to be used
>> + * as reference date/time. Otherwise, the current time is used.
>> + *
>> + * If 's' does not specify a full date/time, the 'round' parameter
>> + * specifies if and how the result should be rounded as follows:
>> + *
>> + *   PARSE_TIME_NO_ROUND: All date/time units that are not specified
>> + *   by 's' are set to the corresponding unit derived from the
>> + *   reference date/time.
>> + *
>> + *   PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate
>> + *   than the most accurate unit specified by 's' are set to the
>> + *   smallest valid value for that unit. The remaining unspecified
>> + *   units are set as in PARSE_TIME_NO_ROUND.
>> + *
>> + *   PARSE_TIME_ROUND_UP: All date/time units that are more accurate
>> + *   than the most accurate unit specified by 's' are set to the
>> + *   smallest valid value for that unit. The most accurate unit
>> + *   specified by 's' is incremented by one (and this is rolled over
>> + *   to the less accurate units as necessary), unless the most
>> + *   accurate unit is seconds. The remaining unspecified units are set
>> + *   as in PARSE_TIME_NO_ROUND.
>> + *
>> + *   PARSE_TIME_ROUND_UP_INCLUSIVE: Same as PARSE_TIME_ROUND_UP, minus
>> + *   one second, unless the most accurate unit specified by 's' is
>> + *   seconds. This is useful for callers that require a value for
>> + *   inclusive comparison of the result.
>> + *
>> + * Return 0 (PARSE_TIME_OK) on a successfully parsed date/time, or one
>> + * of PARSE_TIME_ERR_* on error. 't' is not modified on error.
>> + */
>> +int parse_time_string (const char *s, time_t *t, const time_t *ref, int round);
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif /* PARSE_TIME_STRING_H */
>
> Made it!

Thanks for your very helpful and constructive review, as always!

BR,
Jani.


* Re: [PATCH v5 4/9] test: add smoke tests for the date/time parser module
  2012-10-23  4:23   ` Austin Clements
@ 2012-10-28 22:34     ` Jani Nikula
  0 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-28 22:34 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Tue, 23 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Jani Nikula on Oct 22 at 12:22 am:
>> Test the date/time parser module directly, independent of notmuch,
>> using the parse-time test tool.
>> 
>> Credits to Michal Sojka <sojkam1@fel.cvut.cz> for writing most of the
>> tests.
>> ---
>>  test/notmuch-test      |    1 +
>>  test/parse-time-string |   71 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 72 insertions(+)
>>  create mode 100755 test/parse-time-string
>> 
>> diff --git a/test/notmuch-test b/test/notmuch-test
>> index cc732c3..7eadfdf 100755
>> --- a/test/notmuch-test
>> +++ b/test/notmuch-test
>> @@ -60,6 +60,7 @@ TESTS="
>>    emacs-hello
>>    emacs-show
>>    missing-headers
>> +  parse-time-string
>>  "
>>  TESTS=${NOTMUCH_TESTS:=$TESTS}
>>  
>> diff --git a/test/parse-time-string b/test/parse-time-string
>> new file mode 100755
>> index 0000000..862e701
>> --- /dev/null
>> +++ b/test/parse-time-string
>> @@ -0,0 +1,71 @@
>> +#!/usr/bin/env bash
>> +test_description="date/time parser module"
>> +. ./test-lib.sh
>> +
>> +# Sanity/smoke tests for the date/time parser independent of notmuch
>> +
>> +_date ()
>> +{
>> +    date -d "$*" +%s
>> +}
>> +
>> +_parse_time ()
>> +{
>> +    ${TEST_DIRECTORY}/parse-time --format=%s "$*"
>> +}
>> +
>> +test_begin_subtest "date(1) default format without TZ code"
>> +test_expect_equal "$(_parse_time Fri Aug 3 23:06:06 2012)" "$(_date Fri Aug 3 23:06:06 2012)"
>> +
>> +test_begin_subtest "date(1) --rfc-2822 format"
>> +test_expect_equal "$(_parse_time Fri, 03 Aug 2012 23:07:46 +0100)" "$(_date Fri, 03 Aug 2012 23:07:46 +0100)"
>> +
>> +test_begin_subtest "date(1) --rfc-3339=seconds format"
>> +test_expect_equal "$(_parse_time 2012-08-03 23:09:37+03:00)" "$(_date 2012-08-03 23:09:37+03:00)"
>> +
>> +test_begin_subtest "Date parser tests"
>> +REFERENCE=$(_date Tue Jan 11 11:11:00 +0000 2011)
>> +cat <<EOF > INPUT
>> +now          ==> Tue Jan 11 11:11:00 +0000 2011
>> +2010-1-1     ==> ERROR: 5
>
> It would be nice if these errors were strings.  I have no idea if "5"
> is the right error for this.

Good idea. Will fix.

>> +Jan 2        ==> Sun Jan 02 11:11:00 +0000 2011
>> +Mon          ==> Mon Jan 10 11:11:00 +0000 2011
>> +last Friday  ==> ERROR: 4
>> +2 hours ago  ==> ERROR: 1

I'll silently eat away "ago" too.

>> +last month   ==> Sat Dec 11 11:11:00 +0000 2010
>> +month ago    ==> ERROR: 1
>> +8am          ==> Tue Jan 11 08:00:00 +0000 2011
>> +9:15         ==> Tue Jan 11 09:15:00 +0000 2011
>> +12:34        ==> Tue Jan 11 12:34:00 +0000 2011
>> +monday       ==> Mon Jan 10 11:11:00 +0000 2011
>> +yesterday    ==> Mon Jan 10 11:11:00 +0000 2011
>> +tomorrow     ==> ERROR: 1
>> +             ==> Tue Jan 11 11:11:00 +0000 2011 # empty string is reference time
>> +
>> +Aug 3 23:06:06 2012             ==> Fri Aug 03 23:06:06 +0000 2012 # date(1) default format without TZ code
>> +Fri, 03 Aug 2012 23:07:46 +0100 ==> Fri Aug 03 22:07:46 +0000 2012 # rfc-2822
>> +2012-08-03 23:09:37+03:00       ==> Fri Aug 03 20:09:37 +0000 2012 # rfc-3339 seconds
>> +
>> +10s           ==> Tue Jan 11 11:10:50 +0000 2011
>> +19701223s     ==> Fri May 28 10:37:17 +0000 2010
>> +19701223      ==> Wed Dec 23 11:11:00 +0000 1970
>> +
>> +19701223 +0100 ==> Wed Dec 23 11:11:00 +0000 1970 # Timezone is ignored without an error
>> +
>> +today ==^> Tue Jan 11 23:59:59 +0000 2011
>> +today ==_> Tue Jan 11 00:00:00 +0000 2011
>> +
>> +thisweek ==^> Sat Jan 15 23:59:59 +0000 2011
>> +thisweek ==_> Sun Jan 09 00:00:00 +0000 2011
>> +
>> +two months ago==> ERROR: 1 # "ago" is not supported
>> +two months ==> Thu Nov 11 11:11:00 +0000 2010
>> +
>> +@1348569850 ==> Tue Sep 25 10:44:10 +0000 2012
>> +@10 ==> Thu Jan 01 00:00:10 +0000 1970
>
> Very nice.  The only thing that jumps out at me is that there are no
> ==^^> tests, though it would be interesting to run a code coverage
> tool to see how thorough these tests are.

Again, most of the credit here goes to Michal Sojka.

Will add some ==^^> tests too.
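In the INPUT file, the arrow selects the rounding mode. The expected values above suggest that "==>" means no rounding, "==_>" means round down ("today ==_>" is 00:00:00) and "==^>" means inclusive round up ("today ==^>" is 23:59:59). That "==^^>" selects plain round up is an assumption. The following minimal C sketch shows how a harness might split such a line. split_test_line() and enum test_round are hypothetical, and this is not the parse-time tool's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the rounding mode each arrow selects. */
enum test_round {
    T_NO_ROUND,
    T_ROUND_DOWN,
    T_ROUND_UP,
    T_ROUND_UP_INCLUSIVE,
};

static const struct {
    const char *arrow;
    enum test_round round;
} arrows[] = {
    { "==^^>", T_ROUND_UP },		/* assumed meaning */
    { "==^>",  T_ROUND_UP_INCLUSIVE },	/* "today ==^>" is 23:59:59 */
    { "==_>",  T_ROUND_DOWN },		/* "today ==_>" is 00:00:00 */
    { "==>",   T_NO_ROUND },
};

/* Locate the arrow in a test line. Return the arrow's offset (the
 * length of the input part, trailing spaces included), and store the
 * rounding mode and the offset where the expected output starts.
 * Return -1 if the line has no arrow. */
static long
split_test_line (const char *line, enum test_round *round, size_t *expected)
{
    size_t i;

    assert (line != NULL && round != NULL && expected != NULL);

    for (i = 0; i < sizeof (arrows) / sizeof (arrows[0]); i++) {
	const char *p = strstr (line, arrows[i].arrow);

	if (p != NULL) {
	    *round = arrows[i].round;
	    *expected = (size_t) (p - line) + strlen (arrows[i].arrow);
	    return (long) (p - line);
	}
    }
    return -1;
}
```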


BR,
Jani.


>
>> +EOF
>> +
>> +${TEST_DIRECTORY}/parse-time --ref=${REFERENCE} < INPUT > OUTPUT
>> +test_expect_equal_file INPUT OUTPUT
>> +
>> +test_done
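Two groups of the expectations above can be checked by hand. "@10" and "@1348569850" are plain seconds since the epoch. "10s" and "19701223s" are that many seconds before the reference (1294744260, i.e. Tue Jan 11 11:11:00 2011 UTC): 1294744260 - 19701223 = 1275043037, which is Fri May 28 10:37:17 2010. The following minimal C sketch handles only these two forms. parse_seconds_expr() is hypothetical. The real parser also accepts "19701223" without the suffix as a date, which this sketch rejects.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: parse "@<seconds>" (absolute, seconds since the
 * epoch) or "<n>s" (n seconds before 'ref'). Return 0 and store the
 * result in *out on success, or -1 for any other input. */
static int
parse_seconds_expr (const char *s, time_t ref, time_t *out)
{
    const char *digits;
    char *end;
    long long n;
    int absolute;

    assert (s != NULL && out != NULL);

    absolute = (*s == '@');
    digits = absolute ? s + 1 : s;

    /* strtoll() would accept leading spaces and signs; require a digit. */
    if (*digits < '0' || *digits > '9')
	return -1;

    errno = 0;
    n = strtoll (digits, &end, 10);
    if (errno == ERANGE)
	return -1;

    if (absolute) {
	if (*end != '\0')
	    return -1;
	*out = (time_t) n;
    } else {
	/* Relative units refer to the past, so subtract from 'ref'. */
	if (end[0] != 's' || end[1] != '\0')
	    return -1;
	*out = ref - (time_t) n;
    }
    return 0;
}
```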


* Re: [PATCH v5 6/9] lib: add date range query support
  2012-10-23  4:52   ` Austin Clements
@ 2012-10-28 22:39     ` Jani Nikula
  0 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-28 22:39 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Tue, 23 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Jani Nikula on Oct 22 at 12:22 am:
>> Add a custom value range processor to enable date and time searches of
>> the form date:since..until, where "since" and "until" are expressions
>> understood by the previously added date/time parser, to restrict the
>> results to messages within a particular time range (based on the Date:
>> header).
>> 
>> If "since" or "until" describes date/time at an accuracy of days or
>> less, the values are rounded according to the accuracy, towards past
>> for "since" and towards future for "until". For example,
>> date:november..yesterday would match from the beginning of November
>> until the end of yesterday. Expressions such as date:today..today
>> mean since the beginning of today until the end of today.
>> 
>> Open-ended ranges are supported (since Xapian 1.2.1), i.e. you can
>> specify date:..until or date:since.. to not limit the start or end
>> date, respectively.
>> 
>> CAVEATS:
>> 
>> Xapian does not support spaces in range expressions. You can replace
>> the spaces with '_', or (in most cases) '-', or (in some cases) leave
>> the spaces out altogether.
>> 
>> Entering date:expr without ".." (for example date:yesterday) will not
>> work as you might expect. You can achieve the expected result by
>> duplicating the expr on both sides of ".." (for example
>> date:yesterday..yesterday).
>> 
>> Open-ended ranges won't work with pre-1.2.1 Xapian, but they don't
>> produce an error either.
>> 
>> Signed-off-by: Jani Nikula <jani@nikula.org>
>> ---
>>  lib/Makefile.local     |    1 +
>>  lib/database-private.h |    1 +
>>  lib/database.cc        |    5 +++++
>>  lib/parse-time-vrp.cc  |   40 ++++++++++++++++++++++++++++++++++++++++
>>  lib/parse-time-vrp.h   |   19 +++++++++++++++++++
>>  5 files changed, 66 insertions(+)
>>  create mode 100644 lib/parse-time-vrp.cc
>>  create mode 100644 lib/parse-time-vrp.h
>> 
>> diff --git a/lib/Makefile.local b/lib/Makefile.local
>> index d1635cf..6c0f42f 100644
>> --- a/lib/Makefile.local
>> +++ b/lib/Makefile.local
>> @@ -58,6 +58,7 @@ libnotmuch_c_srcs =		\
>>  
>>  libnotmuch_cxx_srcs =		\
>>  	$(dir)/database.cc	\
>> +	$(dir)/parse-time-vrp.cc	\
>>  	$(dir)/directory.cc	\
>>  	$(dir)/index.cc		\
>>  	$(dir)/message.cc	\
>> diff --git a/lib/database-private.h b/lib/database-private.h
>> index 88532d5..d3e65fd 100644
>> --- a/lib/database-private.h
>> +++ b/lib/database-private.h
>> @@ -52,6 +52,7 @@ struct _notmuch_database {
>>      Xapian::QueryParser *query_parser;
>>      Xapian::TermGenerator *term_gen;
>>      Xapian::ValueRangeProcessor *value_range_processor;
>> +    Xapian::ValueRangeProcessor *date_range_processor;
>>  };
>>  
>>  /* Return the list of terms from the given iterator matching a prefix.
>> diff --git a/lib/database.cc b/lib/database.cc
>> index 761dc1a..4df3217 100644
>> --- a/lib/database.cc
>> +++ b/lib/database.cc
>> @@ -19,6 +19,7 @@
>>   */
>>  
>>  #include "database-private.h"
>> +#include "parse-time-vrp.h"
>>  
>>  #include <iostream>
>>  
>> @@ -710,12 +711,14 @@ notmuch_database_open (const char *path,
>>  	notmuch->term_gen = new Xapian::TermGenerator;
>>  	notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
>>  	notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
>> +	notmuch->date_range_processor = new ParseTimeValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
>>  
>>  	notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
>>  	notmuch->query_parser->set_database (*notmuch->xapian_db);
>>  	notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
>>  	notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
>>  	notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
>> +	notmuch->query_parser->add_valuerangeprocessor (notmuch->date_range_processor);
>>  
>>  	for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
>>  	    prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
>> @@ -778,6 +781,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
>>      notmuch->xapian_db = NULL;
>>      delete notmuch->value_range_processor;
>>      notmuch->value_range_processor = NULL;
>> +    delete notmuch->date_range_processor;
>> +    notmuch->date_range_processor = NULL;
>>  }
>>  
>>  void
>> diff --git a/lib/parse-time-vrp.cc b/lib/parse-time-vrp.cc
>> new file mode 100644
>> index 0000000..7e4eca4
>> --- /dev/null
>> +++ b/lib/parse-time-vrp.cc
>> @@ -0,0 +1,40 @@
>
> Should this file have the usual preamble?

Probably, yes.

>> +
>> +#include "database-private.h"
>> +#include "parse-time-vrp.h"
>> +#include "parse-time-string.h"
>> +
>> +#define PREFIX "date:"
>> +
>> +/* See *ValueRangeProcessor in xapian-core/api/valuerangeproc.cc */
>> +Xapian::valueno
>> +ParseTimeValueRangeProcessor::operator() (std::string &begin, std::string &end)
>> +{
>> +    time_t t, now;
>> +
>> +    /* Require date: prefix in start of the range... */
>> +    if (STRNCMP_LITERAL (begin.c_str (), PREFIX))
>
> Could be
>   if (begin.rfind (PREFIX, 0) == string::npos)
> but that may not be clearer.

Not to me at least; my C++ is rusty.
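STRNCMP_LITERAL() compares only sizeof (literal) - 1 bytes, i.e. the literal without its terminating NUL, so it works as a prefix test. The same sizeof (PREFIX) - 1 is then the number of characters to erase. The following plain C sketch assumes the macro is defined as shown, which is what its use here implies. skip_date_prefix() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed definition, consistent with how the patch uses it. */
#define STRNCMP_LITERAL(var, literal) \
    strncmp ((var), (literal), sizeof (literal) - 1)

#define PREFIX "date:"

/* Return a pointer just past "date:" if 's' starts with it, else NULL. */
static const char *
skip_date_prefix (const char *s)
{
    assert (s != NULL);
    if (STRNCMP_LITERAL (s, PREFIX))
	return NULL;
    /* sizeof counts the trailing NUL, hence the - 1. */
    return s + sizeof (PREFIX) - 1;
}
```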

>> +	return Xapian::BAD_VALUENO;
>> +
>> +    /* ...and remove it. */
>> +    begin.erase (0, sizeof (PREFIX) - 1);
>> +
>> +    /* Use the same 'now' for begin and end. */
>> +    if (time (&now) == (time_t) -1)
>> +	return Xapian::BAD_VALUENO;
>> +
>> +    if (!begin.empty ()) {
>> +	if (parse_time_string (begin.c_str (), &t, &now, PARSE_TIME_ROUND_DOWN))
>> +	    return Xapian::BAD_VALUENO;
>> +
>> +	begin.assign (Xapian::sortable_serialise ((double) t));
>> +    }
>> +
>> +    if (!end.empty ()) {
>> +	if (parse_time_string (end.c_str (), &t, &now, PARSE_TIME_ROUND_UP_INCLUSIVE))
>> +	    return Xapian::BAD_VALUENO;
>> +
>> +	end.assign (Xapian::sortable_serialise ((double) t));
>> +    }
>> +
>> +    return valno;
>> +}
>> diff --git a/lib/parse-time-vrp.h b/lib/parse-time-vrp.h
>> new file mode 100644
>> index 0000000..526c217
>> --- /dev/null
>> +++ b/lib/parse-time-vrp.h
>> @@ -0,0 +1,19 @@
>
> Same thing about the preamble.
>
>> +
>> +#ifndef NOTMUCH_PARSE_TIME_VRP_H
>> +#define NOTMUCH_PARSE_TIME_VRP_H
>> +
>> +#include <xapian.h>
>> +
>> +/* see *ValueRangeProcessor in xapian-core/include/xapian/queryparser.h */
>
> Out of curiosity, why the Xapian source reference?
> ValueRangeProcessor is documented along the rest of Xapian.

To be honest, I couldn't write this with the documentation alone, and
Xapian has quite a bit of source code, so I wrote it down for me. I
figured it does no harm to leave it there.

BR,
Jani.

>> +class ParseTimeValueRangeProcessor : public Xapian::ValueRangeProcessor {
>> +protected:
>> +    Xapian::valueno valno;
>> +
>> +public:
>> +    ParseTimeValueRangeProcessor (Xapian::valueno slot_)
>> +	: valno(slot_) { }
>> +
>> +    Xapian::valueno operator() (std::string &begin, std::string &end);
>> +};
>> +
>> +#endif /* NOTMUCH_PARSE_TIME_VRP_H */
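Both the open-ended ranges and the missing-".." caveat follow from how a range reaches the processor. Xapian's query parser splits the term at ".." itself and passes the two halves to operator(). The "date:" prefix stays attached to begin, which is why the code checks begin for it. The following sketch only imitates that split in C to show what begin and end contain. split_range() is hypothetical, and it splits at the first "..", which is not necessarily how Xapian resolves every input.

```c
#include <assert.h>
#include <string.h>

/* Split "begin..end" at the first "..", copying each side into the
 * caller's buffers. An empty side means that end of the range is open.
 * Return 0 on success, or -1 if there is no ".." or a side does not fit. */
static int
split_range (const char *term, char *begin, size_t begin_size,
	     char *end, size_t end_size)
{
    const char *dots;
    size_t begin_len, end_len;

    assert (term != NULL && begin != NULL && end != NULL);

    dots = strstr (term, "..");
    if (dots == NULL)
	return -1;	/* e.g. "date:yesterday": not a range at all */

    begin_len = (size_t) (dots - term);
    end_len = strlen (dots + 2);
    if (begin_len >= begin_size || end_len >= end_size)
	return -1;

    memcpy (begin, term, begin_len);
    begin[begin_len] = '\0';
    memcpy (end, dots + 2, end_len + 1);	/* includes the NUL */
    return 0;
}
```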


* Re: [PATCH v5 8/9] man: document the date:since..until range queries
  2012-10-24 21:08   ` Austin Clements
@ 2012-10-28 22:41     ` Jani Nikula
  0 siblings, 0 replies; 21+ messages in thread
From: Jani Nikula @ 2012-10-28 22:41 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch


Many thanks, I'll incorporate most of your suggestions as-is.

BR,
Jani.


On Thu, 25 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> Quoth Jani Nikula on Oct 22 at 12:22 am:
>> ---
>>  man/man7/notmuch-search-terms.7 |  147 +++++++++++++++++++++++++++++++++++----
>>  1 file changed, 135 insertions(+), 12 deletions(-)
>> 
>> diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
>> index 17a109e..fbd3ee7 100644
>> --- a/man/man7/notmuch-search-terms.7
>> +++ b/man/man7/notmuch-search-terms.7
>> @@ -54,6 +54,8 @@ terms to match against specific portions of an email, (where
>>  
>>  	folder:<directory-path>
>>  
>> +	date:<since>..<until>
>> +
>>  The
>>  .B from:
>>  prefix is used to match the name or address of the sender of an email
>> @@ -104,6 +106,26 @@ contained within particular directories within the mail store. Only
>>  the directory components below the top-level mail database path are
>>  available to be searched.
>>  
>> +The
>> +.B date:
>> +prefix can be used to restrict the results to only messages within a
>> +particular time range (based on the Date: header) with a range syntax
>> +of:
>> +
>> +	date:<since>..<until>
>> +
>> +See \fBDATE AND TIME SEARCH\fR below for details on the range
>> +expression, and supported syntax for <since> and <until> date and time
>> +expressions.
>> +
>> +The time range can also be specified using timestamps with a syntax
>> +of:
>> +
>> +	<initial-timestamp>..<final-timestamp>
>> +
>> +Each timestamp is a number representing the number of seconds since
>> +1970\-01\-01 00:00:00 UTC.
>> +
>>  In addition to individual terms, multiple terms can be
>>  combined with Boolean operators (
>>  .BR and ", " or ", " not
>> @@ -117,20 +139,121 @@ operators, but will have to be protected from interpretation by the
>>  shell, (such as by putting quotation marks around any parenthesized
>>  expression).
>>  
>> -Finally, results can be restricted to only messages within a
>> -particular time range, (based on the Date: header) with a syntax of:
>> +.SH DATE AND TIME SEARCH
>>  
>> -	<initial-timestamp>..<final-timestamp>
>> +This is a non-exhaustive description of the date and time search with
>> +some pseudo notation. Most of the constructs can be mixed freely, and
>> +in any order, but the same absolute date or time can't be expressed
>> +twice.
>
> I'm not sure what the end of this sentence means, though I assume it's
> related to the restrictions on repeated absolute components.  It would
> also be nice to give a broader view of the syntax here.  Maybe,
>
> notmuch understands a variety of standard and natural ways of
> expressing dates and times, both in absolute terms ("2012-10-24") and
> in relative terms ("yesterday").  Any number of relative terms can be
> combined ("1 hour 25 minutes") and an absolute date/time can be
> combined with relative terms to further adjust it.  A non-exhaustive
> description of the syntax supported for absolute and relative terms is
> given below.
>
>>  
>> -Each timestamp is a number representing the number of seconds since
>> -1970\-01\-01 00:00:00 UTC. This is not the most convenient means of
>> -expressing date ranges, but until notmuch is fixed to accept a more
>> -convenient form, one can use the date program to construct
>> -timestamps. For example, with the bash shell the following syntax would
>> -specify a date range to return messages from 2009\-10\-01 until the
>> -current time:
>> -
>> -	$(date +%s \-d 2009\-10\-01)..$(date +%s)
>> +.RS 4
>> +.TP 4
>> +.B The range expression
>> +
>> +date:<since>..<until>
>> +
>> +The above expression restricts the results to only messages from
>> +<since> to <until>, based on the Date: header.
>> +
>> +If <since> or <until> describes time at an accuracy of days or less,
>> +the date/time is rounded, towards past for <since> and towards future
>> +for <until>, to be inclusive. For example, date:january..february
>
> The accuracy doesn't seem to have anything to do with days; if I
> say "date:1hour..1hour" I get a span of an hour.  Describing it as
> rounding also seems like it could be confusing to someone who hasn't
> thought a lot about this (though, as someone who has thought a lot
> about this, I could be wrong).  What about something like,
>
> <since> and <until> can describe imprecise times, such as "yesterday".
> In this case, <since> is taken as the earliest time it could describe
> (the beginning of yesterday) and <until> is taken as the latest time
> it could describe (the end of yesterday).  Similarly,
> date:january..february matches from the beginning of January to the
> end of February.
>
>> +matches from the beginning of January until the end of
>> +February. Similarly, date:yesterday..yesterday matches from the
>> +beginning of yesterday until the end of yesterday.
>> +
>> +Open-ended ranges are supported (since Xapian 1.2.1), i.e. it's
>> +possible to specify date:..<until> or date:<since>.. to not limit the
>> +start or end time, respectively. Unfortunately, pre-1.2.1 Xapian does
>
> No need for the "Unfortunately".
>
>> +not report an error on open ended ranges, but it does not work as
>> +expected either.
>> +
>> +Xapian does not support spaces in range expressions. You can replace
>
> The man pages essentially don't reference Xapian and the fact that we
> use Xapian is transparent to the uninterested user.  Maybe just
> "Currently, we do not support spaces ..."?  Or "Due to technical
> limitations, we do not currently support spaces ..." if you want to
> convey that we feel the user's pain but it's actually hard to fix.
>
>> +the spaces with '_', or (in most cases) '-', or (in some cases) leave
>> +the spaces out altogether.
>
> Maybe add "Examples in this man page use spaces for clarity."?  It's
> unfortunate that this rather critical piece of information is buried
> in the middle of a subsection of the man page.  I wonder if it should
> at least go before the previous paragraph?  We are going to get so
> many people asking why their date searches don't work...
>
>> +
>> +Entering date:expr without ".." (for example date:yesterday) won't
>> +work, as it's not interpreted as a range expression at all. You can
>> +achieve the expected result by duplicating the expr on both sides of ".."
>> +(for example date:yesterday..yesterday).
>> +.RE
>> +
>> +.RS 4
>> +.TP 4
>> +.B Relative date and time
>> +[N|number] (years|months|weeks|days|hours|hrs|minutes|mins|seconds|secs) [...]
>> +
>> +All refer to the past, can be repeated and will be accumulated.
>> +
>> +Units can be abbreviated to any length, with the otherwise ambiguous
>> +single m being m for minutes and M for months.
>> +
>> +Number multiplier can also be written out as one, two, ..., ten, dozen,
>
> This is the only use of "multiplier".  I think it would be fine to
> just say "the number".
>
>> +hundred. As special cases last means one ("last week") and this means
>> +zero ("this month").
>
> Maybe, "Additionally, the unit may be preceded by "last" or "this"
> (e.g., "last week" or "this month")."?
>
>> +
>> +When combined with absolute date and time, the relative date and time
>> +specification will be relative from the specified absolute date and
>> +time.
>> +
>> +Examples: 5M2d, two weeks
>> +.RE
>> +
>> +.RS 4
>> +.TP 4
>> +.B Supported time formats
>
> Supported absolute time formats?
>
>> +H[H]:MM[:SS] [(am|a.m.|pm|p.m.)]
>> +
>> +H[H] (am|a.m.|pm|p.m.)
>> +
>> +HHMMSS
>> +
>> +now
>> +
>> +noon
>> +
>> +midnight
>> +
>> +Examples: 17:05, 5pm
>> +.RE
>> +
>> +.RS 4
>> +.TP 4
>> +.B Supported date formats
>
> Supported absolute date formats?
>
>> +YYYY-MM[-DD]
>> +
>> +DD-MM[-[YY]YY]
>> +
>> +MM-YYYY
>> +
>> +M[M]/D[D][/[YY]YY]
>> +
>> +M[M]/YYYY
>> +
>> +D[D].M[M][.[YY]YY]
>> +
>> +D[D][(st|nd|rd|th)] Mon[thname] [YYYY]
>> +
>> +Mon[thname] D[D][(st|nd|rd|th)] [YYYY]
>> +
>> +Wee[kday]
>> +
>> +Month names can be abbreviated to three or more characters.
>> +
>> +Weekday names can be abbreviated to three or more characters.
>> +
>> +Examples: 2012-07-31, 31-07-2012, 7/31/2012, August 3
>> +.RE
>> +
>> +.RS 4
>> +.TP 4
>> +.B Time zones
>> +(+|-)HH:MM
>> +
>> +(+|-)HH[MM]
>> +
>> +Some time zone codes, e.g. UTC, EET.
>> +.RE
>>  
>>  .SH SEE ALSO
>>  
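This hunk removes the old man page text, which suggested date(1) for building timestamps, e.g. $(date +%s \-d 2009\-10\-01). For reference, the same kind of number can be computed in C for a UTC date with Howard Hinnant's well-known days_from_civil algorithm. This is a sketch, not something notmuch provides. date(1) interprets the date in the local time zone, so the two agree only with TZ=UTC.

```c
#include <assert.h>
#include <time.h>

/* Days since 1970-01-01 for a proleptic Gregorian date
 * (Howard Hinnant's days_from_civil). */
static long long
days_from_civil (long long y, unsigned m, unsigned d)
{
    long long era;
    unsigned yoe, doy, doe;

    assert (m >= 1 && m <= 12 && d >= 1 && d <= 31);

    y -= m <= 2;	/* count Jan and Feb as months of the previous year */
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = (unsigned) (y - era * 400);				/* [0, 399] */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;	/* [0, 365] */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;		/* [0, 146096] */
    return era * 146097 + (long long) doe - 719468;
}

/* Seconds since 1970-01-01 00:00:00 UTC at midnight UTC of y-m-d. */
static time_t
utc_midnight (long long y, unsigned m, unsigned d)
{
    return (time_t) (days_from_civil (y, m, d) * 86400);
}
```

For example, utc_midnight (2009, 10, 1) yields 1254355200, the start of the old example range.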


* Re: [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
  2012-10-28 22:30     ` Jani Nikula
@ 2012-10-28 22:52       ` Austin Clements
  0 siblings, 0 replies; 21+ messages in thread
From: Austin Clements @ 2012-10-28 22:52 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

Quoth Jani Nikula on Oct 29 at 12:30 am:
> On Mon, 22 Oct 2012, Austin Clements <amdragon@MIT.EDU> wrote:
> >> +/*
> >> + * Accepted keywords.
> >> + *
> >> + * A keyword may optionally contain a '|' to indicate the minimum
> >> + * match length. Without one, full match is required. It's advisable
> >> + * to keep the minimum match parts unique across all keywords.
> >> + *
> >> + * If keyword begins with upper case letter, then the matching will be
> >> + * case sensitive. Otherwise the matching is case insensitive.
> >> + *
> >> + * If setter is NULL, set_default will be used.
> >> + *
> >> + * Note: Order matters. Matching is greedy and the longest match is used,
> >> + * but among equal-length matches the first one is used, unless there's
> >> + * an equal-length case-sensitive match, which trumps case-insensitive
> >> + * matches.
> >
> > If you do have a tokenizer (or disallow mashing keywords together),
> > then all of complexity arising from longest match goes away because
> > the keyword token either will or won't match a keyword.  If you also
> > eliminate the rule for case sensitivity and put case-sensitive things
> > before conflicting case-insensitive things (so put "M" before
> > "m|inutes"), then you can simply use the first match.
> 
> At least one reason for going through the whole table is that if this
> ever gets i18n support, the conflicting things might be different. While
> order matters in principle, you should create the table so that it
> really doesn't matter.

While that's true, if the input keyword has to be syntactically
delimited, there's still no such thing as a "longest match", since the
length of any match will be the length of the input.  You may still
want to scan the whole table, but if you find multiple matches, it's a
bug in the table indicating that |ed prefixes aren't unique.  Hence,
if you're not interested in finding bugs in the table, you can just
find the first match.

Or you could remove the |'s from the table, scan the whole table, and
consider the input string ambiguous if it matches multiple table
entries (being careful with case sensitivity), just like you do now if
the input string is shorter than the |ed prefixes.  That would
simplify your table, your matching logic, and possibly your scanning
logic.
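The "scan the whole table, reject ambiguous input" alternative can be sketched in a few lines of C. Both helpers are hypothetical. The sketch is case-insensitive throughout, so it does not handle the "M" versus "m" distinction. Any table entry that is a prefix of another would also need an exact-match rule to be selectable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Nonzero if 'word' is a non-empty, case-insensitive prefix of 'keyword'. */
static int
is_prefix_ci (const char *word, const char *keyword)
{
    if (*word == '\0')
	return 0;
    for (; *word != '\0'; word++, keyword++) {
	if (*keyword == '\0' ||
	    tolower ((unsigned char) *word) != tolower ((unsigned char) *keyword))
	    return 0;
    }
    return 1;
}

/* Scan the whole table. Return the index of the only entry that 'word'
 * abbreviates, -1 if there is none, or -2 if several entries match
 * (ambiguous input). */
static int
lookup_keyword (const char *word, const char *const *table, size_t n)
{
    size_t i;
    int found = -1;

    assert (word != NULL && table != NULL);

    for (i = 0; i < n; i++) {
	if (is_prefix_ci (word, table[i])) {
	    if (found >= 0)
		return -2;
	    found = (int) i;
	}
    }
    return found;
}
```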

> >
> >> + */
> >> +static struct keyword keywords[] = {
> >> +    /* Weekdays. */
> >> +    { N_("sun|day"),	TM_ABS_WDAY,	0,	NULL },
> >> +    { N_("mon|day"),	TM_ABS_WDAY,	1,	NULL },
> >> +    { N_("tue|sday"),	TM_ABS_WDAY,	2,	NULL },
> >> +    { N_("wed|nesday"),	TM_ABS_WDAY,	3,	NULL },
> >> +    { N_("thu|rsday"),	TM_ABS_WDAY,	4,	NULL },
> >> +    { N_("fri|day"),	TM_ABS_WDAY,	5,	NULL },
> >> +    { N_("sat|urday"),	TM_ABS_WDAY,	6,	NULL },
> >> +
> >> +    /* Months. */
> >> +    { N_("jan|uary"),	TM_ABS_MON,	1,	kw_set_month },
> >> +    { N_("feb|ruary"),	TM_ABS_MON,	2,	kw_set_month },
> >> +    { N_("mar|ch"),	TM_ABS_MON,	3,	kw_set_month },
> >> +    { N_("apr|il"),	TM_ABS_MON,	4,	kw_set_month },
> >> +    { N_("may"),	TM_ABS_MON,	5,	kw_set_month },
> >> +    { N_("jun|e"),	TM_ABS_MON,	6,	kw_set_month },
> >> +    { N_("jul|y"),	TM_ABS_MON,	7,	kw_set_month },
> >> +    { N_("aug|ust"),	TM_ABS_MON,	8,	kw_set_month },
> >> +    { N_("sep|tember"),	TM_ABS_MON,	9,	kw_set_month },
> >> +    { N_("oct|ober"),	TM_ABS_MON,	10,	kw_set_month },
> >> +    { N_("nov|ember"),	TM_ABS_MON,	11,	kw_set_month },
> >> +    { N_("dec|ember"),	TM_ABS_MON,	12,	kw_set_month },
> >> +
> >> +    /* Durations. */
> >> +    { N_("y|ears"),	TM_REL_YEAR,	1,	kw_set_rel },
> >> +    { N_("w|eeks"),	TM_REL_WEEK,	1,	kw_set_rel },
> >> +    { N_("d|ays"),	TM_REL_DAY,	1,	kw_set_rel },
> >> +    { N_("h|ours"),	TM_REL_HOUR,	1,	kw_set_rel },
> >> +    { N_("hr|s"),	TM_REL_HOUR,	1,	kw_set_rel },
> >> +    { N_("m|inutes"),	TM_REL_MIN,	1,	kw_set_rel },
> >> +    /* M=months, m=minutes */
> >> +    { N_("M"),		TM_REL_MON,	1,	kw_set_rel },
> >> +    { N_("mins"),	TM_REL_MIN,	1,	kw_set_rel },
> >> +    { N_("mo|nths"),	TM_REL_MON,	1,	kw_set_rel },
> >> +    { N_("s|econds"),	TM_REL_SEC,	1,	kw_set_rel },
> >> +    { N_("secs"),	TM_REL_SEC,	1,	kw_set_rel },
> >> +
> >> +    /* Numbers. */
> >> +    { N_("one"),	TM_NONE,	1,	kw_set_number },
> >> +    { N_("two"),	TM_NONE,	2,	kw_set_number },
> >> +    { N_("three"),	TM_NONE,	3,	kw_set_number },
> >> +    { N_("four"),	TM_NONE,	4,	kw_set_number },
> >> +    { N_("five"),	TM_NONE,	5,	kw_set_number },
> >> +    { N_("six"),	TM_NONE,	6,	kw_set_number },
> >> +    { N_("seven"),	TM_NONE,	7,	kw_set_number },
> >> +    { N_("eight"),	TM_NONE,	8,	kw_set_number },
> >> +    { N_("nine"),	TM_NONE,	9,	kw_set_number },
> >> +    { N_("ten"),	TM_NONE,	10,	kw_set_number },
> >> +    { N_("dozen"),	TM_NONE,	12,	kw_set_number },
> >> +    { N_("hundred"),	TM_NONE,	100,	kw_set_number },
> >> +
> >> +    /* Special number forms. */
> >> +    { N_("this"),	TM_NONE,	0,	kw_set_number },
> >> +    { N_("last"),	TM_NONE,	1,	kw_set_number },
> >> +
> >> +    /* Other special keywords. */
> >> +    { N_("yesterday"),	TM_REL_DAY,	1,	kw_set_rel },
> >> +    { N_("today"),	TM_NONE,	0,	kw_set_today },
> >> +    { N_("now"),	TM_NONE,	0,	kw_set_now },
> >> +    { N_("noon"),	TM_NONE,	12,	kw_set_timeofday },
> >> +    { N_("midnight"),	TM_NONE,	0,	kw_set_timeofday },
> >> +    { N_("am"),		TM_AMPM,	0,	kw_set_ampm },
> >> +    { N_("a.m."),	TM_AMPM,	0,	kw_set_ampm },
> >> +    { N_("pm"),		TM_AMPM,	1,	kw_set_ampm },
> >> +    { N_("p.m."),	TM_AMPM,	1,	kw_set_ampm },
> >> +    { N_("st"),		TM_NONE,	0,	kw_set_ordinal },
> >> +    { N_("nd"),		TM_NONE,	0,	kw_set_ordinal },
> >> +    { N_("rd"),		TM_NONE,	0,	kw_set_ordinal },
> >> +    { N_("th"),		TM_NONE,	0,	kw_set_ordinal },
> >> +
> >> +    /* Timezone codes: offset in minutes. XXX: Add more codes. */
> >> +    { N_("pst"),	TM_TZ,		-8*60,	NULL },
> >> +    { N_("mst"),	TM_TZ,		-7*60,	NULL },
> >> +    { N_("cst"),	TM_TZ,		-6*60,	NULL },
> >> +    { N_("est"),	TM_TZ,		-5*60,	NULL },
> >> +    { N_("ast"),	TM_TZ,		-4*60,	NULL },
> >> +    { N_("nst"),	TM_TZ,		-(3*60+30),	NULL },
> >> +
> >> +    { N_("gmt"),	TM_TZ,		0,	NULL },
> >> +    { N_("utc"),	TM_TZ,		0,	NULL },
> >> +
> >> +    { N_("wet"),	TM_TZ,		0,	NULL },
> >> +    { N_("cet"),	TM_TZ,		1*60,	NULL },
> >> +    { N_("eet"),	TM_TZ,		2*60,	NULL },
> >> +    { N_("fet"),	TM_TZ,		3*60,	NULL },
> >> +
> >> +    { N_("wat"),	TM_TZ,		1*60,	NULL },
> >> +    { N_("cat"),	TM_TZ,		2*60,	NULL },
> >> +    { N_("eat"),	TM_TZ,		3*60,	NULL },
> >> +};
> >> +
> >> +/*
> >> + * Compare strings s and keyword. Return number of matching chars on
> >> + * match, 0 for no match. Match must be at least n chars, or all of
> >> + * keyword if n < 0, otherwise it's not a match. Use match_case for
> >> + * case sensitive matching.
> >> + */
> >> +static size_t
> >> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)
> >> +{
> >> +    ssize_t i;
> >> +
> >> +    if (!n)
> >> +	return 0;
> >> +
> >> +    for (i = 0; *s && *keyword; i++, s++, keyword++) {
> >> +	if (match_case) {
> >> +	    if (*s != *keyword)
> >
> > The pointer arithmetic doesn't seem to buy anything here.  What about
> > just looping over i and using s[i] and keyword[i]?
> 
> The pointer arithmetic will be useful when I implement your other
> suggestion of handling '|' here. ;) Otherwise, I'd need two index
> variables.

Fair enough.
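Here is a sketch of handling the '|' inside the matcher. It assumes the input has already been cut into a whole word, as in the tokenizer suggestion, rather than matched greedily inside longer input the way the v5 match_keyword() does. match_keyword_token() is a hypothetical name. It keeps the (unsigned char) casts discussed below, because ctype.h functions take an int that must be representable as unsigned char or be EOF.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Match a whole token 's' against 'keyword'. An optional '|' in the
 * keyword marks the minimum match length, so "mon|day" accepts "mon",
 * "mond", ..., "monday". Without '|', the full keyword is required.
 * Return the number of characters matched, or 0 for no match. */
static size_t
match_keyword_token (const char *s, const char *keyword, bool match_case)
{
    const char *start = s;
    bool past_bar = false;

    assert (s != NULL && keyword != NULL);

    while (*s) {
	if (*keyword == '|') {
	    past_bar = true;
	    keyword++;		/* the bar itself is not matched */
	    continue;
	}
	if (*keyword == '\0')
	    return 0;		/* token is longer than the keyword */
	if (match_case ? (*s != *keyword)
	    : (tolower ((unsigned char) *s) != tolower ((unsigned char) *keyword)))
	    return 0;
	s++;
	keyword++;
    }

    /* Token consumed: it must have reached the bar or the keyword's end. */
    if (*keyword == '|' || *keyword == '\0' || past_bar)
	return (size_t) (s - start);
    return 0;
}
```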

> >
> >> +		break;
> >> +	} else {
> >> +	    if (tolower ((unsigned char) *s) !=
> >> +		tolower ((unsigned char) *keyword))
> >
> > I don't think the cast to unsigned char is necessary.
> 
> As discussed on IRC, pedantically it is necessary, as ctype.h functions
> accept an int that must have the value of an unsigned char or EOF, and
> char might be signed.

It wouldn't be C without the pedantic.

> >> +/* Combine absolute and relative fields, and round. */
> >> +static int
> >> +create_output (struct state *state, time_t *t_out, const time_t *ref,
> >> +	       int round)
> >> +{
> >> +    struct tm tm = { .tm_isdst = -1 };
> >> +    struct tm now;
> >> +    time_t t;
> >> +    enum field f;
> >> +    int r;
> >> +    int week_round = PARSE_TIME_NO_ROUND;
> >> +
> >> +    r = initialize_now (state, &now, ref);
> >> +    if (r)
> >> +	return r;
> >> +
> >> +    /* Initialize fields flagged as "now" to reference time. */
> >> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> >> +	if (state->set[f] == FIELD_NOW) {
> >> +	    state->tm[f] = tm_get_field (&now, f);
> >> +	    state->set[f] = FIELD_SET;
> >> +	}
> >> +    }
> >> +
> >> +    /*
> >> +     * If WDAY is set but MDAY is not, we consider WDAY relative
> >> +     *
> >> +     * XXX: This fails on stuff like "two months monday" because two
> >> +     * months ago wasn't the same day as today. Postpone until we know
> >> +     * date?
> >> +     */
> >> +    if (is_field_set (state, TM_ABS_WDAY) &&
> >> +	!is_field_set (state, TM_ABS_MDAY)) {
> >> +	int wday = get_field (state, TM_ABS_WDAY);
> >> +	int today = tm_get_field (&now, TM_ABS_WDAY);
> >> +	int rel_days;
> >> +
> >> +	if (today > wday)
> >> +	    rel_days = today - wday;
> >> +	else
> >> +	    rel_days = today + 7 - wday;
> >> +
> >> +	/* This also prevents special week rounding from happening. */
> >> +	mod_field (state, TM_REL_DAY, rel_days);
> >> +
> >> +	unset_field (state, TM_ABS_WDAY);
> >> +    }
> >> +
> >> +    r = fixup_ampm (state);
> >> +    if (r)
> >> +	return r;
> >> +
> >> +    /*
> >> +     * Iterate fields from most accurate to least accurate, and set
> >> +     * unset fields according to requested rounding.
> >> +     */
> >> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> >> +	if (round != PARSE_TIME_NO_ROUND) {
> >> +	    enum field r = abs_to_rel_field (f);
> >> +
> >> +	    if (is_field_set (state, f) || is_field_set (state, r)) {
> >> +		if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {
> >> +		    mod_field (state, r, -1);
> >
> > Crazy.  This could use a comment.  It took me a while to figure out
> > why this was -1, though maybe that's just because it's late.
> 
> Will do.
> 
> /* You're not expected to understand this */ ;)

Hah.  You're not allowed to use that on me!  I *do* understand the
code that comment is originally from.  ]:--8)


end of thread, other threads:[~2012-10-28 22:52 UTC | newest]

Thread overview: 21+ messages
2012-10-21 21:22 [PATCH v5 0/9] notmuch search date:since..until query support Jani Nikula
2012-10-21 21:22 ` [PATCH v5 1/9] build: drop the -Wswitch-enum warning Jani Nikula
2012-10-21 21:22 ` [PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula
2012-10-22  8:14   ` Austin Clements
2012-10-25 18:58     ` Austin Clements
2012-10-27 20:38       ` Tomi Ollila
2012-10-28 22:30     ` Jani Nikula
2012-10-28 22:52       ` Austin Clements
2012-10-21 21:22 ` [PATCH v5 3/9] test: add new test tool parse-time for date/time parser Jani Nikula
2012-10-21 21:22 ` [PATCH v5 4/9] test: add smoke tests for the date/time parser module Jani Nikula
2012-10-23  4:23   ` Austin Clements
2012-10-28 22:34     ` Jani Nikula
2012-10-21 21:22 ` [PATCH v5 5/9] build: build parse-time-string as part of the notmuch lib and static cli Jani Nikula
2012-10-21 21:22 ` [PATCH v5 6/9] lib: add date range query support Jani Nikula
2012-10-23  4:52   ` Austin Clements
2012-10-28 22:39     ` Jani Nikula
2012-10-21 21:22 ` [PATCH v5 7/9] test: add tests for date:since..until range queries Jani Nikula
2012-10-21 21:22 ` [PATCH v5 8/9] man: document the " Jani Nikula
2012-10-24 21:08   ` Austin Clements
2012-10-28 22:41     ` Jani Nikula
2012-10-21 21:22 ` [PATCH v5 9/9] NEWS: date range search support Jani Nikula

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
