* [PATCH v3 0/9] notmuch search date:since..until query support
@ 2012-09-12 21:27 Jani Nikula
  2012-09-12 21:27 ` [PATCH v3 1/9] build: drop the -Wswitch-enum warning Jani Nikula
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Jani Nikula @ 2012-09-12 21:27 UTC
To: notmuch, David Bremner

Hi all, v3 of id:"cover.1344065790.git.jani@nikula.org".

Notable changes since v2:

* Address most of David's comments in
  id:"877gtdmqol.fsf@zancas.localnet". Delegating the list of time
  zones to the system is not without problems, so not done. Also
  create_output() is not split further.

* Move the parser to a subdirectory of its own to be independent of
  the rest of the notmuch code base, and build it as a static
  library. This should be useful if the parser is ever packaged as a
  library of its own.

* Add a high level documentation comment, and improve comments all
  around.

* Add NEWS with hopes that this would make 0.15. :)

BR,
Jani.

Jani Nikula (9):
  build: drop the -Wswitch-enum warning
  parse-time-string: add a date/time parser to notmuch
  test: add new test tool parse-time for date/time parser
  test: add smoke tests for the date/time parser module
  build: build parse-time-string as part of the notmuch lib and static cli
  lib: add date range query support
  test: add tests for date:since..until range queries
  man: document the date:since..until range queries
  NEWS: date range search support

 Makefile                              |    2 +-
 Makefile.local                        |    2 +-
 NEWS                                  |   14 +
 configure                             |    2 +-
 lib/Makefile.local                    |    3 +-
 lib/database-private.h                |    1 +
 lib/database.cc                       |    5 +
 lib/parse-time-vrp.cc                 |   40 +
 lib/parse-time-vrp.h                  |   19 +
 man/man7/notmuch-search-terms.7       |  147 +++-
 parse-time-string/Makefile            |    5 +
 parse-time-string/Makefile.local      |   12 +
 parse-time-string/README              |    9 +
 parse-time-string/parse-time-string.c | 1484 +++++++++++++++++++++++++++++++++
 parse-time-string/parse-time-string.h |   95 +++
 test/Makefile.local                   |    7 +-
 test/basic                            |    2 +-
 test/notmuch-test                     |    2 +
 test/parse-time-string                |   26 +
 test/parse-time.c                     |  145 ++++
 test/search-date                      |   21 +
 21 files changed, 2025 insertions(+), 18 deletions(-)
 create mode 100644 lib/parse-time-vrp.cc
 create mode 100644 lib/parse-time-vrp.h
 create mode 100644 parse-time-string/Makefile
 create mode 100644 parse-time-string/Makefile.local
 create mode 100644 parse-time-string/README
 create mode 100644 parse-time-string/parse-time-string.c
 create mode 100644 parse-time-string/parse-time-string.h
 create mode 100755 test/parse-time-string
 create mode 100644 test/parse-time.c
 create mode 100755 test/search-date

-- 
1.7.9.5

* [PATCH v3 1/9] build: drop the -Wswitch-enum warning
  2012-09-12 21:27 [PATCH v3 0/9] notmuch search date:since..until query support Jani Nikula
@ 2012-09-12 21:27 ` Jani Nikula
  2012-09-12 21:27 ` [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 30+ messages in thread
From: Jani Nikula @ 2012-09-12 21:27 UTC
To: notmuch, David Bremner

-Wswitch-enum is a bit awkward, especially if a switch statement is
intended to handle just some of the named codes of an enumeration and
leave the rest to the default label.

We already have -Wall, which enables -Wswitch by default, and per GCC
documentation, "The only difference between -Wswitch and this option
[-Wswitch-enum] is that this option gives a warning about an omitted
enumeration code even if there is a default label."

Drop -Wswitch-enum to not force listing all named codes of
enumerations in switch statements that have a default label.

---

This will be useful in the next patch.
---
 configure |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index acb90a8..afa5c16 100755
--- a/configure
+++ b/configure
@@ -532,7 +532,7 @@ fi
 
 WARN_CXXFLAGS=""
 printf "Checking for available C++ compiler warning flags... "
-for flag in -Wall -Wextra -Wwrite-strings -Wswitch-enum; do
+for flag in -Wall -Wextra -Wwrite-strings; do
     if ${CC} $flag -o minimal minimal.c > /dev/null 2>&1
     then
         WARN_CXXFLAGS="${WARN_CXXFLAGS}${WARN_CXXFLAGS:+ }${flag}"
-- 
1.7.9.5

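To illustrate the difference the commit message above describes, here is a minimal sketch. The enumeration and function names are hypothetical and do not come from notmuch. With -Wall (and hence -Wswitch) the switch below should compile without a switch-related warning, because the default label covers the codes that are not listed. With -Wswitch-enum, GCC is documented to warn about the omitted COLOR_GREEN and COLOR_BLUE even though the default label is present.

```c
/* Hypothetical enumeration, for illustration only (not from notmuch). */
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

/*
 * Handle just one of the named codes explicitly and leave the rest to
 * the default label. This is the pattern that -Wswitch accepts and
 * -Wswitch-enum complains about.
 */
const char *
describe (enum color c)
{
    switch (c) {
    case COLOR_RED:
	return "red";
    default:
	/* COLOR_GREEN and COLOR_BLUE both end up here. */
	return "other";
    }
}
```

Calling describe (COLOR_BLUE) returns "other". Listing every enumerator explicitly would also silence -Wswitch-enum, but that is exactly the verbosity this patch wants to avoid for switch statements that already have a default label.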
* [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch 2012-09-12 21:27 [PATCH v3 0/9] notmuch search date:since..until query support Jani Nikula 2012-09-12 21:27 ` [PATCH v3 1/9] build: drop the -Wswitch-enum warning Jani Nikula @ 2012-09-12 21:27 ` Jani Nikula 2012-09-13 11:10 ` Michal Nazarewicz 2012-09-25 11:56 ` Michal Sojka 2012-09-12 21:27 ` [PATCH v3 3/9] test: add new test tool parse-time for date/time parser Jani Nikula ` (6 subsequent siblings) 8 siblings, 2 replies; 30+ messages in thread From: Jani Nikula @ 2012-09-12 21:27 UTC (permalink / raw) To: notmuch, David Bremner Add a date/time parser to notmuch, to be used for adding date range query support for notmuch lib later on. Add the parser to a directory of its own to make it independent of the rest of the notmuch code base. Signed-off-by: Jani Nikula <jani@nikula.org> --- Makefile | 2 +- parse-time-string/Makefile | 5 + parse-time-string/Makefile.local | 12 + parse-time-string/README | 9 + parse-time-string/parse-time-string.c | 1484 +++++++++++++++++++++++++++++++++ parse-time-string/parse-time-string.h | 95 +++ 6 files changed, 1606 insertions(+), 1 deletion(-) create mode 100644 parse-time-string/Makefile create mode 100644 parse-time-string/Makefile.local create mode 100644 parse-time-string/README create mode 100644 parse-time-string/parse-time-string.c create mode 100644 parse-time-string/parse-time-string.h diff --git a/Makefile b/Makefile index e5e2e3a..bb9c316 100644 --- a/Makefile +++ b/Makefile @@ -3,7 +3,7 @@ all: # List all subdirectories here. Each contains its own Makefile.local -subdirs = compat completion emacs lib man util test +subdirs = compat completion emacs lib man parse-time-string util test # We make all targets depend on the Makefiles themselves. 
global_deps = Makefile Makefile.config Makefile.local \ diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile new file mode 100644 index 0000000..fa25832 --- /dev/null +++ b/parse-time-string/Makefile @@ -0,0 +1,5 @@ +all: + $(MAKE) -C .. all + +.DEFAULT: + $(MAKE) -C .. $@ diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local new file mode 100644 index 0000000..53534f3 --- /dev/null +++ b/parse-time-string/Makefile.local @@ -0,0 +1,12 @@ +dir := parse-time-string +extra_cflags += -I$(srcdir)/$(dir) + +libparse-time-string_c_srcs := $(dir)/parse-time-string.c + +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o) + +$(dir)/libparse-time-string.a: $(libparse-time-string_modules) + $(call quiet,AR) rcs $@ $^ + +SRCS := $(SRCS) $(libparse-time-string_c_srcs) +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a diff --git a/parse-time-string/README b/parse-time-string/README new file mode 100644 index 0000000..300ff1f --- /dev/null +++ b/parse-time-string/README @@ -0,0 +1,9 @@ +PARSE TIME STRING +================= + +parse_time_string() is a date/time parser originally written for +notmuch by Jani Nikula <jani@nikula.org>. However, there is nothing +notmuch specific in it, and it should be kept reusable for other +projects, and ready to be packaged on its own as needed. Please do not +add dependencies on or references to anything notmuch specific. The +parser should only depend on the C library. 
diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c new file mode 100644 index 0000000..15cf686 --- /dev/null +++ b/parse-time-string/parse-time-string.c @@ -0,0 +1,1484 @@ +/* + * parse time string - user friendly date and time parser + * Copyright © 2012 Jani Nikula + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + * + * Author: Jani Nikula <jani@nikula.org> + */ + +#include <assert.h> +#include <ctype.h> +#include <errno.h> +#include <limits.h> +#include <stdio.h> +#include <stdarg.h> +#include <stdbool.h> +#include <stdlib.h> +#include <string.h> +#include <strings.h> +#include <time.h> +#include <sys/time.h> +#include <sys/types.h> + +#include "parse-time-string.h" + +/* + * IMPLEMENTATION DETAILS + * + * At a high level, the parsing is done in two phases: 1) actual + * parsing of the input string and storing the parsed data into + * 'struct state', and 2) processing of the data in 'struct state' + * according to current time (or provided reference time) and + * rounding. This is evident in the main entry point function + * parse_time_string(). + * + * 1) The parsing phase - parse_input() + * + * Parsing is greedy and happens from left to right. The parsing is as + * unambiguous as possible; only unambiguous date/time formats are + * accepted. Redundant or contradictory absolute date/time in the + * input (e.g. 
date specified multiple times/ways) is not + * accepted. Relative date/time on the other hand just accumulates if + * present multiple times (e.g. "5 days 5 days" just turns into 10 + * days). + * + * Parsing decisions are made on the input format, not value. For + * example, "20/5/2005" fails because the recognized format here is + * MM/D/YYYY, even though the values would suggest DD/M/YYYY. + * + * Parsing is mostly stateless in the sense that parsing decisions are + * not made based on the values of previously parsed data, or whether + * certain data is present in the first place. (There are a few + * exceptions to the latter part, though, such as parsing of time zone + * that would otherwise look like plain time.) + * + * When the parser encounters a number that is not greedily parsed as + * part of a format, the interpretation is postponed until the next + * token is parsed. The parser for the next token may consume the + * previously postponed number. For example, when parsing "20 May" the + * meaning of "20" is not known until "May" is parsed. If the parser + * for the next token does not consume the postponed number, the + * number is handled as a "lone" number before parser for the next + * token finishes. + * + * 2) The processing phase - create_output() + * + * Once the parser in phase 1 has finished, 'struct state' contains + * all the information from the input string, and it's no longer + * needed. Since the parser does not even handle the concept of "now", + * the processing initializes the fields referring to the current + * date/time. + * + * If requested, the result is rounded towards past or future. The + * idea behind rounding is to support parsing date/time ranges in an + * obvious way. For example, for a range defined as two dates (without + * time), one would typically want to have an inclusive range from the + * beginning of start date to the end of the end date. 
The caller + * would use rounding towards past in the start date, and towards + * future in the end date. + * + * The absolute date and time is shifted by the relative date and + * time, and time zone adjustments are made. Daylight saving time + * (DST) is specifically *not* handled at all. + * + * Finally, the result is stored to time_t. + */ + +#define unused(x) x __attribute__ ((unused)) + +/* XXX: Redefine these to add i18n support. The keyword table uses + * N_() to mark strings to be translated; they are accessed + * dynamically using _(). */ +#define _(s) (s) /* i18n: define as gettext (s) */ +#define N_(s) (s) /* i18n: define as gettext_noop (s) */ + +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0])) + +/* + * Field indices in the tm and set arrays of struct state. + * + * NOTE: There's some code that depends on the ordering of this enum. + */ +enum field { + /* Keep SEC...YEAR in this order. */ + TM_ABS_SEC, /* seconds */ + TM_ABS_MIN, /* minutes */ + TM_ABS_HOUR, /* hours */ + TM_ABS_MDAY, /* day of the month */ + TM_ABS_MON, /* month */ + TM_ABS_YEAR, /* year */ + + TM_ABS_WDAY, /* day of the week. special: may be relative */ + TM_ABS_ISDST, /* daylight saving time */ + + TM_AMPM, /* am vs. pm */ + TM_TZ, /* timezone in minutes */ + + /* Keep SEC...YEAR in this order. */ + TM_REL_SEC, /* seconds relative to now */ + TM_REL_MIN, /* minutes ... */ + TM_REL_HOUR, /* hours ... */ + TM_REL_DAY, /* days ... */ + TM_REL_MON, /* months ... */ + TM_REL_YEAR, /* years ... */ + TM_REL_WEEK, /* weeks ... */ + + TM_NONE, /* not a field */ + + TM_SIZE = TM_NONE, + TM_FIRST_ABS = TM_ABS_SEC, + TM_FIRST_REL = TM_REL_SEC, +}; + +/* Values for the set array of struct state. */ +enum field_set { + FIELD_UNSET, /* The field has not been touched by parser. */ + FIELD_SET, /* The field has been set by parser. */ + FIELD_NOW, /* The field will be set to "now". */ +}; + +static enum field +next_abs_field (enum field field) +{ + /* NOTE: Depends on the enum ordering. 
*/ + return field < TM_ABS_YEAR ? field + 1 : TM_NONE; +} + +static enum field +abs_to_rel_field (enum field field) +{ + assert (field <= TM_ABS_YEAR); + + /* NOTE: Depends on the enum ordering. */ + return field + (TM_FIRST_REL - TM_FIRST_ABS); +} + +/* Get epoch value for field. */ +static int +field_epoch (enum field field) +{ + if (field == TM_ABS_MDAY || field == TM_ABS_MON) + return 1; + else if (field == TM_ABS_YEAR) + return 1970; + else + return 0; +} + +/* The parsing state. */ +struct state { + int tm[TM_SIZE]; /* parsed date and time */ + enum field_set set[TM_SIZE]; /* set status of tm */ + + enum field last_field; /* Previously set field. */ + enum field next_field; /* Next field for parse_postponed_number() */ + char delim; + + int postponed_length; /* Number of digits in postponed value. */ + int postponed_value; + char postponed_delim; /* The delimiter preceding postponed number. */ +}; + +/* + * Helpers for postponed numbers. + * + * postponed_length is the number of digits in postponed value. 0 + * means there is no postponed number. -1 means there is a postponed + * number, but it comes from a keyword, and it doesn't have digits. + */ +static int +get_postponed_length (struct state *state) +{ + return state->postponed_length; +} + +/* + * Consume a previously postponed number. Return true if a number was + * in fact postponed, false otherwise. Store the postponed number's + * value in *v, length in the input string in *n (or -1 if the number + * was written out and parsed as a keyword), and the preceding + * delimiter to *d. 
+ */ +static bool +get_postponed_number (struct state *state, int *v, int *n, char *d) +{ + if (!state->postponed_length) + return false; + + if (n) + *n = state->postponed_length; + + if (v) + *v = state->postponed_value; + + if (d) + *d = state->postponed_delim; + + state->postponed_length = 0; + state->postponed_value = 0; + state->postponed_delim = 0; + + return true; +} + +/* Parse a previously postponed number if one exists. */ +static int parse_postponed_number (struct state *state, int v, int n, char d); +static int +handle_postponed_number (struct state *state, enum field next_field) +{ + int v = state->postponed_value; + int n = state->postponed_length; + char d = state->postponed_delim; + int r; + + if (!n) + return 0; + + state->postponed_value = 0; + state->postponed_length = 0; + state->postponed_delim = 0; + + state->next_field = next_field; + r = parse_postponed_number (state, v, n, d); + state->next_field = TM_NONE; + + return r; +} + +/* + * Postpone a number to be handled later. If one exists already, + * handle it first. n may be -1 to indicate a keyword that has no + * number length. + */ +static int +set_postponed_number (struct state *state, int v, int n) +{ + int r; + char d = state->delim; + + /* Parse a previously postponed number, if any. */ + r = handle_postponed_number (state, TM_NONE); + if (r) + return r; + + state->postponed_length = n; + state->postponed_value = v; + state->postponed_delim = d; + + return 0; +} + +static void +set_delim (struct state *state, char delim) +{ + state->delim = delim; +} + +static void +unset_delim (struct state *state) +{ + state->delim = 0; +} + +/* + * Field set/get/mod helpers. + */ + +/* Return true if field has been set. 
*/ +static bool +is_field_set (struct state *state, enum field field) +{ + assert (field < ARRAY_SIZE (state->tm)); + + return field < ARRAY_SIZE (state->set) && + state->set[field] != FIELD_UNSET; +} + +static void +unset_field (struct state *state, enum field field) +{ + assert (field < ARRAY_SIZE (state->tm)); + + state->set[field] = FIELD_UNSET; + state->tm[field] = 0; +} + +/* + * Set field to value. A field can only be set once to ensure the + * input does not contain redundant and potentially conflicting data. + */ +static int +set_field (struct state *state, enum field field, int value) +{ + int r; + + assert (field < ARRAY_SIZE (state->tm)); + + /* Fields can only be set once. */ + if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET) + return -PARSE_TIME_ERR_ALREADYSET; + + state->set[field] = FIELD_SET; + + /* Parse postponed number, if any. */ + r = handle_postponed_number (state, field); + if (r) + return r; + + unset_delim (state); + + state->tm[field] = value; + state->last_field = field; + + return 0; +} + +/* + * Mark n fields in fields to be set to current date/time in the + * specified time zone, or local timezone if not specified. The fields + * will be initialized after parsing is complete and timezone is + * known. + */ +static int +set_fields_to_now (struct state *state, enum field *fields, size_t n) +{ + size_t i; + int r; + + for (i = 0; i < n; i++) { + r = set_field (state, fields[i], 0); + if (r) + return r; + state->set[fields[i]] = FIELD_NOW; + } + + return 0; +} + +/* Modify field by adding value to it. To be used on relative fields, + * which can be modified multiple times (to accumulate). */ +static int +mod_field (struct state *state, enum field field, int value) +{ + int r; + + assert (field < ARRAY_SIZE (state->tm)); /* assert relative??? */ + + if (field < ARRAY_SIZE (state->set)) + state->set[field] = FIELD_SET; + + /* Parse postponed number, if any. 
*/ + r = handle_postponed_number (state, field); + if (r) + return r; + + unset_delim (state); + + state->tm[field] += value; + state->last_field = field; + + return 0; +} + +/* + * Get field value. Make sure the field is set before query. It's most + * likely an error to call this while parsing (for example fields set + * as FIELD_NOW will only be set to some value after parsing). + */ +static int +get_field (struct state *state, enum field field) +{ + assert (field < ARRAY_SIZE (state->tm)); + + return state->tm[field]; +} + +/* + * Validity checkers. + */ +static bool is_valid_12hour (int h) +{ + return h >= 0 && h <= 12; +} + +static bool is_valid_time (int h, int m, int s) +{ + /* Allow 24:00:00 to denote end of day. */ + if (h == 24 && m == 0 && s == 0) + return true; + + return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59; +} + +static bool is_valid_mday (int mday) +{ + return mday >= 1 && mday <= 31; +} + +static bool is_valid_mon (int mon) +{ + return mon >= 1 && mon <= 12; +} + +static bool is_valid_year (int year) +{ + return year >= 1970; +} + +static bool is_valid_date (int year, int mon, int mday) +{ + return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday); +} + +/* Unset indicator for time and date set helpers. */ +#define UNSET -1 + +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */ +static int +set_abs_time (struct state *state, int hour, int min, int sec) +{ + int r; + + if (hour != UNSET) { + if ((r = set_field (state, TM_ABS_HOUR, hour))) + return r; + } + + if (min != UNSET) { + if ((r = set_field (state, TM_ABS_MIN, min))) + return r; + } + + if (sec != UNSET) { + if ((r = set_field (state, TM_ABS_SEC, sec))) + return r; + } + + return 0; +} + +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. 
 */ +static int +set_abs_date (struct state *state, int year, int mon, int mday) +{ + int r; + + if (year != UNSET) { + if ((r = set_field (state, TM_ABS_YEAR, year))) + return r; + } + + if (mon != UNSET) { + if ((r = set_field (state, TM_ABS_MON, mon))) + return r; + } + + if (mday != UNSET) { + if ((r = set_field (state, TM_ABS_MDAY, mday))) + return r; + } + + return 0; +} + +/* + * Keyword parsing and handling. + */ +struct keyword; +typedef int (*setter_t)(struct state *state, struct keyword *kw); + +struct keyword { + const char *name; /* keyword */ + enum field field; /* field to set, or TM_NONE if N/A */ + int value; /* value to set, or 0 if N/A */ + setter_t set; /* function to use for setting, if non-NULL */ +}; + +/* + * Setter callback functions for keywords. + */ +static int +kw_set_default (struct state *state, struct keyword *kw) +{ + return set_field (state, kw->field, kw->value); +} + +static int +kw_set_rel (struct state *state, struct keyword *kw) +{ + int multiplier = 1; + + /* Get a previously set multiplier, if any. */ + get_postponed_number (state, &multiplier, NULL, NULL); + + /* Accumulate relative field values. */ + return mod_field (state, kw->field, multiplier * kw->value); +} + +static int +kw_set_number (struct state *state, struct keyword *kw) +{ + /* -1 = no length, from keyword. */ + return set_postponed_number (state, kw->value, -1); +} + +static int +kw_set_month (struct state *state, struct keyword *kw) +{ + int n = get_postponed_length (state); + + /* Consume postponed number if it could be mday. This handles "20 + * January". 
*/ + if (n == 1 || n == 2) { + int r, v; + + get_postponed_number (state, &v, NULL, NULL); + + if (!is_valid_mday (v)) + return -PARSE_TIME_ERR_INVALIDDATE; + + r = set_field (state, TM_ABS_MDAY, v); + if (r) + return r; + } + + return set_field (state, kw->field, kw->value); +} + +static int +kw_set_ampm (struct state *state, struct keyword *kw) +{ + int n = get_postponed_length (state); + + /* Consume postponed number if it could be hour. This handles + * "5pm". */ + if (n == 1 || n == 2) { + int r, v; + + get_postponed_number (state, &v, NULL, NULL); + + if (!is_valid_12hour (v)) + return -PARSE_TIME_ERR_INVALIDTIME; + + r = set_abs_time (state, v, 0, 0); + if (r) + return r; + } + + return set_field (state, kw->field, kw->value); +} + +static int +kw_set_timeofday (struct state *state, struct keyword *kw) +{ + return set_abs_time (state, kw->value, 0, 0); +} + +static int +kw_set_today (struct state *state, unused (struct keyword *kw)) +{ + enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY }; + + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); +} + +static int +kw_set_now (struct state *state, unused (struct keyword *kw)) +{ + enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC }; + + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); +} + +static int +kw_set_ordinal (struct state *state, struct keyword *kw) +{ + int n, v; + + /* Require a postponed number. */ + if (!get_postponed_number (state, &v, &n, NULL)) + return -PARSE_TIME_ERR_DATEFORMAT; + + /* Ordinals are mday. */ + if (n != 1 && n != 2) + return -PARSE_TIME_ERR_DATEFORMAT; + + /* Be strict about st, nd, rd, and lax about th. 
*/ + if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31) + return -PARSE_TIME_ERR_INVALIDDATE; + else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22) + return -PARSE_TIME_ERR_INVALIDDATE; + else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23) + return -PARSE_TIME_ERR_INVALIDDATE; + else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v)) + return -PARSE_TIME_ERR_INVALIDDATE; + + return set_field (state, TM_ABS_MDAY, v); +} + +/* + * Accepted keywords. + * + * A keyword may optionally contain a '|' to indicate the minimum + * match length. Without one, full match is required. It's advisable + * to keep the minimum match parts unique across all keywords. + * + * If keyword begins with upper case letter, then the matching will be + * case sensitive. Otherwise the matching is case insensitive. + * + * If setter is NULL, set_default will be used. + * + * Note: Order matters. Matching is greedy, longest match is used, but + * of equal length matches the first one is used, unless there's an + * equal length case sensitive match which trumps case insensitive + * matches. + */ +static struct keyword keywords[] = { + /* Weekdays. */ + { N_("sun|day"), TM_ABS_WDAY, 0, NULL }, + { N_("mon|day"), TM_ABS_WDAY, 1, NULL }, + { N_("tue|sday"), TM_ABS_WDAY, 2, NULL }, + { N_("wed|nesday"), TM_ABS_WDAY, 3, NULL }, + { N_("thu|rsday"), TM_ABS_WDAY, 4, NULL }, + { N_("fri|day"), TM_ABS_WDAY, 5, NULL }, + { N_("sat|urday"), TM_ABS_WDAY, 6, NULL }, + + /* Months. 
*/ + { N_("jan|uary"), TM_ABS_MON, 1, kw_set_month }, + { N_("feb|ruary"), TM_ABS_MON, 2, kw_set_month }, + { N_("mar|ch"), TM_ABS_MON, 3, kw_set_month }, + { N_("apr|il"), TM_ABS_MON, 4, kw_set_month }, + { N_("may"), TM_ABS_MON, 5, kw_set_month }, + { N_("jun|e"), TM_ABS_MON, 6, kw_set_month }, + { N_("jul|y"), TM_ABS_MON, 7, kw_set_month }, + { N_("aug|ust"), TM_ABS_MON, 8, kw_set_month }, + { N_("sep|tember"), TM_ABS_MON, 9, kw_set_month }, + { N_("oct|ober"), TM_ABS_MON, 10, kw_set_month }, + { N_("nov|ember"), TM_ABS_MON, 11, kw_set_month }, + { N_("dec|ember"), TM_ABS_MON, 12, kw_set_month }, + + /* Durations. */ + { N_("y|ears"), TM_REL_YEAR, 1, kw_set_rel }, + { N_("w|eeks"), TM_REL_WEEK, 1, kw_set_rel }, + { N_("d|ays"), TM_REL_DAY, 1, kw_set_rel }, + { N_("h|ours"), TM_REL_HOUR, 1, kw_set_rel }, + { N_("hr|s"), TM_REL_HOUR, 1, kw_set_rel }, + { N_("m|inutes"), TM_REL_MIN, 1, kw_set_rel }, + /* M=months, m=minutes */ + { N_("M"), TM_REL_MON, 1, kw_set_rel }, + { N_("mins"), TM_REL_MIN, 1, kw_set_rel }, + { N_("mo|nths"), TM_REL_MON, 1, kw_set_rel }, + { N_("s|econds"), TM_REL_SEC, 1, kw_set_rel }, + { N_("secs"), TM_REL_SEC, 1, kw_set_rel }, + + /* Numbers. */ + { N_("one"), TM_NONE, 1, kw_set_number }, + { N_("two"), TM_NONE, 2, kw_set_number }, + { N_("three"), TM_NONE, 3, kw_set_number }, + { N_("four"), TM_NONE, 4, kw_set_number }, + { N_("five"), TM_NONE, 5, kw_set_number }, + { N_("six"), TM_NONE, 6, kw_set_number }, + { N_("seven"), TM_NONE, 7, kw_set_number }, + { N_("eight"), TM_NONE, 8, kw_set_number }, + { N_("nine"), TM_NONE, 9, kw_set_number }, + { N_("ten"), TM_NONE, 10, kw_set_number }, + { N_("dozen"), TM_NONE, 12, kw_set_number }, + { N_("hundred"), TM_NONE, 100, kw_set_number }, + + /* Special number forms. */ + { N_("this"), TM_NONE, 0, kw_set_number }, + { N_("last"), TM_NONE, 1, kw_set_number }, + + /* Other special keywords. 
*/ + { N_("yesterday"), TM_REL_DAY, 1, kw_set_rel }, + { N_("today"), TM_NONE, 0, kw_set_today }, + { N_("now"), TM_NONE, 0, kw_set_now }, + { N_("noon"), TM_NONE, 12, kw_set_timeofday }, + { N_("midnight"), TM_NONE, 0, kw_set_timeofday }, + { N_("am"), TM_AMPM, 0, kw_set_ampm }, + { N_("a.m."), TM_AMPM, 0, kw_set_ampm }, + { N_("pm"), TM_AMPM, 1, kw_set_ampm }, + { N_("p.m."), TM_AMPM, 1, kw_set_ampm }, + { N_("st"), TM_NONE, 0, kw_set_ordinal }, + { N_("nd"), TM_NONE, 0, kw_set_ordinal }, + { N_("rd"), TM_NONE, 0, kw_set_ordinal }, + { N_("th"), TM_NONE, 0, kw_set_ordinal }, + + /* Timezone codes: offset in minutes. XXX: Add more codes. */ + { N_("pst"), TM_TZ, -8*60, NULL }, + { N_("mst"), TM_TZ, -7*60, NULL }, + { N_("cst"), TM_TZ, -6*60, NULL }, + { N_("est"), TM_TZ, -5*60, NULL }, + { N_("ast"), TM_TZ, -4*60, NULL }, + { N_("nst"), TM_TZ, -(3*60+30), NULL }, + + { N_("gmt"), TM_TZ, 0, NULL }, + { N_("utc"), TM_TZ, 0, NULL }, + + { N_("wet"), TM_TZ, 0, NULL }, + { N_("cet"), TM_TZ, 1*60, NULL }, + { N_("eet"), TM_TZ, 2*60, NULL }, + { N_("fet"), TM_TZ, 3*60, NULL }, + + { N_("wat"), TM_TZ, 1*60, NULL }, + { N_("cat"), TM_TZ, 2*60, NULL }, + { N_("eat"), TM_TZ, 3*60, NULL }, +}; + +/* + * Compare strings s and keyword. Return number of matching chars on + * match, 0 for no match. Match must be at least n chars, or all of + * keyword if n < 0, otherwise it's not a match. Use match_case for + * case sensitive matching. + */ +static size_t +stringcmp (const char *s, const char *keyword, ssize_t n, bool match_case) +{ + ssize_t i; + + if (!n) + return 0; + + for (i = 0; *s && *keyword; i++, s++, keyword++) { + if (match_case) { + if (*s != *keyword) + break; + } else { + if (tolower ((unsigned char) *s) != + tolower ((unsigned char) *keyword)) + break; + } + } + + if (n > 0) + return i < n ? 0 : i; + else + return *keyword ? 0 : i; +} + +/* + * Parse a keyword. Return < 0 on error, number of parsed chars on + * success. 
+ */ +static ssize_t +parse_keyword (struct state *state, const char *s) +{ + unsigned int i; + size_t n, max_n = 0; + struct keyword *kw = NULL; + int r; + + /* Match longest keyword */ + for (i = 0; i < ARRAY_SIZE (keywords); i++) { + /* Match case if keyword begins with upper case letter. */ + bool mcase = isupper ((unsigned char) keywords[i].name[0]); + ssize_t minlen = -1; + char keyword[128]; + char *p; + + strncpy (keyword, _(keywords[i].name), sizeof (keyword)); + + /* Truncate too long keywords. XXX: Make this dynamic? */ + keyword[sizeof (keyword) - 1] = '\0'; + + /* Minimum match length. */ + p = strchr (keyword, '|'); + if (p) { + minlen = p - keyword; + + /* Remove the minimum match length separator. */ + memmove (p, p + 1, strlen (p + 1) + 1); + } + + n = stringcmp (s, keyword, minlen, mcase); + if (n > max_n || (n == max_n && mcase)) { + max_n = n; + kw = &keywords[i]; + } + } + + if (!kw) + return -PARSE_TIME_ERR_KEYWORD; + + if (kw->set) + r = kw->set (state, kw); + else + r = kw_set_default (state, kw); + + if (r < 0) + return r; + + return max_n; +} + +/* + * Non-keyword parsers and their helpers. + */ + +static int +set_user_tz (struct state *state, char sign, int hour, int min) +{ + int tz = hour * 60 + min; + + assert (sign == '+' || sign == '-'); + + if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15) + return -PARSE_TIME_ERR_INVALIDTIME; + + if (sign == '-') + tz = -tz; + + return set_field (state, TM_TZ, tz); +} + +/* + * Independent parsing of a postponed number when it wasn't consumed + * during parsing of the following token. + */ +static int +parse_postponed_number (struct state *state, int v, int n, char d) +{ + /* + * alright, these are really lone, won't affect parsing of + * following items... it's not a multiplier, those have been eaten + * away. + * + * also note numbers eaten away by parse_single_number. + */ + + assert (n < 8); + + if (n == 1 || n == 2) { + /* Notable exception: Previous field affects parsing. 
This + * handles "January 20". */ + if (state->last_field == TM_ABS_MON) { + /* D[D] */ + if (!is_valid_mday (v)) + return -PARSE_TIME_ERR_INVALIDDATE; + + return set_field (state, TM_ABS_MDAY, v); + } else if (n == 2) { + /* XXX: Only allow if last field is hour, min, or sec? */ + if (d == '+' || d == '-') { + /* +/-HH */ + return set_user_tz (state, d, v, 0); + } + } + } else if (n == 4) { + /* Notable exception: Value affects parsing. Time zones are + * always at most 1400 and we don't understand years before + * 1970. */ + if (!is_valid_year (v)) { + if (d == '+' || d == '-') { + /* +/-HHMM */ + return set_user_tz (state, d, v / 100, v % 100); + } + } else { + /* YYYY */ + return set_field (state, TM_ABS_YEAR, v); + } + } else if (n == 6) { + /* HHMMSS */ + int hour = v / 10000; + int min = (v / 100) % 100; + int sec = v % 100; + + if (!is_valid_time (hour, min, sec)) + return -PARSE_TIME_ERR_INVALIDTIME; + + return set_abs_time (state, hour, min, sec); + } + + /* else n is one of {-1, 3, 5, 7 } */ + + return -PARSE_TIME_ERR_FORMAT; +} + +/* Parse a single number. Typically postpone parsing until later. */ +static int +parse_single_number (struct state *state, unsigned long v, + unsigned long n) +{ + assert (n); + + /* Parse things that can be parsed immediately. */ + if (n == 8) { + /* YYYYMMDD */ + int year = v / 10000; + int mon = (v / 100) % 100; + int mday = v % 100; + + if (!is_valid_date (year, mon, mday)) + return -PARSE_TIME_ERR_INVALIDDATE; + + return set_abs_date (state, year, mon, mday); + } else if (n > 8) { + /* XXX: Seconds since epoch. 
*/ + return -PARSE_TIME_ERR_FORMAT; + } + + if (v > INT_MAX) + return -PARSE_TIME_ERR_FORMAT; + + return set_postponed_number (state, v, n); +} + +static bool +is_time_sep (char c) +{ + return c == ':'; +} + +static bool +is_date_sep (char c) +{ + return c == '/' || c == '-' || c == '.'; +} + +static bool +is_sep (char c) +{ + return is_time_sep (c) || is_date_sep (c); +} + +/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep + * unset. */ +static int +expand_year (unsigned long year, size_t n) +{ + if (n == 2) { + return (year < 70 ? 2000 : 1900) + year; + } else if (n == 4) { + return year; + } else { + return UNSET; + } +} + +/* Parse a date number triplet. */ +static int +parse_date (struct state *state, char sep, + unsigned long v1, unsigned long v2, unsigned long v3, + size_t n1, size_t n2, size_t n3) +{ + int year = UNSET, mon = UNSET, mday = UNSET; + + assert (is_date_sep (sep)); + + switch (sep) { + case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */ + if (n1 != 1 && n1 != 2) + return -PARSE_TIME_ERR_DATEFORMAT; + + if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) { + /* M[M]/D[D][/YY[YY]] */ + year = expand_year (v3, n3); + mon = v1; + mday = v2; + } else if (n2 == 4 && n3 == 0) { + /* M[M]/YYYY */ + year = v2; + mon = v1; + } else { + return -PARSE_TIME_ERR_DATEFORMAT; + } + break; + + case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */ + if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) { + /* YYYY-MM[-DD] */ + year = v1; + mon = v2; + if (n3) + mday = v3; + } else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) { + /* DD-MM[-YY[YY]] */ + year = expand_year (v3, n3); + mon = v2; + mday = v1; + } else if (n1 == 2 && n2 == 4 && n3 == 0) { + /* MM-YYYY */ + year = v2; + mon = v1; + } else { + return -PARSE_TIME_ERR_DATEFORMAT; + } + break; + + case '.': /* Date: D[D].M[M][.[YY[YY]]] */ + if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) || + (n3 != 0 && n3 != 2 && n3 != 4)) + return 
-PARSE_TIME_ERR_DATEFORMAT; + + year = expand_year (v3, n3); + mon = v2; + mday = v1; + break; + } + + if (year != UNSET && !is_valid_year (year)) + return -PARSE_TIME_ERR_INVALIDDATE; + + if (mon != UNSET && !is_valid_mon (mon)) + return -PARSE_TIME_ERR_INVALIDDATE; + + if (mday != UNSET && !is_valid_mday (mday)) + return -PARSE_TIME_ERR_INVALIDDATE; + + return set_abs_date (state, year, mon, mday); +} + +/* Parse a time number triplet. */ +static int +parse_time (struct state *state, char sep, + unsigned long v1, unsigned long v2, unsigned long v3, + size_t n1, size_t n2, size_t n3) +{ + assert (is_time_sep (sep)); + + if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2)) + return -PARSE_TIME_ERR_TIMEFORMAT; + + /* + * Notable exception: Previously set fields affect + * parsing. Interpret (+|-)HH:MM as time zone only if hour and + * minute have been set. + * + * XXX: This could be fixed by restricting the delimiters + * preceding time. For '+' it would be justified, but for '-' it + * might be inconvenient. However prefer to allow '-' as an + * insignificant delimiter preceding time for convenience, and + * handle '+' the same way for consistency between positive and + * negative time zones. + */ + if (is_field_set (state, TM_ABS_HOUR) && + is_field_set (state, TM_ABS_MIN) && + n1 == 2 && n2 == 2 && n3 == 0 && + (state->delim == '+' || state->delim == '-')) { + return set_user_tz (state, state->delim, v1, v2); + } + + if (!is_valid_time (v1, v2, v3)) + return -PARSE_TIME_ERR_INVALIDTIME; + + return set_abs_time (state, v1, v2, n3 ? v3 : 0); +} + +/* strtoul helper that assigns length. */ +static unsigned long +strtoul_len (const char *s, const char **endp, size_t *len) +{ + unsigned long val = strtoul (s, (char **) endp, 10); + + *len = *endp - s; + return val; +} + +/* + * Parse a (group of) number(s). Return < 0 on error, number of parsed + * chars on success. 
+ */ +static ssize_t +parse_number (struct state *state, const char *s) +{ + int r; + unsigned long v1, v2, v3 = 0; + size_t n1, n2, n3 = 0; + const char *p = s; + char sep; + + v1 = strtoul_len (p, &p, &n1); + + if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) { + sep = *p; + v2 = strtoul_len (p + 1, &p, &n2); + } else { + /* A single number. */ + r = parse_single_number (state, v1, n1); + if (r) + return r; + + return p - s; + } + + /* A group of two or three numbers? */ + if (*p == sep && isdigit ((unsigned char) *(p + 1))) + v3 = strtoul_len (p + 1, &p, &n3); + + if (is_time_sep (sep)) + r = parse_time (state, sep, v1, v2, v3, n1, n2, n3); + else + r = parse_date (state, sep, v1, v2, v3, n1, n2, n3); + + if (r) + return r; + + return p - s; +} + +/* + * Parse delimiter(s). Throw away all except the last one, which is + * stored for parsing the next non-delimiter. Return < 0 on error, + * number of parsed chars on success. + * + * XXX: We might want to be more strict here. + */ +static ssize_t +parse_delim (struct state *state, const char *s) +{ + const char *p = s; + + /* + * Skip non-alpha and non-digit, and store the last for further + * processing. + */ + while (*p && !isalnum ((unsigned char) *p)) { + set_delim (state, *p); + p++; + } + + return p - s; +} + +/* + * Parse a date/time string. Return < 0 on error, number of parsed + * chars on success. + */ +static ssize_t +parse_input (struct state *state, const char *s) +{ + const char *p = s; + ssize_t n; + int r; + + while (*p) { + if (isalpha ((unsigned char) *p)) { + n = parse_keyword (state, p); + } else if (isdigit ((unsigned char) *p)) { + n = parse_number (state, p); + } else { + n = parse_delim (state, p); + } + + if (n <= 0) { + if (n == 0) + n = -PARSE_TIME_ERR; + + return n; + } + + p += n; + } + + /* Parse postponed number, if any. */ + r = handle_postponed_number (state, TM_NONE); + if (r < 0) + return r; + + return p - s; +} + +/* + * Processing the parsed input. 
+ */ + +/* + * Initialize reference time to tm. Use time zone in state if + * specified, otherwise local time. Use now for reference time if + * non-NULL, otherwise current time. + */ +static int +initialize_now (struct state *state, struct tm *tm, const time_t *now) +{ + time_t t; + + if (now) { + t = *now; + } else { + if (time (&t) == (time_t) -1) + return -PARSE_TIME_ERR_LIB; + } + + if (is_field_set (state, TM_TZ)) { + /* Some other time zone. */ + + /* Adjust now according to the TZ. */ + t += get_field (state, TM_TZ) * 60; + + /* It's not gm, but this doesn't mess with the TZ. */ + if (gmtime_r (&t, tm) == NULL) + return -PARSE_TIME_ERR_LIB; + } else { + /* Local time. */ + if (localtime_r (&t, tm) == NULL) + return -PARSE_TIME_ERR_LIB; + } + + return 0; +} + +/* + * Normalize tm according to mktime(3). Both mktime(3) and + * localtime_r(3) use local time, but they cancel each other out here, + * making this function agnostic to time zone. + */ +static int +normalize_tm (struct tm *tm) +{ + time_t t = mktime (tm); + + if (t == (time_t) -1) + return -PARSE_TIME_ERR_LIB; + + if (!localtime_r (&t, tm)) + return -PARSE_TIME_ERR_LIB; + + return 0; +} + +/* Get field out of a struct tm. */ +static int +tm_get_field (const struct tm *tm, enum field field) +{ + switch (field) { + case TM_ABS_SEC: return tm->tm_sec; + case TM_ABS_MIN: return tm->tm_min; + case TM_ABS_HOUR: return tm->tm_hour; + case TM_ABS_MDAY: return tm->tm_mday; + case TM_ABS_MON: return tm->tm_mon + 1; /* 0- to 1-based */ + case TM_ABS_YEAR: return 1900 + tm->tm_year; + case TM_ABS_WDAY: return tm->tm_wday; + case TM_ABS_ISDST: return tm->tm_isdst; + default: + assert (false); + break; + } + + return 0; +} + +/* Modify hour according to am/pm setting. 
*/ +static int +fixup_ampm (struct state *state) +{ + int hour, hdiff = 0; + + if (!is_field_set (state, TM_AMPM)) + return 0; + + if (!is_field_set (state, TM_ABS_HOUR)) + return -PARSE_TIME_ERR_TIMEFORMAT; + + hour = get_field (state, TM_ABS_HOUR); + if (!is_valid_12hour (hour)) + return -PARSE_TIME_ERR_INVALIDTIME; + + if (get_field (state, TM_AMPM)) { + /* 12pm is noon. */ + if (hour != 12) + hdiff = 12; + } else { + /* 12am is midnight, beginning of day. */ + if (hour == 12) + hdiff = -12; + } + + mod_field (state, TM_REL_HOUR, -hdiff); + + return 0; +} + +/* Combine absolute and relative fields, and round. */ +static int +create_output (struct state *state, time_t *t_out, const time_t *tnow, + int round) +{ + struct tm tm = { .tm_isdst = -1 }; + struct tm now; + time_t t; + enum field f; + int r; + int week_round = PARSE_TIME_NO_ROUND; + + r = initialize_now (state, &now, tnow); + if (r) + return r; + + /* Initialize uninitialized fields to now. */ + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { + if (state->set[f] == FIELD_NOW) { + state->tm[f] = tm_get_field (&now, f); + state->set[f] = FIELD_SET; + } + } + + /* + * If MON is set but YEAR is not, refer to past month. + * + * XXX: Why are month/week special in this regard? What about + * mday, or time. Should refer to past. + */ + if (is_field_set (state, TM_ABS_MON) && + !is_field_set (state, TM_ABS_YEAR)) { + if (get_field (state, TM_ABS_MON) >= tm_get_field (&now, TM_ABS_MON)) + mod_field (state, TM_REL_YEAR, 1); + } + + /* + * If WDAY is set but MDAY is not, we consider WDAY relative + * + * XXX: This fails on stuff like "two months ago monday" because + * two months ago wasn't the same day as today. Postpone until we + * know date? 
+ */ + if (is_field_set (state, TM_ABS_WDAY) && + !is_field_set (state, TM_ABS_MDAY)) { + int wday = get_field (state, TM_ABS_WDAY); + int today = tm_get_field (&now, TM_ABS_WDAY); + int rel_days; + + if (today > wday) + rel_days = today - wday; + else + rel_days = today + 7 - wday; + + /* This also prevents special week rounding from happening. */ + mod_field (state, TM_REL_DAY, rel_days); + + unset_field (state, TM_ABS_WDAY); + } + + r = fixup_ampm (state); + if (r) + return r; + + /* + * Iterate fields from most accurate to least accurate, and set + * unset fields according to requested rounding. + */ + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { + if (round != PARSE_TIME_NO_ROUND) { + enum field r = abs_to_rel_field (f); + + if (is_field_set (state, f) || is_field_set (state, r)) { + if (round >= PARSE_TIME_ROUND_UP) + mod_field (state, r, -1); + round = PARSE_TIME_NO_ROUND; /* No more rounding. */ + } else { + if (f == TM_ABS_MDAY && + is_field_set (state, TM_REL_WEEK)) { + /* Week is most accurate. */ + week_round = round; + round = PARSE_TIME_NO_ROUND; + } else { + set_field (state, f, field_epoch (f)); + } + } + } + + if (!is_field_set (state, f)) + set_field (state, f, tm_get_field (&now, f)); + } + + /* Special case: rounding with week accuracy. */ + if (week_round != PARSE_TIME_NO_ROUND) { + /* Temporarily set more accurate fields to now. */ + set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC)); + set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN)); + set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR)); + set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY)); + } + + /* + * Set all fields. They may contain out of range values before + * normalization by mktime(3). 
+ */ + tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC); + tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN); + tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR); + tm.tm_mday = get_field (state, TM_ABS_MDAY) - + get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK); + tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON); + tm.tm_mon--; /* 1- to 0-based */ + tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900; + + /* + * It's always normal time. + * + * XXX: This is probably not a solution that universally + * works. Just make sure DST is not taken into account. We don't + * want rounding to be affected by DST. + */ + tm.tm_isdst = -1; + + /* Special case: rounding with week accuracy. */ + if (week_round != PARSE_TIME_NO_ROUND) { + /* Normalize to get proper tm.wday. */ + r = normalize_tm (&tm); + if (r < 0) + return r; + + /* Set more accurate fields back to zero. */ + tm.tm_sec = 0; + tm.tm_min = 0; + tm.tm_hour = 0; + tm.tm_isdst = -1; + + /* Monday is the true 1st day of week, but this is easier. */ + if (week_round <= PARSE_TIME_ROUND_DOWN) + tm.tm_mday -= tm.tm_wday; + else + tm.tm_mday += 7 - tm.tm_wday; + } + + if (is_field_set (state, TM_TZ)) { + /* tm is in specified TZ, convert to UTC for timegm(3). */ + tm.tm_min -= get_field (state, TM_TZ); + t = timegm (&tm); + } else { + /* tm is in local time. */ + t = mktime (&tm); + } + + if (t == (time_t) -1) + return -PARSE_TIME_ERR_LIB; + + *t_out = t; + + return 0; +} + +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. 
*/ +#define EXTERNAL_ERR(r) (-r) + +int +parse_time_string (const char *s, time_t *t, const time_t *now, int round) +{ + struct state state = { .last_field = TM_NONE }; + int r; + + if (!s || !t) + return EXTERNAL_ERR (-PARSE_TIME_ERR); + + r = parse_input (&state, s); + if (r < 0) + return EXTERNAL_ERR (r); + + r = create_output (&state, t, now, round); + if (r < 0) + return EXTERNAL_ERR (r); + + return 0; +} diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h new file mode 100644 index 0000000..50b7c6f --- /dev/null +++ b/parse-time-string/parse-time-string.h @@ -0,0 +1,95 @@ +/* + * parse time string - user friendly date and time parser + * Copyright © 2012 Jani Nikula + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. 
+ * + * Author: Jani Nikula <jani@nikula.org> + */ + +#ifndef PARSE_TIME_STRING_H +#define PARSE_TIME_STRING_H + +#ifdef __cplusplus +extern "C" { +#endif + +#include <time.h> + +/* return values for parse_time_string() */ +enum { + PARSE_TIME_OK = 0, + PARSE_TIME_ERR, /* unspecified error */ + PARSE_TIME_ERR_LIB, /* library call failed */ + PARSE_TIME_ERR_ALREADYSET, /* attempt to set unit twice */ + PARSE_TIME_ERR_FORMAT, /* generic date/time format error */ + PARSE_TIME_ERR_DATEFORMAT, /* date format error */ + PARSE_TIME_ERR_TIMEFORMAT, /* time format error */ + PARSE_TIME_ERR_INVALIDDATE, /* date value error */ + PARSE_TIME_ERR_INVALIDTIME, /* time value error */ + PARSE_TIME_ERR_KEYWORD, /* unknown keyword */ +}; + +/* round values for parse_time_string() */ +enum { + PARSE_TIME_ROUND_DOWN = -1, + PARSE_TIME_NO_ROUND = 0, + PARSE_TIME_ROUND_UP = 1, +}; + +/** + * parse_time_string() - user friendly date and time parser + * @s: string to parse + * @t: pointer to time_t to store parsed time in + * @now: pointer to time_t containing reference date/time, or NULL + * @round: PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN, or + * PARSE_TIME_ROUND_UP + * + * Parse a date/time string 's' and store the parsed date/time result + * in 't'. + * + * A reference date/time is used for determining the "date/time units" + * (roughly equivalent to struct tm members) not specified by 's'. If + * 'now' is non-NULL, it must contain a pointer to a time_t to be used + * as reference date/time. Otherwise, the current time is used. + * + * If 's' does not specify a full date/time, the 'round' parameter + * specifies if and how the result should be rounded as follows: + * + * PARSE_TIME_NO_ROUND: All date/time units that are not specified + * by 's' are set to the corresponding unit derived from the + * reference date/time. 
+ *
+ *   PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate
+ *   than the most accurate unit specified by 's' are set to the
+ *   smallest valid value for that unit. Rest of the unspecified units
+ *   are set as in PARSE_TIME_NO_ROUND.
+ *
+ *   PARSE_TIME_ROUND_UP: All date/time units that are more accurate
+ *   than the most accurate unit specified by 's' are set to the
+ *   smallest valid value for that unit. The most accurate unit
+ *   specified by 's' is incremented by one (and this is rolled over
+ *   to the less accurate units as necessary). Rest of the unspecified
+ *   units are set as in PARSE_TIME_NO_ROUND.
+ *
+ * Return 0 (PARSE_TIME_OK) for successfully parsed date/time, or one
+ * of PARSE_TIME_ERR_* on error. 't' is not modified on error.
+ */
+int parse_time_string (const char *s, time_t *t, const time_t *now, int round);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* PARSE_TIME_STRING_H */
-- 
1.7.9.5
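The rounding rules documented for parse_time_string() above can be
sketched in isolation. What follows is a minimal illustration under
stated assumptions, not code from the patch: struct dt, enum unit,
round_dt() and the ROUND_* constants are hypothetical names, the input
is assumed to be a broken-down time in which the fields given by the
user have already been merged with the reference time, and the carry is
done by hand rather than through mktime(3) as the patch does. Week
accuracy and time zones are left out.

```c
#include <assert.h>

/* Hypothetical names for this sketch; they only mirror the patch's API. */
enum unit { U_SEC, U_MIN, U_HOUR, U_MDAY, U_MON, U_YEAR };
enum { ROUND_DOWN = -1, NO_ROUND = 0, ROUND_UP = 1 };

/* Broken-down time; mon and mday are 1-based. */
struct dt { int year, mon, mday, hour, min, sec; };

static int is_leap(int y)
{
	return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

/* Number of days in month m (1..12) of year y. */
static int days_in_month(int y, int m)
{
	static const int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

	return days[m - 1] + (m == 2 && is_leap(y));
}

/*
 * Round t in place. 'finest' is the most accurate unit the user
 * specified; everything more accurate than that is unspecified.
 */
static void round_dt(struct dt *t, enum unit finest, int round)
{
	assert(round >= ROUND_DOWN && round <= ROUND_UP);

	if (round == NO_ROUND)
		return;

	/* Set every unit more accurate than 'finest' to its smallest value. */
	if (finest > U_SEC)
		t->sec = 0;
	if (finest > U_MIN)
		t->min = 0;
	if (finest > U_HOUR)
		t->hour = 0;
	if (finest > U_MDAY)
		t->mday = 1;
	if (finest > U_MON)
		t->mon = 1;

	if (round != ROUND_UP)
		return;

	/* Increment the most accurate specified unit by one ... */
	switch (finest) {
	case U_SEC:  t->sec++;  break;
	case U_MIN:  t->min++;  break;
	case U_HOUR: t->hour++; break;
	case U_MDAY: t->mday++; break;
	case U_MON:  t->mon++;  break;
	case U_YEAR: t->year++; break;
	}

	/* ... and roll any overflow over to the less accurate units. */
	if (t->sec > 59) { t->sec = 0; t->min++; }
	if (t->min > 59) { t->min = 0; t->hour++; }
	if (t->hour > 23) { t->hour = 0; t->mday++; }
	/* mon can only be 13 here when mday was already reset to 1. */
	if (t->mon <= 12 && t->mday > days_in_month(t->year, t->mon)) {
		t->mday = 1;
		t->mon++;
	}
	if (t->mon > 12) { t->mon = 1; t->year++; }
}
```

With day accuracy, 2012-09-12 21:27:00 becomes 2012-09-12 00:00:00 when
rounded down and 2012-09-13 00:00:00 when rounded up, which is the kind
of pair a since..until range needs for its two ends.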
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch
  2012-09-12 21:27 ` [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula
@ 2012-09-13 11:10   ` Michal Nazarewicz
  2012-09-13 12:07     ` Jani Nikula
  2012-09-13 12:48     ` Tomi Ollila
  2012-09-25 11:56   ` Michal Sojka
  1 sibling, 2 replies; 30+ messages in thread
From: Michal Nazarewicz @ 2012-09-13 11:10 UTC (permalink / raw)
  To: Jani Nikula, notmuch, David Bremner

On Wed, Sep 12 2012, Jani Nikula <jani@nikula.org> wrote:
> Add a date/time parser to notmuch, to be used for adding date range
> query support for notmuch lib later on. Add the parser to a directory
> of its own to make it independent of the rest of the notmuch code
> base.
>
> Signed-off-by: Jani Nikula <jani@nikula.org>

Have you considered doing the same in bison?  I consider the code totally
unreadable and unmaintainable.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch
  2012-09-13 11:10   ` Michal Nazarewicz
@ 2012-09-13 12:07     ` Jani Nikula
  2012-09-17 14:13       ` Michal Nazarewicz
  2012-09-13 12:48     ` Tomi Ollila
  1 sibling, 1 reply; 30+ messages in thread
From: Jani Nikula @ 2012-09-13 12:07 UTC (permalink / raw)
  To: Michal Nazarewicz, notmuch, David Bremner

On Thu, 13 Sep 2012, Michal Nazarewicz <mina86@mina86.com> wrote:
> On Wed, Sep 12 2012, Jani Nikula <jani@nikula.org> wrote:
>> Add a date/time parser to notmuch, to be used for adding date range
>> query support for notmuch lib later on. Add the parser to a directory
>> of its own to make it independent of the rest of the notmuch code
>> base.
>>
>> Signed-off-by: Jani Nikula <jani@nikula.org>
>
> Have you considered doing the same in bison?  I consider the code totally
> unreadable and unmaintainable.

I do not think you could easily do everything that this parser does in
bison. But then I'm not an expert in bison, and I have zero ambition to
become one. So I'm biased, and I'm open about it.

Even so, if you're suggesting doing this in bison would make this
totally readable and maintainable, I urge you to have a good look at
[1]. Note that it also does less in more lines of code. (And using it
as-is in notmuch has pretty much been turned down in the past.)

Finally, I also suggest you actually read and review the code, pointing
out concrete issues in readability or maintainability that you
see. Especially since an earlier version has received comment "[I]t
looks very nice to me. It is pleasantly nice to read." [2]. What you're
doing is worthless bikeshedding otherwise.


BR,
Jani.


[1] http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=lib/parse-datetime.y
[2] id:"87mx86dlul.fsf@qmul.ac.uk"
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch
  2012-09-13 12:07     ` Jani Nikula
@ 2012-09-17 14:13       ` Michal Nazarewicz
  2012-09-17 15:54         ` Jani Nikula
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Nazarewicz @ 2012-09-17 14:13 UTC (permalink / raw)
  To: Jani Nikula, notmuch, David Bremner

> On Thu, 13 Sep 2012, Michal Nazarewicz <mina86@mina86.com> wrote:
>> Have you considered doing the same in bison?  I consider the code totally
>> unreadable and unmaintainable.

On Thu, Sep 13 2012, Jani Nikula wrote:
> I do not think you could easily do everything that this parser does in
> bison. But then I'm not an expert in bison, and I have zero ambition to
> become one. So I'm biased, and I'm open about it.

Bison can do a lot of weird stuff, including modifying how the lexer
interprets tokens even while parsing a given grammar rule.

> Even so, if you're suggesting doing this in bison would make this
> totally readable and maintainable, I urge you to have a good look at
> [1]. Note that it also does less in more lines of code. (And using it
> as-is in notmuch has pretty much been turned down in the past.)
>
> Finally, I also suggest you actually read and review the code, pointing
> out concrete issues in readability or maintainability that you
> see. Especially since an earlier version has received comment "[I]t
> looks very nice to me. It is pleasantly nice to read." [2]. What you're
> doing is worthless bikeshedding otherwise.

I'm sorry.  I sometimes tend to go to extremes with my statements, so
yes, the “totally unreadable” was an overstatement on my part.

My point was however that parsing is a solved problem, and for
non-trivial parsers one needs to ask herself whether it's worth trying
to implement the logic, or maybe using a parser generator is just
simpler.

And in this particular case, my feeling is that bison is easier to read
and modify.
To add some merit to my statement, I attach a bison parser.  It
supports ranges as follows:

<date>
	the specific moment with duration dependent on specification.
	How duration is figured out is described in the next paragraph.

<from>..<to>
	dates >= <from> and < <to>, so for instance “yesterday..0 days”
	yields results from yesterday.

..<to>
	dates < <to>

<from>..
	dates >= <from>

<from>++<dur>
	a shorthand of “<from>..<from> + <dur>”.  This is useful for
	things like: “2012q1++2 quarters” which is equivalent to
	“2012/01/01..2012/07/01”, ie. the first two quarters of 2012.

It supports specifications as:

'@' <num>
	Raw timestamp.  Its duration is one second.

<num> (seconds | minutes | hours | days | weeks | fortnights) [ago]
	moves the date by given number of units in the future or in
	the past (if “ago” is given).  <num> can be preceded by sign.

	This specification's duration is whatever unit was used,
	ie. one second, one minute, one hour, one day, one week or one
	fortnight.  So “7 days ago” and “1 week ago” specify the same
	moment, but they have different durations.

<num> (months | quarters | years) [ago]
	Like above, but calendar months are used which do not always
	have the same length.  If applying the offset ends up with
	a day of the month out of range, the day is capped to the last
	day of the month.

yesterday
	Moves one day back. [*]  Note that because of [*] this is not
	equivalent to “-1day”.

YYYY/MM/DD
YYYY-MM-DD
DD-MM-YYYY
DD Month YYYY
Month DD YYYY
	Sets date accordingly. [*]  “Month” is a human readable month
	name.

Month [DD] [YYYY]
	If either day or year is missing, given component of the date
	is not changed. [*]  Also, if day is missing, the duration is
	set to one month rather than one day (but see caveats described
	in [*]).

YYYY q Q
	Sets date to the beginning of quarter Q, ie. “2012q2” is
	roughly the same as “2012/04/01”. [*]  Sets duration to three
	months but see caveats described in [*].

midnight | noon
	Sets time to 0:00:00 and 12:00:00 respectively.  Has duration
	of 1 hour.
HH:MM:SS [am | pm]
HH:MM [am | pm]
HH (am | pm)
	Sets time accordingly with the part that is not specified set
	to zero.  Duration depends on how many components are missing,
	ie. “HH (am|pm)” has a duration of one hour, “HH:MM” has
	a duration of one minute and “HH:MM:SS” has a duration of one
	second.

[*] Formats specifying the date will zero the time to midnight unless
the time has already been specified (ie. “yesterday” is roughly the
same as “yesterday midnight”, but “noon yesterday” still keeps time as
noon).  Also, if the time has not been specified, those formats will
set duration to one day (with two exceptions), so “yesterday” has
a duration of one day, but “yesterday midnight”, even though it
specifies the same moment's beginning, has a duration of one hour.

Purposely, I have not added support for MM/DD/YYYY or DD/MM/YYYY as
well as two-digit years.  I feel this would only add confusion.
---
 .gitignore            |    3 +
 Makefile              |   17 ++
 date-parser-grammar.y |  173 ++++++++++++++++++
 date-parser.c         |  476 +++++++++++++++++++++++++++++++++++++++++++++++++
 date-parser.h         |   59 ++++++
 test.c                |   44 +++++
 6 files changed, 772 insertions(+), 0 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 Makefile
 create mode 100644 date-parser-grammar.y
 create mode 100644 date-parser.c
 create mode 100644 date-parser.h
 create mode 100644 test.c

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..b73c782
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+test
+*.o
+date-parser-grammar.tab.*
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..8d95f71
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -std=c99 -Wextra -Werror -pedantic
+
+test: test.o date-parser.o date-parser-grammar.tab.o
+test.o: test.c date-parser.h
+date-parser.o: date-parser.c date-parser.h
+date-parser.o: date-parser-grammar.tab.h
+
+date-parser-grammar.tab.c: date-parser-grammar.y
+	bison $<
+
+date-parser-grammar.tab.h: date-parser-grammar.tab.c
+date-parser-grammar.tab.o: 
date-parser-grammar.tab.c date-parser.h
+date-parser-grammar.tab.o: CPPFLAGS += -Wno-unreachable-code
+
+clean:
+	rm -f date-parser-grammar.output date-parser-grammar.tab.* \
+		*.o test
diff --git a/date-parser-grammar.y b/date-parser-grammar.y
new file mode 100644
index 0000000..38ddfca
--- /dev/null
+++ b/date-parser-grammar.y
@@ -0,0 +1,173 @@
+/* Date parser bison grammar file
+ * Copyright (c) 2012 Google Inc.
+ * Written by Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .  */
+
+%code requires {
+
+#ifndef YYSTYPE
+#  define YYSTYPE long
+#endif
+#ifndef YYLTYPE
+#  define YYLTYPE struct yylocation
+#endif
+
+#ifdef YYLLOC_DEFAULT
+#  undef YYLLOC_DEFAULT
+#endif
+#define YYLLOC_DEFAULT(Cur, Rhs, N) do {		\
+	if (N) {					\
+		(Cur).start = YYRHSLOC(Rhs, 1).start;	\
+		(Cur).end   = YYRHSLOC(Rhs, N).end;	\
+	} else {					\
+		(Cur) = YYRHSLOC(Rhs, 0);		\
+	}						\
+} while (0)
+}
+
+%code{
+#include "date-parser-grammar.tab.h"
+#include "date-parser.h"
+
+#define ASSERT(cond, loc, message) do {			\
+	if (!(cond)) {					\
+		parse_date_print_error(&loc, message);	\
+		YYERROR;				\
+	}						\
+} while (0)
+}
+
+%locations
+%defines
+%error-verbose
+%define api.pure
+
+%parse-param {struct date *ret}
+%parse-param {const char **inputp}
+%lex-param {const char **inputp}
+
+%token T_NUMBER	"<num>"		/* Always positive. 
*/
+%token T_NUMBER_4	"####"		/* Four digit number. */
+%token T_AGO		"ago"
+/* Also used for minutes, hours, days and weeks. */
+%token T_SECONDS	"seconds"
+/* Also used for quarters and years */
+%token T_MONTHS		"months"
+%token T_YESTERDAY	"yesterday"
+%token T_AMPM		"am/pm"
+%token T_HOUR		"<hour>"
+%token T_MONTH		"<month>"
+
+%expect 3  /* Two shift/reduce conflicts caused by year_maybe, and one
+	    * caused by day_maybe. */
+
+%%
+	/* For backwards compatibility, just a number and nothing else
+	 * is treated as timestamp */
+input	: number	{ date_set_from_stamp(ret, $1) }
+	| date
+	;
+
+date	: part
+	| date part
+	;
+
+part	: integer "seconds" ago_maybe {
+		ASSERT(date_add_seconds(ret, $1 * $3, $2), @$,
+		       "offset ends up in date out of range")
+	}
+	| integer "months" ago_maybe {
+		ASSERT(date_add_months(ret, $1 * $3, $2), @$,
+		       "offset ends up in date out of range")
+	}
+	| "yesterday" {
+		ASSERT(date_set_yesterday(ret), @$,
+		       "offset ends up in date out of range")
+	}
+
+	| '@' number	{ date_set_from_stamp(ret, $2) }
+	| "<hour>"	{ date_set_time(ret, $1, -1, -1, -1) }
+
+	/* HH:MM, HH:MM:SS, HH:MM am/pm, HH:MM:SS am/pm */
+	| number ':' number seconds_maybe ampm_maybe {
+		ASSERT(date_set_time(ret, $1, $3, $4, $5), @$,
+		       "invalid time")
+	}
+
+	| number "am/pm" {			/* HH am/pm */
+		ASSERT(date_set_time(ret, $1, -1, -1, $2), @$, "invalid hour")
+	}
+
+	| "####" '/' "<num>" '/' "<num>" {	/* YYYY/MM/DD */
+		ASSERT(date_set_date(ret, $1, $3, $5), @$, "invalid date")
+	}
+	| "####" '-' "<num>" '-' "<num>" {	/* YYYY-MM-DD */
+		ASSERT(date_set_date(ret, $1, $3, $5), @$, "invalid date")
+	}
+	| "<num>" '-' "<num>" '-' "####" {	/* DD-MM-YYYY */
+		ASSERT(date_set_date(ret, $5, $3, $1), @$, "invalid date")
+	}
+	/* No MM/DD/YYYY or DD/MM/YYYY because it's confusing. 
*/
+
+	| "<num>" "<month>" year_maybe {	/* 1 January 2012 */
+		ASSERT(date_set_date(ret, $3, $2, $1), @$, "invalid date")
+	}
+	| "<month>" day_maybe year_maybe {	/* January 1 2012 */
+		ASSERT(date_set_date(ret, $3, $1, $2), @$, "invalid date")
+	}
+
+	| "####" 'q' "<num>" {			/* Quarter, 2012q1 */
+		ASSERT(date_set_quarter(ret, $1, $3), @$, "invalid quarter");
+	}
+	;
+
+number	: "<num>"	{ $$ = $1 }
+	| "####"	{ $$ = $1 }
+	;
+
+integer	: number	{ $$ = $1 }
+	| '-' number	{ $$ = -$2 }
+	;
+
+ago_maybe
+	: /* empty */	{ $$ =  1 }
+	| "ago"		{ $$ = -1 }
+	;
+
+seconds_maybe
+	: /* empty */	{ $$ = -1 }
+	| ':' "<num>"	{ $$ = $2 }
+	;
+
+ampm_maybe
+	: /* empty */	{ $$ = -1 }
+	| "am/pm"	{ $$ = $1 }
+	/* For people who like writing "a.m." or "p.m." and since dot
+	 * is ignored by the lexer (ie. it's treated just like white
+	 * space), dot is lost. */
+	| 'a' 'm'	{ $$ = 0 }
+	| 'p' 'm'	{ $$ = 1 }
+	;
+
+day_maybe
+	: /* empty */	{ $$ = -1 }
+	| "<num>"	{ $$ = $1 }
+	;
+
+year_maybe
+	: /* empty */	{ $$ = -1 }
+	| "####"	{ $$ = $1 }
+	;
+%%
diff --git a/date-parser.c b/date-parser.c
new file mode 100644
index 0000000..c1701bd
--- /dev/null
+++ b/date-parser.c
@@ -0,0 +1,476 @@
+/* Date parser implementation file.
+ * Copyright (c) 2012 Google Inc.
+ * Written by Michal Nazarewicz <mina86@mina86.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .  
*/
+
+#define _POSIX_C_SOURCE 1
+
+#include "date-parser.h"
+#include "date-parser-grammar.tab.h"
+
+#include <ctype.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <strings.h>
+
+
+static bool is_valid_year(int year) {
+	/* TODO: Get the actual time_t range. */
+	return year >= 1970 && year < 2037;
+}
+
+
+/***************************** Basic date helpers ***************************/
+
+static int days_in_months[] = {
+	31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
+};
+
+static inline int is_leap(int year) {
+	return year % 4 == 0 && (year % 100 || year % 400 == 0);
+}
+
+static inline int days_in_month(int year, int month) {
+	return days_in_months[month] + (month == 1 ? is_leap(year) : 0);
+}
+
+static inline int days_in_year(int year) {
+	return 365 + is_leap(year);
+}
+
+static inline int min(int a, int b) {
+	return a < b ? a : b;
+}
+
+
+/****************************** Date manipulation ***************************/
+
+struct date {
+	struct tm tm;
+	int dur_sec, dur_mon, has_time;
+};
+
+
+void date_set_from_stamp(struct date *ret, long stamp) {
+	time_t t = stamp;
+	localtime_r(&t, &ret->tm);
+	ret->dur_sec = 1;
+	ret->dur_mon = 0;
+	ret->has_time = 1;
+}
+
+static void date_set_to_now(struct date *ret) {
+	time_t t = time(NULL);
+	localtime_r(&t, &ret->tm);
+	ret->dur_sec = 1;
+	ret->dur_mon = 0;
+	ret->has_time = 0;
+}
+
+static void date_zero_time(struct date *ret, int dur_sec, int dur_mon) {
+	if (!ret->has_time) {
+		ret->tm.tm_hour = ret->tm.tm_min = ret->tm.tm_sec = 0;
+		ret->dur_sec = dur_sec;
+		ret->dur_mon = dur_mon;
+	}
+}
+
+bool date_add_seconds(struct date *ret, long num, long unit) {
+	time_t t = mktime(&ret->tm) + num * unit;
+	localtime_r(&t, &ret->tm);
+	ret->dur_sec = unit;
+	ret->dur_mon = 0;
+	return true;  /* TODO add validation */
+}
+
+bool date_set_yesterday(struct date *ret) {
+	if (ret->tm.tm_mday != 1) {
+		--ret->tm.tm_mday;
+	} else if (ret->tm.tm_mon) {
+		--ret->tm.tm_mon;
+		ret->tm.tm_mday = 
days_in_month(ret->tm.tm_year + 1900, + ret->tm.tm_mon); + } else if (is_valid_year(1900 + ret->tm.tm_year - 1)) { + --ret->tm.tm_year; + ret->tm.tm_mon = 11; + ret->tm.tm_mday = 31; + } else { + return false; + } + date_zero_time(ret, 24 * 3600, 0); + ret->tm.tm_isdst = -1; + return true; +} + +bool date_add_months(struct date *ret, long num, long unit) { + long y; + + y = ret->tm.tm_year + 1900; + num = num * unit + ret->tm.tm_mon; + if (num < 0) { + y -= -num / 12; + num = 11 - (-num % 12); + } else { + y += num / 12; + num %= 12; + } + if (!is_valid_year(y)) { + return false; + } + + ret->tm.tm_year = y - 1900; + ret->tm.tm_mon = num; + ret->tm.tm_mday = min(ret->tm.tm_mday, + days_in_month(ret->tm.tm_year + 1900, + ret->tm.tm_mon)); + if (!ret->has_time) { + ret->dur_sec = 0; + ret->dur_mon = unit; + } + ret->tm.tm_isdst = -1; + return true; +} + +bool date_set_time(struct date *ret, long h, long m, long s, int ampm) { + if (m > 60 || s > 60 || h > 23) { + return false; + } + + if (ampm != -1) { + if (!h || h > 12) { + return false; + } + if (h != 12) { + h += ampm * 12; + } else if (ampm) { /* 12 pm */ + h = 12; + } else { + /* 12 am is 0 the next day, so adjust date */ + date_add_seconds(ret, 1, 24 * 3600); + h = 0; + } + } + + if (m == -1) { + ret->dur_sec = 3600; + m = s = 0; + } else if (s == -1) { + ret->dur_sec = 60; + s = 0; + } else { + ret->dur_sec = 1; + } + ret->dur_mon = 0; + + ret->tm.tm_hour = h; + ret->tm.tm_min = m; + ret->tm.tm_sec = s; + ret->tm.tm_isdst = -1; + + ret->has_time = 1; + return true; +} + +bool date_set_date(struct date *ret, long y, long m, long d) { + if (y == -1) { + y = ret->tm.tm_year + 1900; + } else if (!is_valid_year(y)) { + return false; + } + if (m < 1 || m > 12 || + (d != -1 && (d < 1 || d > days_in_month(y, m)))) { + return false; + } + ret->tm.tm_year = y - 1900; + ret->tm.tm_mon = m - 1; + ret->tm.tm_mday = d == -1 ? 
1 : d; + if (d == -1) { + date_zero_time(ret, 0, 1); + } else { + date_zero_time(ret, 24 * 3600, 0); + } + ret->tm.tm_isdst = -1; + return true; +} + +bool date_set_quarter(struct date *ret, long y, long q) { + if (!is_valid_year(y) || q < 1 || q > 4) { + return false; + } + ret->tm.tm_year = y - 1900; + ret->tm.tm_mon = (q - 1) * 3; + ret->tm.tm_mday = 1; + date_zero_time(ret, 0, 3); + ret->tm.tm_isdst = -1; + return true; +} + + +#define TOKEN(str, token, num) { str, sizeof(str) - 1, token, num } +#define ABBR(len) { NULL, len, 0, 0 } + +static struct token { + const char *str; + size_t len; + int token; + long num; +} tokens_array[] = { + TOKEN("ago", T_AGO, 0), + + TOKEN("am", T_AMPM, 0), + TOKEN("pm", T_AMPM, 1), + + TOKEN("seconds", T_SECONDS, 1), ABBR(3), ABBR(6), + TOKEN("minutes", T_SECONDS, 60), ABBR(3), ABBR(6), + TOKEN("hours", T_SECONDS, 3600), ABBR(4), ABBR(1), + TOKEN("days", T_SECONDS, 24 * 3600), ABBR(3), ABBR(1), + TOKEN("weeks", T_SECONDS, 7 * 24 * 3600), ABBR(4), + TOKEN("fortnights", T_SECONDS, 14 * 24 * 3600), ABBR(9), + + TOKEN("months", T_MONTHS, 1), ABBR(5), + TOKEN("quarters", T_MONTHS, 3), ABBR(7), + TOKEN("years", T_MONTHS, 12), ABBR(4), + + TOKEN("yesterday", T_YESTERDAY, 0), + + TOKEN("midnight", T_HOUR, 0), + TOKEN("noon", T_HOUR, 12), + + TOKEN("january", T_MONTH, 1), ABBR(3), + TOKEN("february", T_MONTH, 2), ABBR(3), + TOKEN("march", T_MONTH, 3), ABBR(3), + TOKEN("april", T_MONTH, 4), ABBR(3), + TOKEN("may", T_MONTH, 5), + TOKEN("june", T_MONTH, 6), ABBR(3), + TOKEN("july", T_MONTH, 7), ABBR(3), + TOKEN("august", T_MONTH, 8), ABBR(3), + TOKEN("september", T_MONTH, 9), ABBR(4), ABBR(3), + TOKEN("october", T_MONTH, 10), ABBR(3), + TOKEN("november", T_MONTH, 11), ABBR(3), + TOKEN("december", T_MONTH, 12), ABBR(3), + + { NULL, 0, 0, 0 }, +}; + +#undef TOKEN +#undef ABBR + + +static struct token locale_tokens_array[2*12 + 1]; +static bool locale_tokens_populated = false; + +static void populate_locale_tokens(void) { + static const char 
*mon_formats[] = { "%b", "%B" }; + static char locale_buffer[1024]; + + char *buf = locale_buffer, *end = buf + sizeof(locale_buffer); + struct token *out = locale_tokens_array; + struct tm tm; + unsigned i; + + tm.tm_sec = 0; + tm.tm_min = 0; + tm.tm_hour = 0; + tm.tm_mday = 10; + tm.tm_year = 100; + tm.tm_isdst = 0; + + for (tm.tm_mon = 0; tm.tm_mon < 12; ++tm.tm_mon) { + for (i = 0; i < 2; ++i) { + out->len = strftime(buf, end - buf, + mon_formats[i], &tm); + if (!out->len) { + continue; + } + out->str = buf; + buf += out->len; + out->token = T_MONTH; + out->num = tm.tm_mon + 1; + ++out; + } + } + + out->len = 0; +} + +static const struct token *find_token(const struct token *tk, + const char *str, size_t len) { + const struct token *ret; + + for (; tk->len; ++tk) { + if (tk->str) { + ret = tk; + } + if (tk->len == len && !strncasecmp(str, ret->str, len)) { + return ret; + } + } + + return NULL; +} + + +/* Treat '_' and '.' as white space so that people don't have to quote + * the argument when specifying it on command line. */ +#define SKIP_WHITE_SPACE(ch) do { \ + while (isspace(*ch) || *ch == '_' || *ch == '.') { \ + ++ch; \ + } \ +} while (0) + + +int yylex(YYSTYPE *valp, struct yylocation *loc, const char **inputp) { + const char *ch = *inputp, *str; + const struct token *tk; + + SKIP_WHITE_SPACE(ch); + + /* End of data */ + if (*ch == 0) { + return EOF; + } + + loc->start = ch; + + /* Parse number */ + if (isdigit(*ch)) { + errno = 0; + *valp = strtol(ch, (char**)&ch, 10); + + loc->end = ch; + *inputp = ch; + + if (errno) { + parse_date_print_error(loc, "number out of range"); + return 256; + } + return ch - loc->start == 4 ? T_NUMBER_4 : T_NUMBER; + } + + if (!isalpha(*ch)) { + *inputp = ch + 1; + loc->end = ch + 1; + return *ch; + } + + /* So it's a string token. 
*/ + str = ch; + while (isalpha(*ch)) { + ++ch; + } + loc->end = ch; + *inputp = ch; + + tk = find_token(tokens_array, str, ch - str); + if (tk) { + *valp = tk->num; + return tk->token; + } + + /* Let's try with locale strings. */ + if (!locale_tokens_populated) { + populate_locale_tokens(); + locale_tokens_populated = true; + } + tk = find_token(locale_tokens_array, str, ch - str); + if (tk) { + *valp = tk->num; + return tk->token; + } + + /* If it's just one letter, return it converted to lower case */ + if (ch - str == 1) { + return tolower(*str); + } + + parse_date_print_error(loc, "unrecognised token"); + return 256; +} + + +/**************************** Parsing interface *****************************/ + +static bool parse_date(struct date *ret, const char *from, char *to) { + char tmp; + int res; + if (to) { + tmp = *to; + *to = '\0'; + } + res = yyparse(ret, &from); + if (to) { + *to = tmp; + } + return res == 0; +} + +bool parse_range(char *arg, time_t *from, time_t *to) { + bool left = false, right = false; + struct date a, b; + char *dd, *pp; + + SKIP_WHITE_SPACE(arg); + if (!*arg) { + fprintf(stderr, "empty range argument\n"); + return false; + } + + dd = strstr(arg, ".."); + pp = strstr(arg, "++"); + if ((dd && pp) || + (dd && strstr(dd + 2, "..")) || + (pp && strstr(pp + 2, "++"))) { + fprintf(stderr, + "%s: at most one of '..' or '++' can be used\n", arg); + return false; + } + + if (dd || pp) { + char *ch = dd ? dd : pp; + left = ch != arg; + SKIP_WHITE_SPACE(ch); + right = *ch; + } + if (pp && (!right || !left)) { + fprintf(stderr, + "%s: '++' requires expression on both sides\n", arg); + return false; + } + + if (left || !right) { /* date:<date>.. or date:<date> */ + date_set_to_now(&a); + if (!parse_date(&a, arg, dd ? dd : pp)) { + return false; + } + } + if (right) { /* date:<date>..<date> or date:..<date> */ + if (pp) { + b = a; + } else { + date_set_to_now(&b); + } + if (!parse_date(&b, (dd ? 
dd : pp) + 2, NULL)) { + return false; + } + } else if (!left) { /* date:date */ + left = right = true; /* convert to date:<date>..<date> */ + b = a; + if ((b.dur_sec && !date_add_seconds(&b, b.dur_sec, 1)) || + (b.dur_mon && !date_add_months(&b, b.dur_mon, 1))) { + right = false; /* convert to date:<date>.. */ + } + } + + *from = left ? mktime(&a.tm) : 0; + *to = right ? mktime(&b.tm) : (time_t)((unsigned long)~(time_t)0 >> 1); + return true; +} diff --git a/date-parser.h b/date-parser.h new file mode 100644 index 0000000..fb0f19b --- /dev/null +++ b/date-parser.h @@ -0,0 +1,59 @@ +/* Date parser header file. + * Copyright (c) 2012 Google Inc. + * Written by Michal Nazarewicz <mina86@mina86.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/ . 
*/ + +#ifndef H_DATE_PARSER_H +#define H_DATE_PARSER_H + +#include <stdbool.h> +#include <stdio.h> +#include <time.h> + +bool parse_range(char *arg, time_t *from, time_t *to); + +/* For parser */ +struct date; + +struct yylocation { + const char *start, *end; +}; + +static inline void parse_date_print_error(struct yylocation *loc, + const char *message) { + fprintf(stderr, "%.*s: %s\n", + (int)(loc->end - loc->start), loc->start, message); +} + +static inline int yyerror(struct yylocation *loc, struct date *ret, + const char **inputp, const char *message) { + ret = ret; /* make compiler happy */ + inputp = inputp; + parse_date_print_error(loc, message); + return 0; +} + +int yylex(long *valp, struct yylocation *loc, const char **inputp); +int yyparse(struct date *ret, const char **inputp); + +void date_set_from_stamp(struct date *ret, long stamp); +bool date_add_seconds(struct date *ret, long num, long unit); +bool date_set_yesterday(struct date *ret); +bool date_add_months(struct date *ret, long num, long unit); +bool date_set_time(struct date *ret, long h, long m, long s, int ampm); +bool date_set_date(struct date *ret, long y, long m, long d); +bool date_set_quarter(struct date *ret, long y, long q); + +#endif diff --git a/test.c b/test.c new file mode 100644 index 0000000..c4e2d9c --- /dev/null +++ b/test.c @@ -0,0 +1,44 @@ +/* Date parser testing application. + * Copyright (c) 2012 Google Inc. + * Written by Michal Nazarewicz <mina86@mina86.com> + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 3 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/ . */ + +#define _POSIX_C_SOURCE 1 + +#include <locale.h> +#include <stdio.h> +#include <time.h> + +#include "date-parser.h" + +int main(void) { + char buf[1024], *ch; + time_t from, to; + struct tm tm; + + setlocale(LC_ALL, ""); + + while (fgets(buf, sizeof buf, stdin)) { + if (parse_range(buf, &from, &to)) { + localtime_r(&from, &tm); + ch = buf + strftime(buf, sizeof buf / 2, + "[%Y/%m/%d %H:%M:%S %Z, ", &tm); + localtime_r(&to, &tm); + strftime(ch, sizeof buf / 2, + "%Y/%m/%d %H:%M:%S %Z)\n", &tm); + fputs(buf, stdout); + } + } +} -- 1.7.7.3 -- Best regards, _ _ .o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o ..o | Computer Science, Michał “mina86” Nazarewicz (o o) ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo-- [-- Attachment #2.1: Type: text/plain, Size: 0 bytes --] [-- Attachment #2.2: Type: application/pgp-signature, Size: 835 bytes --] ^ permalink raw reply related [flat|nested] 30+ messages in thread
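A side note on date_add_months() in the patch above: for a negative month offset it computes `y -= -num / 12; num = 11 - (-num % 12);`, which appears to give November of the same year for "one month before January" rather than December of the previous year. Below is a minimal sketch of one way to fold such an offset into a year and a month. The helper name normalize_months() is hypothetical and not part of the patch, and the sketch assumes C99 semantics, where integer division truncates toward zero.

```c
/* Hypothetical helper, not part of the posted patch.
 *
 * On entry, *month is a zero-based month offset relative to January of
 * *year; it may be negative or larger than 11.  On return, *month is in
 * the range 0..11 and *year has been adjusted to match. */
void normalize_months(long *year, long *month)
{
    long years = *month / 12;   /* truncates toward zero in C99 */
    long rem   = *month % 12;   /* has the sign of *month, may be < 0 */

    if (rem < 0) {
        /* Borrow one year so that the month becomes non-negative. */
        rem += 12;
        years -= 1;
    }

    *year += years;
    *month = rem;
}
```

With year 2012 and month -1 (one month before January 2012), this yields year 2011 and month 11, i.e. December 2011. The result could then go through the same is_valid_year() check the patch already applies.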
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch 2012-09-17 14:13 ` Michal Nazarewicz @ 2012-09-17 15:54 ` Jani Nikula 0 siblings, 0 replies; 30+ messages in thread From: Jani Nikula @ 2012-09-17 15:54 UTC (permalink / raw) To: Michal Nazarewicz, notmuch, David Bremner

On Mon, 17 Sep 2012, Michal Nazarewicz <mina86@mina86.com> wrote:
> Bison can do a lot of weird stuff, including modifying how the lexer
> interprets tokens even while parsing a given grammar rule.

I'll just have to take your word for it.

> I'm sorry. I sometimes tend to go to extremes with my statements, so
> yes, the “totally unreadable” was an overstatement on my part.

Okay.

> My point was however that parsing is a solved problem, and for
> non-trivial parsers one needs to ask herself whether it's worth trying
> to implement the logic, or maybe using a parser generator is just
> simpler.

If there's something non-trivial about parsing dates and times, it's not the actual parsing. It's not a well-defined grammar. And judging by the number of date parsers available that would be suitable for notmuch, as a library, parsing dates is not a solved problem.

> And in this particular case, my feeling is that bison is easier to read
> and modify.
>
> To add some merit to my statement, I attach a bison parser.

And there are people out there writing compilers without a parser generator... [1]

But whether to use a parser generator or not is really not the issue here, whichever has more merit. The issue is adding suitable date range queries to notmuch. I have submitted patches to do just that. I don't intend to rework them much anymore. It's always interesting in the beginning, but adding the last bits of polish and fixing the corner cases can get a bit tedious. You know how it is. At this point, I'd just like to have the feature in notmuch.

So I don't know. I guess I want to say, please submit patches to add the feature, with polish, and not to prove a point, damnit. I have no issues with promoting whichever approach is the best in the end.

BR,
Jani.

[1] http://git.kernel.org/?p=devel/sparse/sparse.git

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch 2012-09-13 11:10 ` Michal Nazarewicz 2012-09-13 12:07 ` Jani Nikula @ 2012-09-13 12:48 ` Tomi Ollila 1 sibling, 0 replies; 30+ messages in thread From: Tomi Ollila @ 2012-09-13 12:48 UTC (permalink / raw) To: Michal Nazarewicz, Jani Nikula, notmuch

On Thu, Sep 13 2012, Michal Nazarewicz <mina86@mina86.com> wrote:
> On Wed, Sep 12 2012, Jani Nikula <jani@nikula.org> wrote:
>> Add a date/time parser to notmuch, to be used for adding date range
>> query support for notmuch lib later on. Add the parser to a directory
>> of its own to make it independent of the rest of the notmuch code
>> base.
>>
>> Signed-off-by: Jani Nikula <jani@nikula.org>
>
> Have you considered doing the same in bison? I consider the code totally
> unreadable and unmaintainable.

Well, I don't find this code 'unreadable'; perhaps it depends on how you define it ;)

This functionality has been 'work-in-progress' for a long time. I think the way this (separate library) is hooked into notmuch is sane. Also, my 'hunch' is that this is maintainable enough (and Jani will handle all the expected 'zero' maintenance issues that appear).

If, in the future, there is enough desire for internationalization, then Someone(TM) may see a way to do it.

But, for the time being, I applaud the hard work Jani has put into this library, and as it is much better than what we have now (i.e. nothing) I am going to start testing this (and drop my current wrapper, which I've been using to search using relative times)...

It might be interesting to see this implemented in Bison, but I don't think it is worth it just for the sake of it. Maybe, someday..., Someone Else(tm) does that. The interface should allow a drop-in replacement for this parser...

Tomi

^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch 2012-09-12 21:27 ` [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula 2012-09-13 11:10 ` Michal Nazarewicz @ 2012-09-25 11:56 ` Michal Sojka 2012-10-03 18:49 ` Jani Nikula 1 sibling, 1 reply; 30+ messages in thread From: Michal Sojka @ 2012-09-25 11:56 UTC (permalink / raw) To: Jani Nikula, notmuch, David Bremner Hello Jani, On Wed, Sep 12 2012, Jani Nikula wrote: > Add a date/time parser to notmuch, to be used for adding date range > query support for notmuch lib later on. Add the parser to a directory > of its own to make it independent of the rest of the notmuch code > base. First of all, thank you very much for pushing this towards mainline. This is definitely one of the features I miss in notmuch most. Some comments below. > > Signed-off-by: Jani Nikula <jani@nikula.org> > --- > Makefile | 2 +- > parse-time-string/Makefile | 5 + > parse-time-string/Makefile.local | 12 + > parse-time-string/README | 9 + > parse-time-string/parse-time-string.c | 1484 +++++++++++++++++++++++++++++++++ > parse-time-string/parse-time-string.h | 95 +++ > 6 files changed, 1606 insertions(+), 1 deletion(-) > create mode 100644 parse-time-string/Makefile > create mode 100644 parse-time-string/Makefile.local > create mode 100644 parse-time-string/README > create mode 100644 parse-time-string/parse-time-string.c > create mode 100644 parse-time-string/parse-time-string.h > > diff --git a/Makefile b/Makefile > index e5e2e3a..bb9c316 100644 > --- a/Makefile > +++ b/Makefile > @@ -3,7 +3,7 @@ > all: > > # List all subdirectories here. Each contains its own Makefile.local > -subdirs = compat completion emacs lib man util test > +subdirs = compat completion emacs lib man parse-time-string util test > > # We make all targets depend on the Makefiles themselves. 
> global_deps = Makefile Makefile.config Makefile.local \ > diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile > new file mode 100644 > index 0000000..fa25832 > --- /dev/null > +++ b/parse-time-string/Makefile > @@ -0,0 +1,5 @@ > +all: > + $(MAKE) -C .. all > + > +.DEFAULT: > + $(MAKE) -C .. $@ > diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local > new file mode 100644 > index 0000000..53534f3 > --- /dev/null > +++ b/parse-time-string/Makefile.local > @@ -0,0 +1,12 @@ > +dir := parse-time-string > +extra_cflags += -I$(srcdir)/$(dir) > + > +libparse-time-string_c_srcs := $(dir)/parse-time-string.c > + > +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o) > + > +$(dir)/libparse-time-string.a: $(libparse-time-string_modules) > + $(call quiet,AR) rcs $@ $^ > + > +SRCS := $(SRCS) $(libparse-time-string_c_srcs) > +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a > diff --git a/parse-time-string/README b/parse-time-string/README > new file mode 100644 > index 0000000..300ff1f > --- /dev/null > +++ b/parse-time-string/README > @@ -0,0 +1,9 @@ > +PARSE TIME STRING > +================= > + > +parse_time_string() is a date/time parser originally written for > +notmuch by Jani Nikula <jani@nikula.org>. However, there is nothing > +notmuch specific in it, and it should be kept reusable for other > +projects, and ready to be packaged on its own as needed. Please do not > +add dependencies on or references to anything notmuch specific. The > +parser should only depend on the C library. 
> diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c > new file mode 100644 > index 0000000..15cf686 > --- /dev/null > +++ b/parse-time-string/parse-time-string.c > @@ -0,0 +1,1484 @@ > +/* > + * parse time string - user friendly date and time parser > + * Copyright © 2012 Jani Nikula > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see <http://www.gnu.org/licenses/>. > + * > + * Author: Jani Nikula <jani@nikula.org> > + */ > + > +#include <assert.h> > +#include <ctype.h> > +#include <errno.h> > +#include <limits.h> > +#include <stdio.h> > +#include <stdarg.h> > +#include <stdbool.h> > +#include <stdlib.h> > +#include <string.h> > +#include <strings.h> > +#include <time.h> > +#include <sys/time.h> > +#include <sys/types.h> > + > +#include "parse-time-string.h" > + > +/* > + * IMPLEMENTATION DETAILS > + * > + * At a high level, the parsing is done in two phases: 1) actual > + * parsing of the input string and storing the parsed data into > + * 'struct state', and 2) processing of the data in 'struct state' > + * according to current time (or provided reference time) and > + * rounding. This is evident in the main entry point function > + * parse_time_string(). > + * > + * 1) The parsing phase - parse_input() > + * > + * Parsing is greedy and happens from left to right. 
The parsing is as > + * unambiguous as possible; only unambiguous date/time formats are > + * accepted. Redundant or contradictory absolute date/time in the > + * input (e.g. date specified multiple times/ways) is not > + * accepted. Relative date/time on the other hand just accumulates if > + * present multiple times (e.g. "5 days 5 days" just turns into 10 > + * days). > + * > + * Parsing decisions are made on the input format, not value. For > + * example, "20/5/2005" fails because the recognized format here is > + * MM/D/YYYY, even though the values would suggest DD/M/YYYY. > + * > + * Parsing is mostly stateless in the sense that parsing decisions are > + * not made based on the values of previously parsed data, or whether > + * certain data is present in the first place. (There are a few > + * exceptions to the latter part, though, such as parsing of time zone > + * that would otherwise look like plain time.) I'm not sure that this "stateless" property brings us some advantage. I think that it sometimes causes the results to be surprising at best (one can also call them wrong). I improved the tests of your parsing library (see the patch in the followup email) and added those cases. > + * > + * When the parser encounters a number that is not greedily parsed as > + * part of a format, the interpretation is postponed until the next > + * token is parsed. The parser for the next token may consume the > + * previously postponed number. For example, when parsing "20 May" the > + * meaning of "20" is not known until "May" is parsed. If the parser > + * for the next token does not consume the postponed number, the > + * number is handled as a "lone" number before parser for the next > + * token finishes. > + * > + * 2) The processing phase - create_output() > + * > + * Once the parser in phase 1 has finished, 'struct state' contains > + * all the information from the input string, and it's no longer > + * needed. 
Since the parser does not even handle the concept of "now", If you aim at this being a generic library, I'd not call this "now" but reference time, as you already do at other comments in your library. > + * the processing initializes the fields referring to the current > + * date/time. > + * > + * If requested, the result is rounded towards past or future. The > + * idea behind rounding is to support parsing date/time ranges in an > + * obvious way. For example, for a range defined as two dates (without > + * time), one would typically want to have an inclusive range from the > + * beginning of start date to the end of the end date. The caller > + * would use rounding towards past in the start date, and towards > + * future in the end date. > + * > + * The absolute date and time is shifted by the relative date and > + * time, and time zone adjustments are made. Daylight saving time > + * (DST) is specifically *not* handled at all. > + * > + * Finally, the result is stored to time_t. > + */ > + > +#define unused(x) x __attribute__ ((unused)) > + > +/* XXX: Redefine these to add i18n support. The keyword table uses > + * N_() to mark strings to be translated; they are accessed > + * dynamically using _(). */ > +#define _(s) (s) /* i18n: define as gettext (s) */ > +#define N_(s) (s) /* i18n: define as gettext_noop (s) */ > + > +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0])) > + > +/* > + * Field indices in the tm and set arrays of struct state. > + * > + * NOTE: There's some code that depends on the ordering of this enum. > + */ > +enum field { > + /* Keep SEC...YEAR in this order. */ > + TM_ABS_SEC, /* seconds */ > + TM_ABS_MIN, /* minutes */ > + TM_ABS_HOUR, /* hours */ > + TM_ABS_MDAY, /* day of the month */ > + TM_ABS_MON, /* month */ > + TM_ABS_YEAR, /* year */ > + > + TM_ABS_WDAY, /* day of the week. special: may be relative */ > + TM_ABS_ISDST, /* daylight saving time */ > + > + TM_AMPM, /* am vs. 
pm */ > + TM_TZ, /* timezone in minutes */ > + > + /* Keep SEC...YEAR in this order. */ > + TM_REL_SEC, /* seconds relative to now */ > + TM_REL_MIN, /* minutes ... */ > + TM_REL_HOUR, /* hours ... */ > + TM_REL_DAY, /* days ... */ > + TM_REL_MON, /* months ... */ > + TM_REL_YEAR, /* years ... */ > + TM_REL_WEEK, /* weeks ... */ > + > + TM_NONE, /* not a field */ > + > + TM_SIZE = TM_NONE, > + TM_FIRST_ABS = TM_ABS_SEC, > + TM_FIRST_REL = TM_REL_SEC, > +}; > + > +/* Values for the set array of struct state. */ > +enum field_set { > + FIELD_UNSET, /* The field has not been touched by parser. */ > + FIELD_SET, /* The field has been set by parser. */ > + FIELD_NOW, /* The field will be set to "now". */ > +}; > + > +static enum field > +next_abs_field (enum field field) > +{ > + /* NOTE: Depends on the enum ordering. */ > + return field < TM_ABS_YEAR ? field + 1 : TM_NONE; > +} > + > +static enum field > +abs_to_rel_field (enum field field) > +{ > + assert (field <= TM_ABS_YEAR); > + > + /* NOTE: Depends on the enum ordering. */ > + return field + (TM_FIRST_REL - TM_FIRST_ABS); > +} > + > +/* Get epoch value for field. */ > +static int > +field_epoch (enum field field) > +{ > + if (field == TM_ABS_MDAY || field == TM_ABS_MON) > + return 1; > + else if (field == TM_ABS_YEAR) > + return 1970; > + else > + return 0; > +} > + > +/* The parsing state. */ > +struct state { > + int tm[TM_SIZE]; /* parsed date and time */ > + enum field_set set[TM_SIZE]; /* set status of tm */ > + > + enum field last_field; /* Previously set field. */ > + enum field next_field; /* Next field for parse_postponed_number() */ next_field seems to be unused. > + char delim; > + > + int postponed_length; /* Number of digits in postponed value. */ > + int postponed_value; > + char postponed_delim; /* The delimiter preceding postponed number. */ > +}; > + > +/* > + * Helpers for postponed numbers. > + * > + * postponed_length is the number of digits in postponed value. 
0 > + * means there is no postponed number. -1 means there is a postponed > + * number, but it comes from a keyword, and it doesn't have digits. > + */ > +static int > +get_postponed_length (struct state *state) > +{ > + return state->postponed_length; > +} > + > +/* > + * Consume a previously postponed number. Return true if a number was > + * in fact postponed, false otherwise. Store the postponed number's > + * value in *v, length in the input string in *n (or -1 if the number > + * was written out and parsed as a keyword), and the preceding > + * delimiter to *d. > + */ > +static bool > +get_postponed_number (struct state *state, int *v, int *n, char *d) > +{ > + if (!state->postponed_length) > + return false; > + > + if (n) > + *n = state->postponed_length; > + > + if (v) > + *v = state->postponed_value; > + > + if (d) > + *d = state->postponed_delim; > + > + state->postponed_length = 0; > + state->postponed_value = 0; > + state->postponed_delim = 0; > + > + return true; > +} > + > +/* Parse a previously postponed number if one exists. */ > +static int parse_postponed_number (struct state *state, int v, int n, char d); > +static int > +handle_postponed_number (struct state *state, enum field next_field) > +{ > + int v = state->postponed_value; > + int n = state->postponed_length; > + char d = state->postponed_delim; > + int r; > + > + if (!n) > + return 0; > + > + state->postponed_value = 0; > + state->postponed_length = 0; > + state->postponed_delim = 0; > + > + state->next_field = next_field; > + r = parse_postponed_number (state, v, n, d); > + state->next_field = TM_NONE; > + > + return r; > +} > + > +/* > + * Postpone a number to be handled later. If one exists already, > + * handle it first. n may be -1 to indicate a keyword that has no > + * number length. > + */ > +static int > +set_postponed_number (struct state *state, int v, int n) > +{ > + int r; > + char d = state->delim; > + > + /* Parse a previously postponed number, if any. 
*/ > + r = handle_postponed_number (state, TM_NONE); > + if (r) > + return r; > + > + state->postponed_length = n; > + state->postponed_value = v; > + state->postponed_delim = d; > + > + return 0; > +} > + > +static void > +set_delim (struct state *state, char delim) > +{ > + state->delim = delim; > +} > + > +static void > +unset_delim (struct state *state) > +{ > + state->delim = 0; > +} > + > +/* > + * Field set/get/mod helpers. > + */ > + > +/* Return true if field has been set. */ > +static bool > +is_field_set (struct state *state, enum field field) > +{ > + assert (field < ARRAY_SIZE (state->tm)); > + > + return field < ARRAY_SIZE (state->set) && > + state->set[field] != FIELD_UNSET; > +} > + > +static void > +unset_field (struct state *state, enum field field) > +{ > + assert (field < ARRAY_SIZE (state->tm)); > + > + state->set[field] = FIELD_UNSET; > + state->tm[field] = 0; > +} > + > +/* > + * Set field to value. A field can only be set once to ensure the > + * input does not contain redundant and potentially conflicting data. > + */ > +static int > +set_field (struct state *state, enum field field, int value) > +{ > + int r; > + > + assert (field < ARRAY_SIZE (state->tm)); > + > + /* Fields can only be set once. */ > + if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET) > + return -PARSE_TIME_ERR_ALREADYSET; > + > + state->set[field] = FIELD_SET; > + > + /* Parse postponed number, if any. */ > + r = handle_postponed_number (state, field); > + if (r) > + return r; > + > + unset_delim (state); > + > + state->tm[field] = value; > + state->last_field = field; > + > + return 0; > +} > + > +/* > + * Mark n fields in fields to be set to current date/time in the > + * specified time zone, or local timezone if not specified. The fields > + * will be initialized after parsing is complete and timezone is > + * known. 
> + */ > +static int > +set_fields_to_now (struct state *state, enum field *fields, size_t n) > +{ > + size_t i; > + int r; > + > + for (i = 0; i < n; i++) { > + r = set_field (state, fields[i], 0); > + if (r) > + return r; > + state->set[fields[i]] = FIELD_NOW; > + } > + > + return 0; > +} > + > +/* Modify field by adding value to it. To be used on relative fields, > + * which can be modified multiple times (to accumulate). */ > +static int > +mod_field (struct state *state, enum field field, int value) > +{ > + int r; > + > + assert (field < ARRAY_SIZE (state->tm)); /* assert relative??? */ > + > + if (field < ARRAY_SIZE (state->set)) > + state->set[field] = FIELD_SET; > + > + /* Parse postponed number, if any. */ > + r = handle_postponed_number (state, field); > + if (r) > + return r; > + > + unset_delim (state); > + > + state->tm[field] += value; > + state->last_field = field; > + > + return 0; > +} > + > +/* > + * Get field value. Make sure the field is set before query. It's most > + * likely an error to call this while parsing (for example fields set > + * as FIELD_NOW will only be set to some value after parsing). > + */ > +static int > +get_field (struct state *state, enum field field) > +{ > + assert (field < ARRAY_SIZE (state->tm)); > + > + return state->tm[field]; > +} > + > +/* > + * Validity checkers. > + */ > +static bool is_valid_12hour (int h) > +{ > + return h >= 0 && h <= 12; > +} > + > +static bool is_valid_time (int h, int m, int s) > +{ > + /* Allow 24:00:00 to denote end of day. 
*/ > + if (h == 24 && m == 0 && s == 0) > + return true; > + > + return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59; > +} > + > +static bool is_valid_mday (int mday) > +{ > + return mday >= 1 && mday <= 31; > +} > + > +static bool is_valid_mon (int mon) > +{ > + return mon >= 1 && mon <= 12; > +} > + > +static bool is_valid_year (int year) > +{ > + return year >= 1970; > +} > + > +static bool is_valid_date (int year, int mon, int mday) > +{ > + return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday); > +} > + > +/* Unset indicator for time and date set helpers. */ > +#define UNSET -1 > + > +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */ > +static int > +set_abs_time (struct state *state, int hour, int min, int sec) > +{ > + int r; > + > + if (hour != UNSET) { > + if ((r = set_field (state, TM_ABS_HOUR, hour))) > + return r; > + } > + > + if (min != UNSET) { > + if ((r = set_field (state, TM_ABS_MIN, min))) > + return r; > + } > + > + if (sec != UNSET) { > + if ((r = set_field (state, TM_ABS_SEC, sec))) > + return r; > + } > + > + return 0; > +} > + > +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. */ > +static int > +set_abs_date (struct state *state, int year, int mon, int mday) > +{ > + int r; > + > + if (year != UNSET) { > + if ((r = set_field (state, TM_ABS_YEAR, year))) > + return r; > + } > + > + if (mon != UNSET) { > + if ((r = set_field (state, TM_ABS_MON, mon))) > + return r; > + } > + > + if (mday != UNSET) { > + if ((r = set_field (state, TM_ABS_MDAY, mday))) > + return r; > + } > + > + return 0; > +} > + > +/* > + * Keyword parsing and handling. 
> + */ > +struct keyword; > +typedef int (*setter_t)(struct state *state, struct keyword *kw); > + > +struct keyword { > + const char *name; /* keyword */ > + enum field field; /* field to set, or FIELD_NONE if N/A */ > + int value; /* value to set, or 0 if N/A */ > + setter_t set; /* function to use for setting, if non-NULL */ > +}; > + > +/* > + * Setter callback functions for keywords. > + */ > +static int > +kw_set_default (struct state *state, struct keyword *kw) > +{ > + return set_field (state, kw->field, kw->value); > +} > + > +static int > +kw_set_rel (struct state *state, struct keyword *kw) > +{ > + int multiplier = 1; > + > + /* Get a previously set multiplier, if any. */ > + get_postponed_number (state, &multiplier, NULL, NULL); > + > + /* Accumulate relative field values. */ > + return mod_field (state, kw->field, multiplier * kw->value); > +} > + > +static int > +kw_set_number (struct state *state, struct keyword *kw) > +{ > + /* -1 = no length, from keyword. */ > + return set_postponed_number (state, kw->value, -1); > +} > + > +static int > +kw_set_month (struct state *state, struct keyword *kw) > +{ > + int n = get_postponed_length (state); > + > + /* Consume postponed number if it could be mday. This handles "20 > + * January". */ > + if (n == 1 || n == 2) { > + int r, v; > + > + get_postponed_number (state, &v, NULL, NULL); > + > + if (!is_valid_mday (v)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + r = set_field (state, TM_ABS_MDAY, v); > + if (r) > + return r; > + } > + > + return set_field (state, kw->field, kw->value); > +} > + > +static int > +kw_set_ampm (struct state *state, struct keyword *kw) > +{ > + int n = get_postponed_length (state); > + > + /* Consume postponed number if it could be hour. This handles > + * "5pm". 
*/ > + if (n == 1 || n == 2) { > + int r, v; > + > + get_postponed_number (state, &v, NULL, NULL); > + > + if (!is_valid_12hour (v)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + r = set_abs_time (state, v, 0, 0); > + if (r) > + return r; > + } > + > + return set_field (state, kw->field, kw->value); > +} > + > +static int > +kw_set_timeofday (struct state *state, struct keyword *kw) > +{ > + return set_abs_time (state, kw->value, 0, 0); > +} > + > +static int > +kw_set_today (struct state *state, unused (struct keyword *kw)) > +{ > + enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY }; > + > + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); > +} > + > +static int > +kw_set_now (struct state *state, unused (struct keyword *kw)) > +{ > + enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC }; > + > + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); > +} > + > +static int > +kw_set_ordinal (struct state *state, struct keyword *kw) > +{ > + int n, v; > + > + /* Require a postponed number. */ > + if (!get_postponed_number (state, &v, &n, NULL)) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + /* Ordinals are mday. */ > + if (n != 1 && n != 2) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + /* Be strict about st, nd, rd, and lax about th. */ > + if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31) > + return -PARSE_TIME_ERR_INVALIDDATE; > + else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22) > + return -PARSE_TIME_ERR_INVALIDDATE; > + else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23) > + return -PARSE_TIME_ERR_INVALIDDATE; > + else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_field (state, TM_ABS_MDAY, v); > +} > + > +/* > + * Accepted keywords. > + * > + * A keyword may optionally contain a '|' to indicate the minimum > + * match length. Without one, full match is required. 
It's advisable > + * to keep the minimum match parts unique across all keywords. > + * > + * If keyword begins with upper case letter, then the matching will be > + * case sensitive. Otherwise the matching is case insensitive. > + * > + * If setter is NULL, set_default will be used. > + * > + * Note: Order matters. Matching is greedy, longest match is used, but > + * of equal length matches the first one is used, unless there's an > + * equal length case sensitive match which trumps case insensitive > + * matches. > + */ > +static struct keyword keywords[] = { > + /* Weekdays. */ > + { N_("sun|day"), TM_ABS_WDAY, 0, NULL }, > + { N_("mon|day"), TM_ABS_WDAY, 1, NULL }, > + { N_("tue|sday"), TM_ABS_WDAY, 2, NULL }, > + { N_("wed|nesday"), TM_ABS_WDAY, 3, NULL }, > + { N_("thu|rsday"), TM_ABS_WDAY, 4, NULL }, > + { N_("fri|day"), TM_ABS_WDAY, 5, NULL }, > + { N_("sat|urday"), TM_ABS_WDAY, 6, NULL }, > + > + /* Months. */ > + { N_("jan|uary"), TM_ABS_MON, 1, kw_set_month }, > + { N_("feb|ruary"), TM_ABS_MON, 2, kw_set_month }, > + { N_("mar|ch"), TM_ABS_MON, 3, kw_set_month }, > + { N_("apr|il"), TM_ABS_MON, 4, kw_set_month }, > + { N_("may"), TM_ABS_MON, 5, kw_set_month }, > + { N_("jun|e"), TM_ABS_MON, 6, kw_set_month }, > + { N_("jul|y"), TM_ABS_MON, 7, kw_set_month }, > + { N_("aug|ust"), TM_ABS_MON, 8, kw_set_month }, > + { N_("sep|tember"), TM_ABS_MON, 9, kw_set_month }, > + { N_("oct|ober"), TM_ABS_MON, 10, kw_set_month }, > + { N_("nov|ember"), TM_ABS_MON, 11, kw_set_month }, > + { N_("dec|ember"), TM_ABS_MON, 12, kw_set_month }, > + > + /* Durations. 
*/ > + { N_("y|ears"), TM_REL_YEAR, 1, kw_set_rel }, > + { N_("w|eeks"), TM_REL_WEEK, 1, kw_set_rel }, > + { N_("d|ays"), TM_REL_DAY, 1, kw_set_rel }, > + { N_("h|ours"), TM_REL_HOUR, 1, kw_set_rel }, > + { N_("hr|s"), TM_REL_HOUR, 1, kw_set_rel }, > + { N_("m|inutes"), TM_REL_MIN, 1, kw_set_rel }, > + /* M=months, m=minutes */ > + { N_("M"), TM_REL_MON, 1, kw_set_rel }, > + { N_("mins"), TM_REL_MIN, 1, kw_set_rel }, > + { N_("mo|nths"), TM_REL_MON, 1, kw_set_rel }, > + { N_("s|econds"), TM_REL_SEC, 1, kw_set_rel }, > + { N_("secs"), TM_REL_SEC, 1, kw_set_rel }, > + > + /* Numbers. */ > + { N_("one"), TM_NONE, 1, kw_set_number }, > + { N_("two"), TM_NONE, 2, kw_set_number }, > + { N_("three"), TM_NONE, 3, kw_set_number }, > + { N_("four"), TM_NONE, 4, kw_set_number }, > + { N_("five"), TM_NONE, 5, kw_set_number }, > + { N_("six"), TM_NONE, 6, kw_set_number }, > + { N_("seven"), TM_NONE, 7, kw_set_number }, > + { N_("eight"), TM_NONE, 8, kw_set_number }, > + { N_("nine"), TM_NONE, 9, kw_set_number }, > + { N_("ten"), TM_NONE, 10, kw_set_number }, > + { N_("dozen"), TM_NONE, 12, kw_set_number }, > + { N_("hundred"), TM_NONE, 100, kw_set_number }, > + > + /* Special number forms. */ > + { N_("this"), TM_NONE, 0, kw_set_number }, > + { N_("last"), TM_NONE, 1, kw_set_number }, > + > + /* Other special keywords. */ > + { N_("yesterday"), TM_REL_DAY, 1, kw_set_rel }, > + { N_("today"), TM_NONE, 0, kw_set_today }, > + { N_("now"), TM_NONE, 0, kw_set_now }, > + { N_("noon"), TM_NONE, 12, kw_set_timeofday }, > + { N_("midnight"), TM_NONE, 0, kw_set_timeofday }, > + { N_("am"), TM_AMPM, 0, kw_set_ampm }, > + { N_("a.m."), TM_AMPM, 0, kw_set_ampm }, > + { N_("pm"), TM_AMPM, 1, kw_set_ampm }, > + { N_("p.m."), TM_AMPM, 1, kw_set_ampm }, > + { N_("st"), TM_NONE, 0, kw_set_ordinal }, > + { N_("nd"), TM_NONE, 0, kw_set_ordinal }, > + { N_("rd"), TM_NONE, 0, kw_set_ordinal }, > + { N_("th"), TM_NONE, 0, kw_set_ordinal }, > + > + /* Timezone codes: offset in minutes. 
XXX: Add more codes. */ > + { N_("pst"), TM_TZ, -8*60, NULL }, > + { N_("mst"), TM_TZ, -7*60, NULL }, > + { N_("cst"), TM_TZ, -6*60, NULL }, > + { N_("est"), TM_TZ, -5*60, NULL }, > + { N_("ast"), TM_TZ, -4*60, NULL }, > + { N_("nst"), TM_TZ, -(3*60+30), NULL }, > + > + { N_("gmt"), TM_TZ, 0, NULL }, > + { N_("utc"), TM_TZ, 0, NULL }, > + > + { N_("wet"), TM_TZ, 0, NULL }, > + { N_("cet"), TM_TZ, 1*60, NULL }, > + { N_("eet"), TM_TZ, 2*60, NULL }, > + { N_("fet"), TM_TZ, 3*60, NULL }, > + > + { N_("wat"), TM_TZ, 1*60, NULL }, > + { N_("cat"), TM_TZ, 2*60, NULL }, > + { N_("eat"), TM_TZ, 3*60, NULL }, > +}; > + > +/* > + * Compare strings s and keyword. Return number of matching chars on > + * match, 0 for no match. Match must be at least n chars, or all of > + * keyword if n < 0, otherwise it's not a match. Use match_case for > + * case sensitive matching. > + */ > +static size_t > +stringcmp (const char *s, const char *keyword, ssize_t n, bool match_case) > +{ > + ssize_t i; > + > + if (!n) > + return 0; > + > + for (i = 0; *s && *keyword; i++, s++, keyword++) { > + if (match_case) { > + if (*s != *keyword) > + break; > + } else { > + if (tolower ((unsigned char) *s) != > + tolower ((unsigned char) *keyword)) > + break; > + } > + } > + > + if (n > 0) > + return i < n ? 0 : i; > + else > + return *keyword ? 0 : i; > +} > + > +/* > + * Parse a keyword. Return < 0 on error, number of parsed chars on > + * success. > + */ > +static ssize_t > +parse_keyword (struct state *state, const char *s) > +{ > + unsigned int i; > + size_t n, max_n = 0; > + struct keyword *kw = NULL; > + int r; > + > + /* Match longest keyword */ > + for (i = 0; i < ARRAY_SIZE (keywords); i++) { > + /* Match case if keyword begins with upper case letter. */ > + bool mcase = isupper ((unsigned char) keywords[i].name[0]); > + ssize_t minlen = -1; > + char keyword[128]; > + char *p; > + > + strncpy (keyword, _(keywords[i].name), sizeof (keyword)); > + > + /* Truncate too long keywords. 
XXX: Make this dynamic? */
> +        keyword[sizeof (keyword) - 1] = '\0';
> +
> +        /* Minimum match length. */
> +        p = strchr (keyword, '|');
> +        if (p) {
> +            minlen = p - keyword;
> +
> +            /* Remove the minimum match length separator. */
> +            memmove (p, p + 1, strlen (p + 1) + 1);
> +        }
> +
> +        n = stringcmp (s, keyword, minlen, mcase);
> +        if (n > max_n || (n == max_n && mcase)) {
> +            max_n = n;
> +            kw = &keywords[i];
> +        }
> +    }
> +
> +    if (!kw)
> +        return -PARSE_TIME_ERR_KEYWORD;
> +
> +    if (kw->set)
> +        r = kw->set (state, kw);
> +    else
> +        r = kw_set_default (state, kw);
> +
> +    if (r < 0)
> +        return r;
> +
> +    return max_n;
> +}
> +
> +/*
> + * Non-keyword parsers and their helpers.
> + */
> +
> +static int
> +set_user_tz (struct state *state, char sign, int hour, int min)
> +{
> +    int tz = hour * 60 + min;
> +
> +    assert (sign == '+' || sign == '-');
> +
> +    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)
> +        return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    if (sign == '-')
> +        tz = -tz;
> +
> +    return set_field (state, TM_TZ, tz);
> +}
> +
> +/*
> + * Independent parsing of a postponed number when it wasn't consumed
> + * during parsing of the following token.
> + */
> +static int
> +parse_postponed_number (struct state *state, int v, int n, char d)
> +{
> +    /*
> +     * alright, these are really lone, won't affect parsing of
> +     * following items... it's not a multiplier, those have been eaten
> +     * away.
> +     *
> +     * also note numbers eaten away by parse_single_number.
> +     */
> +
> +    assert (n < 8);
> +
> +    if (n == 1 || n == 2) {

I think that guessing the meaning of a number based on its length is a
way to hell. Again, see tests in the followup patch.

> +        /* Notable exception: Previous field affects parsing. This
> +         * handles "January 20".
*/ > + if (state->last_field == TM_ABS_MON) { > + /* D[D] */ > + if (!is_valid_mday (v)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_field (state, TM_ABS_MDAY, v); > + } else if (n == 2) { > + /* XXX: Only allow if last field is hour, min, or sec? */ > + if (d == '+' || d == '-') { > + /* +/-HH */ > + return set_user_tz (state, d, v, 0); > + } > + } > + } else if (n == 4) { > + /* Notable exception: Value affects parsing. Time zones are > + * always at most 1400 and we don't understand years before > + * 1970. */ > + if (!is_valid_year (v)) { > + if (d == '+' || d == '-') { > + /* +/-HHMM */ > + return set_user_tz (state, d, v / 100, v % 100); > + } > + } else { > + /* YYYY */ > + return set_field (state, TM_ABS_YEAR, v); > + } > + } else if (n == 6) { > + /* HHMMSS */ > + int hour = v / 10000; > + int min = (v / 100) % 100; > + int sec = v % 100; > + > + if (!is_valid_time (hour, min, sec)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + return set_abs_time (state, hour, min, sec); > + } > + > + /* else n is one of {-1, 3, 5, 7 } */ > + > + return -PARSE_TIME_ERR_FORMAT; > +} > + > +/* Parse a single number. Typically postpone parsing until later. */ > +static int > +parse_single_number (struct state *state, unsigned long v, > + unsigned long n) > +{ > + assert (n); > + > + /* Parse things that can be parsed immediately. */ > + if (n == 8) { > + /* YYYYMMDD */ > + int year = v / 10000; > + int mon = (v / 100) % 100; > + int mday = v % 100; > + > + if (!is_valid_date (year, mon, mday)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_abs_date (state, year, mon, mday); > + } else if (n > 8) { > + /* XXX: Seconds since epoch. 
*/ > + return -PARSE_TIME_ERR_FORMAT; > + } > + > + if (v > INT_MAX) > + return -PARSE_TIME_ERR_FORMAT; > + > + return set_postponed_number (state, v, n); > +} > + > +static bool > +is_time_sep (char c) > +{ > + return c == ':'; > +} > + > +static bool > +is_date_sep (char c) > +{ > + return c == '/' || c == '-' || c == '.'; > +} > + > +static bool > +is_sep (char c) > +{ > + return is_time_sep (c) || is_date_sep (c); > +} > + > +/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep > + * unset. */ > +static int > +expand_year (unsigned long year, size_t n) > +{ > + if (n == 2) { > + return (year < 70 ? 2000 : 1900) + year; > + } else if (n == 4) { > + return year; > + } else { > + return UNSET; > + } > +} > + > +/* Parse a date number triplet. */ > +static int > +parse_date (struct state *state, char sep, > + unsigned long v1, unsigned long v2, unsigned long v3, > + size_t n1, size_t n2, size_t n3) > +{ > + int year = UNSET, mon = UNSET, mday = UNSET; > + > + assert (is_date_sep (sep)); > + > + switch (sep) { > + case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */ > + if (n1 != 1 && n1 != 2) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) { > + /* M[M]/D[D][/YY[YY]] */ > + year = expand_year (v3, n3); > + mon = v1; > + mday = v2; > + } else if (n2 == 4 && n3 == 0) { > + /* M[M]/YYYY */ > + year = v2; > + mon = v1; > + } else { > + return -PARSE_TIME_ERR_DATEFORMAT; > + } > + break; > + > + case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */ > + if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) { > + /* YYYY-MM[-DD] */ > + year = v1; > + mon = v2; > + if (n3) > + mday = v3; > + } else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) { > + /* DD-MM[-YY[YY]] */ > + year = expand_year (v3, n3); > + mon = v2; > + mday = v1; > + } else if (n1 == 2 && n2 == 4 && n3 == 0) { > + /* MM-YYYY */ > + year = v2; > + mon = v1; > + } else { > + return -PARSE_TIME_ERR_DATEFORMAT; > + } > 
+ break; > + > + case '.': /* Date: D[D].M[M][.[YY[YY]]] */ > + if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) || > + (n3 != 0 && n3 != 2 && n3 != 4)) > + return -PARSE_TIME_ERR_DATEFORMAT; > + > + year = expand_year (v3, n3); > + mon = v2; > + mday = v1; > + break; > + } > + > + if (year != UNSET && !is_valid_year (year)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + if (mon != UNSET && !is_valid_mon (mon)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + if (mday != UNSET && !is_valid_mday (mday)) > + return -PARSE_TIME_ERR_INVALIDDATE; > + > + return set_abs_date (state, year, mon, mday); > +} > + > +/* Parse a time number triplet. */ > +static int > +parse_time (struct state *state, char sep, > + unsigned long v1, unsigned long v2, unsigned long v3, > + size_t n1, size_t n2, size_t n3) > +{ > + assert (is_time_sep (sep)); > + > + if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2)) > + return -PARSE_TIME_ERR_TIMEFORMAT; > + > + /* > + * Notable exception: Previously set fields affect > + * parsing. Interpret (+|-)HH:MM as time zone only if hour and > + * minute have been set. > + * > + * XXX: This could be fixed by restricting the delimiters > + * preceding time. For '+' it would be justified, but for '-' it > + * might be inconvenient. However prefer to allow '-' as an > + * insignificant delimiter preceding time for convenience, and > + * handle '+' the same way for consistency between positive and > + * negative time zones. > + */ > + if (is_field_set (state, TM_ABS_HOUR) && > + is_field_set (state, TM_ABS_MIN) && > + n1 == 2 && n2 == 2 && n3 == 0 && > + (state->delim == '+' || state->delim == '-')) { > + return set_user_tz (state, state->delim, v1, v2); > + } > + > + if (!is_valid_time (v1, v2, v3)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + return set_abs_time (state, v1, v2, n3 ? v3 : 0); > +} > + > +/* strtoul helper that assigns length. 
*/ > +static unsigned long > +strtoul_len (const char *s, const char **endp, size_t *len) > +{ > + unsigned long val = strtoul (s, (char **) endp, 10); > + > + *len = *endp - s; > + return val; > +} > + > +/* > + * Parse a (group of) number(s). Return < 0 on error, number of parsed > + * chars on success. > + */ > +static ssize_t > +parse_number (struct state *state, const char *s) > +{ > + int r; > + unsigned long v1, v2, v3 = 0; > + size_t n1, n2, n3 = 0; > + const char *p = s; > + char sep; > + > + v1 = strtoul_len (p, &p, &n1); > + > + if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) { > + sep = *p; > + v2 = strtoul_len (p + 1, &p, &n2); > + } else { > + /* A single number. */ > + r = parse_single_number (state, v1, n1); > + if (r) > + return r; > + > + return p - s; > + } > + > + /* A group of two or three numbers? */ > + if (*p == sep && isdigit ((unsigned char) *(p + 1))) > + v3 = strtoul_len (p + 1, &p, &n3); > + > + if (is_time_sep (sep)) > + r = parse_time (state, sep, v1, v2, v3, n1, n2, n3); > + else > + r = parse_date (state, sep, v1, v2, v3, n1, n2, n3); > + > + if (r) > + return r; > + > + return p - s; > +} > + > +/* > + * Parse delimiter(s). Throw away all except the last one, which is > + * stored for parsing the next non-delimiter. Return < 0 on error, > + * number of parsed chars on success. > + * > + * XXX: We might want to be more strict here. > + */ > +static ssize_t > +parse_delim (struct state *state, const char *s) > +{ > + const char *p = s; > + > + /* > + * Skip non-alpha and non-digit, and store the last for further > + * processing. > + */ > + while (*p && !isalnum ((unsigned char) *p)) { > + set_delim (state, *p); > + p++; > + } > + > + return p - s; > +} > + > +/* > + * Parse a date/time string. Return < 0 on error, number of parsed > + * chars on success. 
> + */ > +static ssize_t > +parse_input (struct state *state, const char *s) > +{ > + const char *p = s; > + ssize_t n; > + int r; > + > + while (*p) { > + if (isalpha ((unsigned char) *p)) { > + n = parse_keyword (state, p); > + } else if (isdigit ((unsigned char) *p)) { > + n = parse_number (state, p); > + } else { > + n = parse_delim (state, p); > + } > + > + if (n <= 0) { > + if (n == 0) > + n = -PARSE_TIME_ERR; > + > + return n; > + } > + > + p += n; > + } > + > + /* Parse postponed number, if any. */ > + r = handle_postponed_number (state, TM_NONE); > + if (r < 0) > + return r; > + > + return p - s; > +} > + > +/* > + * Processing the parsed input. > + */ > + > +/* > + * Initialize reference time to tm. Use time zone in state if > + * specified, otherwise local time. Use now for reference time if > + * non-NULL, otherwise current time. > + */ > +static int > +initialize_now (struct state *state, struct tm *tm, const time_t *now) > +{ > + time_t t; > + > + if (now) { > + t = *now; > + } else { > + if (time (&t) == (time_t) -1) > + return -PARSE_TIME_ERR_LIB; > + } > + > + if (is_field_set (state, TM_TZ)) { > + /* Some other time zone. */ > + > + /* Adjust now according to the TZ. */ > + t += get_field (state, TM_TZ) * 60; > + > + /* It's not gm, but this doesn't mess with the TZ. */ > + if (gmtime_r (&t, tm) == NULL) > + return -PARSE_TIME_ERR_LIB; > + } else { > + /* Local time. */ > + if (localtime_r (&t, tm) == NULL) > + return -PARSE_TIME_ERR_LIB; > + } > + > + return 0; > +} > + > +/* > + * Normalize tm according to mktime(3). Both mktime(3) and > + * localtime_r(3) use local time, but they cancel each other out here, > + * making this function agnostic to time zone. > + */ > +static int > +normalize_tm (struct tm *tm) > +{ > + time_t t = mktime (tm); > + > + if (t == (time_t) -1) > + return -PARSE_TIME_ERR_LIB; > + > + if (!localtime_r (&t, tm)) > + return -PARSE_TIME_ERR_LIB; > + > + return 0; > +} > + > +/* Get field out of a struct tm. 
*/ > +static int > +tm_get_field (const struct tm *tm, enum field field) > +{ > + switch (field) { > + case TM_ABS_SEC: return tm->tm_sec; > + case TM_ABS_MIN: return tm->tm_min; > + case TM_ABS_HOUR: return tm->tm_hour; > + case TM_ABS_MDAY: return tm->tm_mday; > + case TM_ABS_MON: return tm->tm_mon + 1; /* 0- to 1-based */ > + case TM_ABS_YEAR: return 1900 + tm->tm_year; > + case TM_ABS_WDAY: return tm->tm_wday; > + case TM_ABS_ISDST: return tm->tm_isdst; > + default: > + assert (false); > + break; > + } > + > + return 0; > +} > + > +/* Modify hour according to am/pm setting. */ > +static int > +fixup_ampm (struct state *state) > +{ > + int hour, hdiff = 0; > + > + if (!is_field_set (state, TM_AMPM)) > + return 0; > + > + if (!is_field_set (state, TM_ABS_HOUR)) > + return -PARSE_TIME_ERR_TIMEFORMAT; > + > + hour = get_field (state, TM_ABS_HOUR); > + if (!is_valid_12hour (hour)) > + return -PARSE_TIME_ERR_INVALIDTIME; > + > + if (get_field (state, TM_AMPM)) { > + /* 12pm is noon. */ > + if (hour != 12) > + hdiff = 12; > + } else { > + /* 12am is midnight, beginning of day. */ > + if (hour == 12) > + hdiff = -12; > + } > + > + mod_field (state, TM_REL_HOUR, -hdiff); > + > + return 0; > +} > + > +/* Combine absolute and relative fields, and round. */ > +static int > +create_output (struct state *state, time_t *t_out, const time_t *tnow, > + int round) > +{ > + struct tm tm = { .tm_isdst = -1 }; > + struct tm now; > + time_t t; > + enum field f; > + int r; > + int week_round = PARSE_TIME_NO_ROUND; > + > + r = initialize_now (state, &now, tnow); > + if (r) > + return r; > + > + /* Initialize uninitialized fields to now. */ > + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > + if (state->set[f] == FIELD_NOW) { > + state->tm[f] = tm_get_field (&now, f); > + state->set[f] = FIELD_SET; > + } > + } > + > + /* > + * If MON is set but YEAR is not, refer to past month. > + * > + * XXX: Why are month/week special in this regard? What about > + * mday, or time. 
Should refer to past. > + */ > + if (is_field_set (state, TM_ABS_MON) && > + !is_field_set (state, TM_ABS_YEAR)) { > + if (get_field (state, TM_ABS_MON) >= tm_get_field (&now, TM_ABS_MON)) > + mod_field (state, TM_REL_YEAR, 1); > + } > + > + /* > + * If WDAY is set but MDAY is not, we consider WDAY relative > + * > + * XXX: This fails on stuff like "two months ago monday" because > + * two months ago wasn't the same day as today. Postpone until we > + * know date? > + */ > + if (is_field_set (state, TM_ABS_WDAY) && > + !is_field_set (state, TM_ABS_MDAY)) { > + int wday = get_field (state, TM_ABS_WDAY); > + int today = tm_get_field (&now, TM_ABS_WDAY); > + int rel_days; > + > + if (today > wday) > + rel_days = today - wday; > + else > + rel_days = today + 7 - wday; > + > + /* This also prevents special week rounding from happening. */ > + mod_field (state, TM_REL_DAY, rel_days); > + > + unset_field (state, TM_ABS_WDAY); > + } > + > + r = fixup_ampm (state); > + if (r) > + return r; > + > + /* > + * Iterate fields from most accurate to least accurate, and set > + * unset fields according to requested rounding. > + */ > + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { > + if (round != PARSE_TIME_NO_ROUND) { > + enum field r = abs_to_rel_field (f); > + > + if (is_field_set (state, f) || is_field_set (state, r)) { > + if (round >= PARSE_TIME_ROUND_UP) > + mod_field (state, r, -1); > + round = PARSE_TIME_NO_ROUND; /* No more rounding. */ > + } else { > + if (f == TM_ABS_MDAY && > + is_field_set (state, TM_REL_WEEK)) { > + /* Week is most accurate. */ > + week_round = round; > + round = PARSE_TIME_NO_ROUND; > + } else { > + set_field (state, f, field_epoch (f)); > + } > + } > + } > + > + if (!is_field_set (state, f)) > + set_field (state, f, tm_get_field (&now, f)); > + } > + > + /* Special case: rounding with week accuracy. */ > + if (week_round != PARSE_TIME_NO_ROUND) { > + /* Temporarily set more accurate fields to now. 
*/ > + set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC)); > + set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN)); > + set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR)); > + set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY)); > + } > + > + /* > + * Set all fields. They may contain out of range values before > + * normalization by mktime(3). > + */ > + tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC); > + tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN); > + tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR); > + tm.tm_mday = get_field (state, TM_ABS_MDAY) - > + get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK); > + tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON); > + tm.tm_mon--; /* 1- to 0-based */ > + tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900; > + > + /* > + * It's always normal time. > + * > + * XXX: This is probably not a solution that universally > + * works. Just make sure DST is not taken into account. We don't > + * want rounding to be affected by DST. > + */ > + tm.tm_isdst = -1; > + > + /* Special case: rounding with week accuracy. */ > + if (week_round != PARSE_TIME_NO_ROUND) { > + /* Normalize to get proper tm.wday. */ > + r = normalize_tm (&tm); > + if (r < 0) > + return r; > + > + /* Set more accurate fields back to zero. */ > + tm.tm_sec = 0; > + tm.tm_min = 0; > + tm.tm_hour = 0; > + tm.tm_isdst = -1; > + > + /* Monday is the true 1st day of week, but this is easier. */ > + if (week_round <= PARSE_TIME_ROUND_DOWN) > + tm.tm_mday -= tm.tm_wday; > + else > + tm.tm_mday += 7 - tm.tm_wday; > + } > + > + if (is_field_set (state, TM_TZ)) { > + /* tm is in specified TZ, convert to UTC for timegm(3). */ > + tm.tm_min -= get_field (state, TM_TZ); > + t = timegm (&tm); > + } else { > + /* tm is in local time. 
*/ > + t = mktime (&tm); > + } > + > + if (t == (time_t) -1) > + return -PARSE_TIME_ERR_LIB; > + > + *t_out = t; > + > + return 0; > +} > + > +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. */ > +#define EXTERNAL_ERR(r) (-r) > + > +int > +parse_time_string (const char *s, time_t *t, const time_t *now, int round) > +{ > + struct state state = { .last_field = TM_NONE }; > + int r; > + > + if (!s || !t) > + return EXTERNAL_ERR (-PARSE_TIME_ERR); > + > + r = parse_input (&state, s); > + if (r < 0) > + return EXTERNAL_ERR (r); > + > + r = create_output (&state, t, now, round); > + if (r < 0) > + return EXTERNAL_ERR (r); > + > + return 0; > +} > diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h > new file mode 100644 > index 0000000..50b7c6f > --- /dev/null > +++ b/parse-time-string/parse-time-string.h > @@ -0,0 +1,95 @@ > +/* > + * parse time string - user friendly date and time parser > + * Copyright © 2012 Jani Nikula > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see <http://www.gnu.org/licenses/>. 
> + * > + * Author: Jani Nikula <jani@nikula.org> > + */ > + > +#ifndef PARSE_TIME_STRING_H > +#define PARSE_TIME_STRING_H > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +#include <time.h> > + > +/* return values for parse_time_string() */ > +enum { > + PARSE_TIME_OK = 0, > + PARSE_TIME_ERR, /* unspecified error */ > + PARSE_TIME_ERR_LIB, /* library call failed */ > + PARSE_TIME_ERR_ALREADYSET, /* attempt to set unit twice */ > + PARSE_TIME_ERR_FORMAT, /* generic date/time format error */ > + PARSE_TIME_ERR_DATEFORMAT, /* date format error */ > + PARSE_TIME_ERR_TIMEFORMAT, /* time format error */ > + PARSE_TIME_ERR_INVALIDDATE, /* date value error */ > + PARSE_TIME_ERR_INVALIDTIME, /* time value error */ > + PARSE_TIME_ERR_KEYWORD, /* unknown keyword */ > +}; > + > +/* round values for parse_time_string() */ > +enum { > + PARSE_TIME_ROUND_DOWN = -1, > + PARSE_TIME_NO_ROUND = 0, > + PARSE_TIME_ROUND_UP = 1, > +}; > + > +/** > + * parse_time_string() - user friendly date and time parser > + * @s: string to parse > + * @t: pointer to time_t to store parsed time in > + * @now: pointer to time_t containing reference date/time, or NULL > + * @round: PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN, or > + * PARSE_TIME_ROUND_UP > + * > + * Parse a date/time string 's' and store the parsed date/time result > + * in 't'. > + * > + * A reference date/time is used for determining the "date/time units" > + * (roughly equivalent to struct tm members) not specified by 's'. If > + * 'now' is non-NULL, it must contain a pointer to a time_t to be used > + * as reference date/time. Otherwise, the current time is used. > + * > + * If 's' does not specify a full date/time, the 'round' parameter > + * specifies if and how the result should be rounded as follows: > + * > + * PARSE_TIME_NO_ROUND: All date/time units that are not specified > + * by 's' are set to the corresponding unit derived from the > + * reference date/time. 
> + *
> + * PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate
> + * than the most accurate unit specified by 's' are set to the
> + * smallest valid value for that unit. Rest of the unspecified units
> + * are set as in PARSE_TIME_NO_ROUND.
> + *
> + * PARSE_TIME_ROUND_UP: All date/time units that are more accurate
> + * than the most accurate unit specified by 's' are set to the
> + * smallest valid value for that unit. The most accurate unit
> + * specified by 's' is incremented by one (and this is rolled over
> + * to the less accurate units as necessary). Rest of the unspecified
> + * units are set as in PARSE_TIME_NO_ROUND.

Why do you round down and then increment the most accurate unit? If I
want to see emails that were sent yesterday, I do not want to see any
email that was sent in the first second of today. (OK, I know that this
is slightly easier to implement.)

> + *
> + * Return 0 (PARSE_TIME_OK) for successfully parsed date/time, or one
> + * of PARSE_TIME_ERR_* on error. 't' is not modified on error.
> + */
> +int parse_time_string (const char *s, time_t *t, const time_t *now, int round);

now -> ref?

> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* PARSE_TIME_STRING_H */
> --
> 1.7.9.5

-Michal
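The rounding question above can be made concrete with a small sketch. It assumes day accuracy, UTC, and a non-negative POSIX time_t counted in seconds. The helper names are illustrative and do not appear in the patch, which operates on struct tm fields and mktime(3) instead. The sketch suggests that rounding up as documented yields the first second of the following day, which suits an exclusive upper bound, while subtracting one second gives the inclusive bound the review asks about:

```c
#include <assert.h>
#include <time.h>

/* Assumption: POSIX time_t, seconds since the epoch, no leap seconds. */
#define SECS_PER_DAY 86400

/* Hypothetical helper: first second of the UTC day containing t. */
static time_t
day_round_down (time_t t)
{
    assert (t >= 0);
    return t - t % SECS_PER_DAY;
}

/*
 * Hypothetical helper: round up as the header comment describes, i.e.
 * reset the smaller units and increment the day by one. The result is
 * the first second of the next UTC day.
 */
static time_t
day_round_up (time_t t)
{
    return day_round_down (t) + SECS_PER_DAY;
}

/* Hypothetical alternative: last second still inside the UTC day of t. */
static time_t
day_round_up_inclusive (time_t t)
{
    return day_round_up (t) - 1;
}
```

For a reference time of 2012-09-12 21:27:00 UTC (1347485220), day_round_down() gives 1347408000 and day_round_up() gives 1347494400. A range check can then either compare with `<` against the rounded-up value or compare with `<=` against day_round_up_inclusive().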
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch 2012-09-25 11:56 ` Michal Sojka @ 2012-10-03 18:49 ` Jani Nikula 2012-10-03 19:02 ` Michal Sojka 0 siblings, 1 reply; 30+ messages in thread From: Jani Nikula @ 2012-10-03 18:49 UTC (permalink / raw) To: Michal Sojka, notmuch, David Bremner On Tue, 25 Sep 2012, Michal Sojka <sojkam1@fel.cvut.cz> wrote: > Hello Jani, > > On Wed, Sep 12 2012, Jani Nikula wrote: >> Add a date/time parser to notmuch, to be used for adding date range >> query support for notmuch lib later on. Add the parser to a directory >> of its own to make it independent of the rest of the notmuch code >> base. > > First of all, thank you very much for pushing this towards mainline. > This is definitely one of the features I miss in notmuch most. > > Some comments below. Thanks for the comments; sorry about the delay in responding. >> >> Signed-off-by: Jani Nikula <jani@nikula.org> >> --- >> Makefile | 2 +- >> parse-time-string/Makefile | 5 + >> parse-time-string/Makefile.local | 12 + >> parse-time-string/README | 9 + >> parse-time-string/parse-time-string.c | 1484 +++++++++++++++++++++++++++++++++ >> parse-time-string/parse-time-string.h | 95 +++ >> 6 files changed, 1606 insertions(+), 1 deletion(-) >> create mode 100644 parse-time-string/Makefile >> create mode 100644 parse-time-string/Makefile.local >> create mode 100644 parse-time-string/README >> create mode 100644 parse-time-string/parse-time-string.c >> create mode 100644 parse-time-string/parse-time-string.h >> >> diff --git a/Makefile b/Makefile >> index e5e2e3a..bb9c316 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -3,7 +3,7 @@ >> all: >> >> # List all subdirectories here. Each contains its own Makefile.local >> -subdirs = compat completion emacs lib man util test >> +subdirs = compat completion emacs lib man parse-time-string util test >> >> # We make all targets depend on the Makefiles themselves. 
>> global_deps = Makefile Makefile.config Makefile.local \ >> diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile >> new file mode 100644 >> index 0000000..fa25832 >> --- /dev/null >> +++ b/parse-time-string/Makefile >> @@ -0,0 +1,5 @@ >> +all: >> + $(MAKE) -C .. all >> + >> +.DEFAULT: >> + $(MAKE) -C .. $@ >> diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local >> new file mode 100644 >> index 0000000..53534f3 >> --- /dev/null >> +++ b/parse-time-string/Makefile.local >> @@ -0,0 +1,12 @@ >> +dir := parse-time-string >> +extra_cflags += -I$(srcdir)/$(dir) >> + >> +libparse-time-string_c_srcs := $(dir)/parse-time-string.c >> + >> +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o) >> + >> +$(dir)/libparse-time-string.a: $(libparse-time-string_modules) >> + $(call quiet,AR) rcs $@ $^ >> + >> +SRCS := $(SRCS) $(libparse-time-string_c_srcs) >> +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a >> diff --git a/parse-time-string/README b/parse-time-string/README >> new file mode 100644 >> index 0000000..300ff1f >> --- /dev/null >> +++ b/parse-time-string/README >> @@ -0,0 +1,9 @@ >> +PARSE TIME STRING >> +================= >> + >> +parse_time_string() is a date/time parser originally written for >> +notmuch by Jani Nikula <jani@nikula.org>. However, there is nothing >> +notmuch specific in it, and it should be kept reusable for other >> +projects, and ready to be packaged on its own as needed. Please do not >> +add dependencies on or references to anything notmuch specific. The >> +parser should only depend on the C library. 
>> diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c >> new file mode 100644 >> index 0000000..15cf686 >> --- /dev/null >> +++ b/parse-time-string/parse-time-string.c >> @@ -0,0 +1,1484 @@ >> +/* >> + * parse time string - user friendly date and time parser >> + * Copyright © 2012 Jani Nikula >> + * >> + * This program is free software: you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License as published by >> + * the Free Software Foundation, either version 2 of the License, or >> + * (at your option) any later version. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + * You should have received a copy of the GNU General Public License >> + * along with this program. If not, see <http://www.gnu.org/licenses/>. >> + * >> + * Author: Jani Nikula <jani@nikula.org> >> + */ >> + >> +#include <assert.h> >> +#include <ctype.h> >> +#include <errno.h> >> +#include <limits.h> >> +#include <stdio.h> >> +#include <stdarg.h> >> +#include <stdbool.h> >> +#include <stdlib.h> >> +#include <string.h> >> +#include <strings.h> >> +#include <time.h> >> +#include <sys/time.h> >> +#include <sys/types.h> >> + >> +#include "parse-time-string.h" >> + >> +/* >> + * IMPLEMENTATION DETAILS >> + * >> + * At a high level, the parsing is done in two phases: 1) actual >> + * parsing of the input string and storing the parsed data into >> + * 'struct state', and 2) processing of the data in 'struct state' >> + * according to current time (or provided reference time) and >> + * rounding. This is evident in the main entry point function >> + * parse_time_string(). >> + * >> + * 1) The parsing phase - parse_input() >> + * >> + * Parsing is greedy and happens from left to right. 
The parsing is as >> + * unambiguous as possible; only unambiguous date/time formats are >> + * accepted. Redundant or contradictory absolute date/time in the >> + * input (e.g. date specified multiple times/ways) is not >> + * accepted. Relative date/time on the other hand just accumulates if >> + * present multiple times (e.g. "5 days 5 days" just turns into 10 >> + * days). >> + * >> + * Parsing decisions are made on the input format, not value. For >> + * example, "20/5/2005" fails because the recognized format here is >> + * MM/D/YYYY, even though the values would suggest DD/M/YYYY. >> + * >> + * Parsing is mostly stateless in the sense that parsing decisions are >> + * not made based on the values of previously parsed data, or whether >> + * certain data is present in the first place. (There are a few >> + * exceptions to the latter part, though, such as parsing of time zone >> + * that would otherwise look like plain time.) > > I'm not sure that this "stateless" property brings us some advantage. I > think that it sometimes causes the results to be surprising at best (one > can also call them wrong). I improved the tests of your parsing library > (see the patch in the followup email) and added those cases. The advantages are unambiguity and implementation simplicity, two very clear goals I had to begin with. Greedy parsing left-to-right with minimal amount of decisions based on what's been parsed before, and exceptions clearly commented in the code. That said, I can see the issue you have with the test case, and I think I can fix it without sacrificing either goal. >> + * >> + * When the parser encounters a number that is not greedily parsed as >> + * part of a format, the interpretation is postponed until the next >> + * token is parsed. The parser for the next token may consume the >> + * previously postponed number. For example, when parsing "20 May" the >> + * meaning of "20" is not known until "May" is parsed. 
If the parser >> + * for the next token does not consume the postponed number, the >> + * number is handled as a "lone" number before parser for the next >> + * token finishes. >> + * >> + * 2) The processing phase - create_output() >> + * >> + * Once the parser in phase 1 has finished, 'struct state' contains >> + * all the information from the input string, and it's no longer >> + * needed. Since the parser does not even handle the concept of "now", > > If you aim at this being a generic library, I'd not call this "now" but > reference time, as you already do at other comments in your library. The term "now" is just pretty much used interchangeably with "reference time" in the code, because in the most common case that's what it is. It's a kind of "reference now", if you will. It will be set to the current time if not passed by the caller. > >> + * the processing initializes the fields referring to the current >> + * date/time. >> + * >> + * If requested, the result is rounded towards past or future. The >> + * idea behind rounding is to support parsing date/time ranges in an >> + * obvious way. For example, for a range defined as two dates (without >> + * time), one would typically want to have an inclusive range from the >> + * beginning of start date to the end of the end date. The caller >> + * would use rounding towards past in the start date, and towards >> + * future in the end date. >> + * >> + * The absolute date and time is shifted by the relative date and >> + * time, and time zone adjustments are made. Daylight saving time >> + * (DST) is specifically *not* handled at all. >> + * >> + * Finally, the result is stored to time_t. >> + */ >> + >> +#define unused(x) x __attribute__ ((unused)) >> + >> +/* XXX: Redefine these to add i18n support. The keyword table uses >> + * N_() to mark strings to be translated; they are accessed >> + * dynamically using _(). 
*/ >> +#define _(s) (s) /* i18n: define as gettext (s) */ >> +#define N_(s) (s) /* i18n: define as gettext_noop (s) */ >> + >> +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0])) >> + >> +/* >> + * Field indices in the tm and set arrays of struct state. >> + * >> + * NOTE: There's some code that depends on the ordering of this enum. >> + */ >> +enum field { >> + /* Keep SEC...YEAR in this order. */ >> + TM_ABS_SEC, /* seconds */ >> + TM_ABS_MIN, /* minutes */ >> + TM_ABS_HOUR, /* hours */ >> + TM_ABS_MDAY, /* day of the month */ >> + TM_ABS_MON, /* month */ >> + TM_ABS_YEAR, /* year */ >> + >> + TM_ABS_WDAY, /* day of the week. special: may be relative */ >> + TM_ABS_ISDST, /* daylight saving time */ >> + >> + TM_AMPM, /* am vs. pm */ >> + TM_TZ, /* timezone in minutes */ >> + >> + /* Keep SEC...YEAR in this order. */ >> + TM_REL_SEC, /* seconds relative to now */ >> + TM_REL_MIN, /* minutes ... */ >> + TM_REL_HOUR, /* hours ... */ >> + TM_REL_DAY, /* days ... */ >> + TM_REL_MON, /* months ... */ >> + TM_REL_YEAR, /* years ... */ >> + TM_REL_WEEK, /* weeks ... */ >> + >> + TM_NONE, /* not a field */ >> + >> + TM_SIZE = TM_NONE, >> + TM_FIRST_ABS = TM_ABS_SEC, >> + TM_FIRST_REL = TM_REL_SEC, >> +}; >> + >> +/* Values for the set array of struct state. */ >> +enum field_set { >> + FIELD_UNSET, /* The field has not been touched by parser. */ >> + FIELD_SET, /* The field has been set by parser. */ >> + FIELD_NOW, /* The field will be set to "now". */ >> +}; >> + >> +static enum field >> +next_abs_field (enum field field) >> +{ >> + /* NOTE: Depends on the enum ordering. */ >> + return field < TM_ABS_YEAR ? field + 1 : TM_NONE; >> +} >> + >> +static enum field >> +abs_to_rel_field (enum field field) >> +{ >> + assert (field <= TM_ABS_YEAR); >> + >> + /* NOTE: Depends on the enum ordering. */ >> + return field + (TM_FIRST_REL - TM_FIRST_ABS); >> +} >> + >> +/* Get epoch value for field. 
*/ >> +static int >> +field_epoch (enum field field) >> +{ >> + if (field == TM_ABS_MDAY || field == TM_ABS_MON) >> + return 1; >> + else if (field == TM_ABS_YEAR) >> + return 1970; >> + else >> + return 0; >> +} >> + >> +/* The parsing state. */ >> +struct state { >> + int tm[TM_SIZE]; /* parsed date and time */ >> + enum field_set set[TM_SIZE]; /* set status of tm */ >> + >> + enum field last_field; /* Previously set field. */ >> + enum field next_field; /* Next field for parse_postponed_number() */ > > next_field seems to be unused. Yes, it's reserved for future use (RFU). >> + char delim; >> + >> + int postponed_length; /* Number of digits in postponed value. */ >> + int postponed_value; >> + char postponed_delim; /* The delimiter preceding postponed number. */ >> +}; >> + >> +/* >> + * Helpers for postponed numbers. >> + * >> + * postponed_length is the number of digits in postponed value. 0 >> + * means there is no postponed number. -1 means there is a postponed >> + * number, but it comes from a keyword, and it doesn't have digits. >> + */ >> +static int >> +get_postponed_length (struct state *state) >> +{ >> + return state->postponed_length; >> +} >> + >> +/* >> + * Consume a previously postponed number. Return true if a number was >> + * in fact postponed, false otherwise. Store the postponed number's >> + * value in *v, length in the input string in *n (or -1 if the number >> + * was written out and parsed as a keyword), and the preceding >> + * delimiter to *d. >> + */ >> +static bool >> +get_postponed_number (struct state *state, int *v, int *n, char *d) >> +{ >> + if (!state->postponed_length) >> + return false; >> + >> + if (n) >> + *n = state->postponed_length; >> + >> + if (v) >> + *v = state->postponed_value; >> + >> + if (d) >> + *d = state->postponed_delim; >> + >> + state->postponed_length = 0; >> + state->postponed_value = 0; >> + state->postponed_delim = 0; >> + >> + return true; >> +} >> + >> +/* Parse a previously postponed number if one exists.
*/ >> +static int parse_postponed_number (struct state *state, int v, int n, char d); >> +static int >> +handle_postponed_number (struct state *state, enum field next_field) >> +{ >> + int v = state->postponed_value; >> + int n = state->postponed_length; >> + char d = state->postponed_delim; >> + int r; >> + >> + if (!n) >> + return 0; >> + >> + state->postponed_value = 0; >> + state->postponed_length = 0; >> + state->postponed_delim = 0; >> + >> + state->next_field = next_field; >> + r = parse_postponed_number (state, v, n, d); >> + state->next_field = TM_NONE; >> + >> + return r; >> +} >> + >> +/* >> + * Postpone a number to be handled later. If one exists already, >> + * handle it first. n may be -1 to indicate a keyword that has no >> + * number length. >> + */ >> +static int >> +set_postponed_number (struct state *state, int v, int n) >> +{ >> + int r; >> + char d = state->delim; >> + >> + /* Parse a previously postponed number, if any. */ >> + r = handle_postponed_number (state, TM_NONE); >> + if (r) >> + return r; >> + >> + state->postponed_length = n; >> + state->postponed_value = v; >> + state->postponed_delim = d; >> + >> + return 0; >> +} >> + >> +static void >> +set_delim (struct state *state, char delim) >> +{ >> + state->delim = delim; >> +} >> + >> +static void >> +unset_delim (struct state *state) >> +{ >> + state->delim = 0; >> +} >> + >> +/* >> + * Field set/get/mod helpers. >> + */ >> + >> +/* Return true if field has been set. */ >> +static bool >> +is_field_set (struct state *state, enum field field) >> +{ >> + assert (field < ARRAY_SIZE (state->tm)); >> + >> + return field < ARRAY_SIZE (state->set) && >> + state->set[field] != FIELD_UNSET; >> +} >> + >> +static void >> +unset_field (struct state *state, enum field field) >> +{ >> + assert (field < ARRAY_SIZE (state->tm)); >> + >> + state->set[field] = FIELD_UNSET; >> + state->tm[field] = 0; >> +} >> + >> +/* >> + * Set field to value. 
A field can only be set once to ensure the >> + * input does not contain redundant and potentially conflicting data. >> + */ >> +static int >> +set_field (struct state *state, enum field field, int value) >> +{ >> + int r; >> + >> + assert (field < ARRAY_SIZE (state->tm)); >> + >> + /* Fields can only be set once. */ >> + if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET) >> + return -PARSE_TIME_ERR_ALREADYSET; >> + >> + state->set[field] = FIELD_SET; >> + >> + /* Parse postponed number, if any. */ >> + r = handle_postponed_number (state, field); >> + if (r) >> + return r; >> + >> + unset_delim (state); >> + >> + state->tm[field] = value; >> + state->last_field = field; >> + >> + return 0; >> +} >> + >> +/* >> + * Mark n fields in fields to be set to current date/time in the >> + * specified time zone, or local timezone if not specified. The fields >> + * will be initialized after parsing is complete and timezone is >> + * known. >> + */ >> +static int >> +set_fields_to_now (struct state *state, enum field *fields, size_t n) >> +{ >> + size_t i; >> + int r; >> + >> + for (i = 0; i < n; i++) { >> + r = set_field (state, fields[i], 0); >> + if (r) >> + return r; >> + state->set[fields[i]] = FIELD_NOW; >> + } >> + >> + return 0; >> +} >> + >> +/* Modify field by adding value to it. To be used on relative fields, >> + * which can be modified multiple times (to accumulate). */ >> +static int >> +mod_field (struct state *state, enum field field, int value) >> +{ >> + int r; >> + >> + assert (field < ARRAY_SIZE (state->tm)); /* assert relative??? */ >> + >> + if (field < ARRAY_SIZE (state->set)) >> + state->set[field] = FIELD_SET; >> + >> + /* Parse postponed number, if any. */ >> + r = handle_postponed_number (state, field); >> + if (r) >> + return r; >> + >> + unset_delim (state); >> + >> + state->tm[field] += value; >> + state->last_field = field; >> + >> + return 0; >> +} >> + >> +/* >> + * Get field value. 
Make sure the field is set before query. It's most >> + * likely an error to call this while parsing (for example fields set >> + * as FIELD_NOW will only be set to some value after parsing). >> + */ >> +static int >> +get_field (struct state *state, enum field field) >> +{ >> + assert (field < ARRAY_SIZE (state->tm)); >> + >> + return state->tm[field]; >> +} >> + >> +/* >> + * Validity checkers. >> + */ >> +static bool is_valid_12hour (int h) >> +{ >> + return h >= 0 && h <= 12; >> +} >> + >> +static bool is_valid_time (int h, int m, int s) >> +{ >> + /* Allow 24:00:00 to denote end of day. */ >> + if (h == 24 && m == 0 && s == 0) >> + return true; >> + >> + return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59; >> +} >> + >> +static bool is_valid_mday (int mday) >> +{ >> + return mday >= 1 && mday <= 31; >> +} >> + >> +static bool is_valid_mon (int mon) >> +{ >> + return mon >= 1 && mon <= 12; >> +} >> + >> +static bool is_valid_year (int year) >> +{ >> + return year >= 1970; >> +} >> + >> +static bool is_valid_date (int year, int mon, int mday) >> +{ >> + return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday); >> +} >> + >> +/* Unset indicator for time and date set helpers. */ >> +#define UNSET -1 >> + >> +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */ >> +static int >> +set_abs_time (struct state *state, int hour, int min, int sec) >> +{ >> + int r; >> + >> + if (hour != UNSET) { >> + if ((r = set_field (state, TM_ABS_HOUR, hour))) >> + return r; >> + } >> + >> + if (min != UNSET) { >> + if ((r = set_field (state, TM_ABS_MIN, min))) >> + return r; >> + } >> + >> + if (sec != UNSET) { >> + if ((r = set_field (state, TM_ABS_SEC, sec))) >> + return r; >> + } >> + >> + return 0; >> +} >> + >> +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. 
*/ >> +static int >> +set_abs_date (struct state *state, int year, int mon, int mday) >> +{ >> + int r; >> + >> + if (year != UNSET) { >> + if ((r = set_field (state, TM_ABS_YEAR, year))) >> + return r; >> + } >> + >> + if (mon != UNSET) { >> + if ((r = set_field (state, TM_ABS_MON, mon))) >> + return r; >> + } >> + >> + if (mday != UNSET) { >> + if ((r = set_field (state, TM_ABS_MDAY, mday))) >> + return r; >> + } >> + >> + return 0; >> +} >> + >> +/* >> + * Keyword parsing and handling. >> + */ >> +struct keyword; >> +typedef int (*setter_t)(struct state *state, struct keyword *kw); >> + >> +struct keyword { >> + const char *name; /* keyword */ >> + enum field field; /* field to set, or FIELD_NONE if N/A */ >> + int value; /* value to set, or 0 if N/A */ >> + setter_t set; /* function to use for setting, if non-NULL */ >> +}; >> + >> +/* >> + * Setter callback functions for keywords. >> + */ >> +static int >> +kw_set_default (struct state *state, struct keyword *kw) >> +{ >> + return set_field (state, kw->field, kw->value); >> +} >> + >> +static int >> +kw_set_rel (struct state *state, struct keyword *kw) >> +{ >> + int multiplier = 1; >> + >> + /* Get a previously set multiplier, if any. */ >> + get_postponed_number (state, &multiplier, NULL, NULL); >> + >> + /* Accumulate relative field values. */ >> + return mod_field (state, kw->field, multiplier * kw->value); >> +} >> + >> +static int >> +kw_set_number (struct state *state, struct keyword *kw) >> +{ >> + /* -1 = no length, from keyword. */ >> + return set_postponed_number (state, kw->value, -1); >> +} >> + >> +static int >> +kw_set_month (struct state *state, struct keyword *kw) >> +{ >> + int n = get_postponed_length (state); >> + >> + /* Consume postponed number if it could be mday. This handles "20 >> + * January". 
*/ >> + if (n == 1 || n == 2) { >> + int r, v; >> + >> + get_postponed_number (state, &v, NULL, NULL); >> + >> + if (!is_valid_mday (v)) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + >> + r = set_field (state, TM_ABS_MDAY, v); >> + if (r) >> + return r; >> + } >> + >> + return set_field (state, kw->field, kw->value); >> +} >> + >> +static int >> +kw_set_ampm (struct state *state, struct keyword *kw) >> +{ >> + int n = get_postponed_length (state); >> + >> + /* Consume postponed number if it could be hour. This handles >> + * "5pm". */ >> + if (n == 1 || n == 2) { >> + int r, v; >> + >> + get_postponed_number (state, &v, NULL, NULL); >> + >> + if (!is_valid_12hour (v)) >> + return -PARSE_TIME_ERR_INVALIDTIME; >> + >> + r = set_abs_time (state, v, 0, 0); >> + if (r) >> + return r; >> + } >> + >> + return set_field (state, kw->field, kw->value); >> +} >> + >> +static int >> +kw_set_timeofday (struct state *state, struct keyword *kw) >> +{ >> + return set_abs_time (state, kw->value, 0, 0); >> +} >> + >> +static int >> +kw_set_today (struct state *state, unused (struct keyword *kw)) >> +{ >> + enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY }; >> + >> + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); >> +} >> + >> +static int >> +kw_set_now (struct state *state, unused (struct keyword *kw)) >> +{ >> + enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC }; >> + >> + return set_fields_to_now (state, fields, ARRAY_SIZE (fields)); >> +} >> + >> +static int >> +kw_set_ordinal (struct state *state, struct keyword *kw) >> +{ >> + int n, v; >> + >> + /* Require a postponed number. */ >> + if (!get_postponed_number (state, &v, &n, NULL)) >> + return -PARSE_TIME_ERR_DATEFORMAT; >> + >> + /* Ordinals are mday. */ >> + if (n != 1 && n != 2) >> + return -PARSE_TIME_ERR_DATEFORMAT; >> + >> + /* Be strict about st, nd, rd, and lax about th. 
*/ >> + if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v)) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + >> + return set_field (state, TM_ABS_MDAY, v); >> +} >> + >> +/* >> + * Accepted keywords. >> + * >> + * A keyword may optionally contain a '|' to indicate the minimum >> + * match length. Without one, full match is required. It's advisable >> + * to keep the minimum match parts unique across all keywords. >> + * >> + * If keyword begins with upper case letter, then the matching will be >> + * case sensitive. Otherwise the matching is case insensitive. >> + * >> + * If setter is NULL, set_default will be used. >> + * >> + * Note: Order matters. Matching is greedy, longest match is used, but >> + * of equal length matches the first one is used, unless there's an >> + * equal length case sensitive match which trumps case insensitive >> + * matches. >> + */ >> +static struct keyword keywords[] = { >> + /* Weekdays. */ >> + { N_("sun|day"), TM_ABS_WDAY, 0, NULL }, >> + { N_("mon|day"), TM_ABS_WDAY, 1, NULL }, >> + { N_("tue|sday"), TM_ABS_WDAY, 2, NULL }, >> + { N_("wed|nesday"), TM_ABS_WDAY, 3, NULL }, >> + { N_("thu|rsday"), TM_ABS_WDAY, 4, NULL }, >> + { N_("fri|day"), TM_ABS_WDAY, 5, NULL }, >> + { N_("sat|urday"), TM_ABS_WDAY, 6, NULL }, >> + >> + /* Months. 
*/ >> + { N_("jan|uary"), TM_ABS_MON, 1, kw_set_month }, >> + { N_("feb|ruary"), TM_ABS_MON, 2, kw_set_month }, >> + { N_("mar|ch"), TM_ABS_MON, 3, kw_set_month }, >> + { N_("apr|il"), TM_ABS_MON, 4, kw_set_month }, >> + { N_("may"), TM_ABS_MON, 5, kw_set_month }, >> + { N_("jun|e"), TM_ABS_MON, 6, kw_set_month }, >> + { N_("jul|y"), TM_ABS_MON, 7, kw_set_month }, >> + { N_("aug|ust"), TM_ABS_MON, 8, kw_set_month }, >> + { N_("sep|tember"), TM_ABS_MON, 9, kw_set_month }, >> + { N_("oct|ober"), TM_ABS_MON, 10, kw_set_month }, >> + { N_("nov|ember"), TM_ABS_MON, 11, kw_set_month }, >> + { N_("dec|ember"), TM_ABS_MON, 12, kw_set_month }, >> + >> + /* Durations. */ >> + { N_("y|ears"), TM_REL_YEAR, 1, kw_set_rel }, >> + { N_("w|eeks"), TM_REL_WEEK, 1, kw_set_rel }, >> + { N_("d|ays"), TM_REL_DAY, 1, kw_set_rel }, >> + { N_("h|ours"), TM_REL_HOUR, 1, kw_set_rel }, >> + { N_("hr|s"), TM_REL_HOUR, 1, kw_set_rel }, >> + { N_("m|inutes"), TM_REL_MIN, 1, kw_set_rel }, >> + /* M=months, m=minutes */ >> + { N_("M"), TM_REL_MON, 1, kw_set_rel }, >> + { N_("mins"), TM_REL_MIN, 1, kw_set_rel }, >> + { N_("mo|nths"), TM_REL_MON, 1, kw_set_rel }, >> + { N_("s|econds"), TM_REL_SEC, 1, kw_set_rel }, >> + { N_("secs"), TM_REL_SEC, 1, kw_set_rel }, >> + >> + /* Numbers. */ >> + { N_("one"), TM_NONE, 1, kw_set_number }, >> + { N_("two"), TM_NONE, 2, kw_set_number }, >> + { N_("three"), TM_NONE, 3, kw_set_number }, >> + { N_("four"), TM_NONE, 4, kw_set_number }, >> + { N_("five"), TM_NONE, 5, kw_set_number }, >> + { N_("six"), TM_NONE, 6, kw_set_number }, >> + { N_("seven"), TM_NONE, 7, kw_set_number }, >> + { N_("eight"), TM_NONE, 8, kw_set_number }, >> + { N_("nine"), TM_NONE, 9, kw_set_number }, >> + { N_("ten"), TM_NONE, 10, kw_set_number }, >> + { N_("dozen"), TM_NONE, 12, kw_set_number }, >> + { N_("hundred"), TM_NONE, 100, kw_set_number }, >> + >> + /* Special number forms. 
*/ >> + { N_("this"), TM_NONE, 0, kw_set_number }, >> + { N_("last"), TM_NONE, 1, kw_set_number }, >> + >> + /* Other special keywords. */ >> + { N_("yesterday"), TM_REL_DAY, 1, kw_set_rel }, >> + { N_("today"), TM_NONE, 0, kw_set_today }, >> + { N_("now"), TM_NONE, 0, kw_set_now }, >> + { N_("noon"), TM_NONE, 12, kw_set_timeofday }, >> + { N_("midnight"), TM_NONE, 0, kw_set_timeofday }, >> + { N_("am"), TM_AMPM, 0, kw_set_ampm }, >> + { N_("a.m."), TM_AMPM, 0, kw_set_ampm }, >> + { N_("pm"), TM_AMPM, 1, kw_set_ampm }, >> + { N_("p.m."), TM_AMPM, 1, kw_set_ampm }, >> + { N_("st"), TM_NONE, 0, kw_set_ordinal }, >> + { N_("nd"), TM_NONE, 0, kw_set_ordinal }, >> + { N_("rd"), TM_NONE, 0, kw_set_ordinal }, >> + { N_("th"), TM_NONE, 0, kw_set_ordinal }, >> + >> + /* Timezone codes: offset in minutes. XXX: Add more codes. */ >> + { N_("pst"), TM_TZ, -8*60, NULL }, >> + { N_("mst"), TM_TZ, -7*60, NULL }, >> + { N_("cst"), TM_TZ, -6*60, NULL }, >> + { N_("est"), TM_TZ, -5*60, NULL }, >> + { N_("ast"), TM_TZ, -4*60, NULL }, >> + { N_("nst"), TM_TZ, -(3*60+30), NULL }, >> + >> + { N_("gmt"), TM_TZ, 0, NULL }, >> + { N_("utc"), TM_TZ, 0, NULL }, >> + >> + { N_("wet"), TM_TZ, 0, NULL }, >> + { N_("cet"), TM_TZ, 1*60, NULL }, >> + { N_("eet"), TM_TZ, 2*60, NULL }, >> + { N_("fet"), TM_TZ, 3*60, NULL }, >> + >> + { N_("wat"), TM_TZ, 1*60, NULL }, >> + { N_("cat"), TM_TZ, 2*60, NULL }, >> + { N_("eat"), TM_TZ, 3*60, NULL }, >> +}; >> + >> +/* >> + * Compare strings s and keyword. Return number of matching chars on >> + * match, 0 for no match. Match must be at least n chars, or all of >> + * keyword if n < 0, otherwise it's not a match. Use match_case for >> + * case sensitive matching. 
>> + */ >> +static size_t >> +stringcmp (const char *s, const char *keyword, ssize_t n, bool match_case) >> +{ >> + ssize_t i; >> + >> + if (!n) >> + return 0; >> + >> + for (i = 0; *s && *keyword; i++, s++, keyword++) { >> + if (match_case) { >> + if (*s != *keyword) >> + break; >> + } else { >> + if (tolower ((unsigned char) *s) != >> + tolower ((unsigned char) *keyword)) >> + break; >> + } >> + } >> + >> + if (n > 0) >> + return i < n ? 0 : i; >> + else >> + return *keyword ? 0 : i; >> +} >> + >> +/* >> + * Parse a keyword. Return < 0 on error, number of parsed chars on >> + * success. >> + */ >> +static ssize_t >> +parse_keyword (struct state *state, const char *s) >> +{ >> + unsigned int i; >> + size_t n, max_n = 0; >> + struct keyword *kw = NULL; >> + int r; >> + >> + /* Match longest keyword */ >> + for (i = 0; i < ARRAY_SIZE (keywords); i++) { >> + /* Match case if keyword begins with upper case letter. */ >> + bool mcase = isupper ((unsigned char) keywords[i].name[0]); >> + ssize_t minlen = -1; >> + char keyword[128]; >> + char *p; >> + >> + strncpy (keyword, _(keywords[i].name), sizeof (keyword)); >> + >> + /* Truncate too long keywords. XXX: Make this dynamic? */ >> + keyword[sizeof (keyword) - 1] = '\0'; >> + >> + /* Minimum match length. */ >> + p = strchr (keyword, '|'); >> + if (p) { >> + minlen = p - keyword; >> + >> + /* Remove the minimum match length separator. */ >> + memmove (p, p + 1, strlen (p + 1) + 1); >> + } >> + >> + n = stringcmp (s, keyword, minlen, mcase); >> + if (n > max_n || (n == max_n && mcase)) { >> + max_n = n; >> + kw = &keywords[i]; >> + } >> + } >> + >> + if (!kw) >> + return -PARSE_TIME_ERR_KEYWORD; >> + >> + if (kw->set) >> + r = kw->set (state, kw); >> + else >> + r = kw_set_default (state, kw); >> + >> + if (r < 0) >> + return r; >> + >> + return max_n; >> +} >> + >> +/* >> + * Non-keyword parsers and their helpers. 
>> + */ >> + >> +static int >> +set_user_tz (struct state *state, char sign, int hour, int min) >> +{ >> + int tz = hour * 60 + min; >> + >> + assert (sign == '+' || sign == '-'); >> + >> + if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15) >> + return -PARSE_TIME_ERR_INVALIDTIME; >> + >> + if (sign == '-') >> + tz = -tz; >> + >> + return set_field (state, TM_TZ, tz); >> +} >> + >> +/* >> + * Independent parsing of a postponed number when it wasn't consumed >> + * during parsing of the following token. >> + */ >> +static int >> +parse_postponed_number (struct state *state, int v, int n, char d) >> +{ >> + /* >> + * alright, these are really lone, won't affect parsing of >> + * following items... it's not a multiplier, those have been eaten >> + * away. >> + * >> + * also note numbers eaten away by parse_single_number. >> + */ >> + >> + assert (n < 8); >> + >> + if (n == 1 || n == 2) { > > I think that guessing the meaning of a number based on its length is a > way to hell. Again, see tests in the followup patch. I disagree. It's a much better way than guessing based on the numeric value, for instance. Anyway, we only end up here because the number's meaning hasn't become clear while processing the previous or the next token. The issue you hit comes from the handling in parse_single_number() below, and that will be fixed. >> + /* Notable exception: Previous field affects parsing. This >> + * handles "January 20". */ >> + if (state->last_field == TM_ABS_MON) { >> + /* D[D] */ >> + if (!is_valid_mday (v)) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + >> + return set_field (state, TM_ABS_MDAY, v); >> + } else if (n == 2) { >> + /* XXX: Only allow if last field is hour, min, or sec? */ >> + if (d == '+' || d == '-') { >> + /* +/-HH */ >> + return set_user_tz (state, d, v, 0); >> + } >> + } >> + } else if (n == 4) { >> + /* Notable exception: Value affects parsing. Time zones are >> + * always at most 1400 and we don't understand years before >> + * 1970.
*/ >> + if (!is_valid_year (v)) { >> + if (d == '+' || d == '-') { >> + /* +/-HHMM */ >> + return set_user_tz (state, d, v / 100, v % 100); >> + } >> + } else { >> + /* YYYY */ >> + return set_field (state, TM_ABS_YEAR, v); >> + } >> + } else if (n == 6) { >> + /* HHMMSS */ >> + int hour = v / 10000; >> + int min = (v / 100) % 100; >> + int sec = v % 100; >> + >> + if (!is_valid_time (hour, min, sec)) >> + return -PARSE_TIME_ERR_INVALIDTIME; >> + >> + return set_abs_time (state, hour, min, sec); >> + } >> + >> + /* else n is one of {-1, 3, 5, 7 } */ >> + >> + return -PARSE_TIME_ERR_FORMAT; >> +} >> + >> +/* Parse a single number. Typically postpone parsing until later. */ >> +static int >> +parse_single_number (struct state *state, unsigned long v, >> + unsigned long n) >> +{ >> + assert (n); >> + >> + /* Parse things that can be parsed immediately. */ >> + if (n == 8) { >> + /* YYYYMMDD */ >> + int year = v / 10000; >> + int mon = (v / 100) % 100; >> + int mday = v % 100; >> + >> + if (!is_valid_date (year, mon, mday)) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + >> + return set_abs_date (state, year, mon, mday); >> + } else if (n > 8) { >> + /* XXX: Seconds since epoch. */ >> + return -PARSE_TIME_ERR_FORMAT; >> + } >> + >> + if (v > INT_MAX) >> + return -PARSE_TIME_ERR_FORMAT; >> + >> + return set_postponed_number (state, v, n); >> +} >> + >> +static bool >> +is_time_sep (char c) >> +{ >> + return c == ':'; >> +} >> + >> +static bool >> +is_date_sep (char c) >> +{ >> + return c == '/' || c == '-' || c == '.'; >> +} >> + >> +static bool >> +is_sep (char c) >> +{ >> + return is_time_sep (c) || is_date_sep (c); >> +} >> + >> +/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep >> + * unset. */ >> +static int >> +expand_year (unsigned long year, size_t n) >> +{ >> + if (n == 2) { >> + return (year < 70 ? 
2000 : 1900) + year; >> + } else if (n == 4) { >> + return year; >> + } else { >> + return UNSET; >> + } >> +} >> + >> +/* Parse a date number triplet. */ >> +static int >> +parse_date (struct state *state, char sep, >> + unsigned long v1, unsigned long v2, unsigned long v3, >> + size_t n1, size_t n2, size_t n3) >> +{ >> + int year = UNSET, mon = UNSET, mday = UNSET; >> + >> + assert (is_date_sep (sep)); >> + >> + switch (sep) { >> + case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */ >> + if (n1 != 1 && n1 != 2) >> + return -PARSE_TIME_ERR_DATEFORMAT; >> + >> + if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) { >> + /* M[M]/D[D][/YY[YY]] */ >> + year = expand_year (v3, n3); >> + mon = v1; >> + mday = v2; >> + } else if (n2 == 4 && n3 == 0) { >> + /* M[M]/YYYY */ >> + year = v2; >> + mon = v1; >> + } else { >> + return -PARSE_TIME_ERR_DATEFORMAT; >> + } >> + break; >> + >> + case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */ >> + if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) { >> + /* YYYY-MM[-DD] */ >> + year = v1; >> + mon = v2; >> + if (n3) >> + mday = v3; >> + } else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) { >> + /* DD-MM[-YY[YY]] */ >> + year = expand_year (v3, n3); >> + mon = v2; >> + mday = v1; >> + } else if (n1 == 2 && n2 == 4 && n3 == 0) { >> + /* MM-YYYY */ >> + year = v2; >> + mon = v1; >> + } else { >> + return -PARSE_TIME_ERR_DATEFORMAT; >> + } >> + break; >> + >> + case '.': /* Date: D[D].M[M][.[YY[YY]]] */ >> + if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) || >> + (n3 != 0 && n3 != 2 && n3 != 4)) >> + return -PARSE_TIME_ERR_DATEFORMAT; >> + >> + year = expand_year (v3, n3); >> + mon = v2; >> + mday = v1; >> + break; >> + } >> + >> + if (year != UNSET && !is_valid_year (year)) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + >> + if (mon != UNSET && !is_valid_mon (mon)) >> + return -PARSE_TIME_ERR_INVALIDDATE; >> + >> + if (mday != UNSET && !is_valid_mday (mday)) >> + return -PARSE_TIME_ERR_INVALIDDATE; 
>> + >> + return set_abs_date (state, year, mon, mday); >> +} >> + >> +/* Parse a time number triplet. */ >> +static int >> +parse_time (struct state *state, char sep, >> + unsigned long v1, unsigned long v2, unsigned long v3, >> + size_t n1, size_t n2, size_t n3) >> +{ >> + assert (is_time_sep (sep)); >> + >> + if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2)) >> + return -PARSE_TIME_ERR_TIMEFORMAT; >> + >> + /* >> + * Notable exception: Previously set fields affect >> + * parsing. Interpret (+|-)HH:MM as time zone only if hour and >> + * minute have been set. >> + * >> + * XXX: This could be fixed by restricting the delimiters >> + * preceding time. For '+' it would be justified, but for '-' it >> + * might be inconvenient. However prefer to allow '-' as an >> + * insignificant delimiter preceding time for convenience, and >> + * handle '+' the same way for consistency between positive and >> + * negative time zones. >> + */ >> + if (is_field_set (state, TM_ABS_HOUR) && >> + is_field_set (state, TM_ABS_MIN) && >> + n1 == 2 && n2 == 2 && n3 == 0 && >> + (state->delim == '+' || state->delim == '-')) { >> + return set_user_tz (state, state->delim, v1, v2); >> + } >> + >> + if (!is_valid_time (v1, v2, v3)) >> + return -PARSE_TIME_ERR_INVALIDTIME; >> + >> + return set_abs_time (state, v1, v2, n3 ? v3 : 0); >> +} >> + >> +/* strtoul helper that assigns length. */ >> +static unsigned long >> +strtoul_len (const char *s, const char **endp, size_t *len) >> +{ >> + unsigned long val = strtoul (s, (char **) endp, 10); >> + >> + *len = *endp - s; >> + return val; >> +} >> + >> +/* >> + * Parse a (group of) number(s). Return < 0 on error, number of parsed >> + * chars on success. 
>> + */ >> +static ssize_t >> +parse_number (struct state *state, const char *s) >> +{ >> + int r; >> + unsigned long v1, v2, v3 = 0; >> + size_t n1, n2, n3 = 0; >> + const char *p = s; >> + char sep; >> + >> + v1 = strtoul_len (p, &p, &n1); >> + >> + if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) { >> + sep = *p; >> + v2 = strtoul_len (p + 1, &p, &n2); >> + } else { >> + /* A single number. */ >> + r = parse_single_number (state, v1, n1); >> + if (r) >> + return r; >> + >> + return p - s; >> + } >> + >> + /* A group of two or three numbers? */ >> + if (*p == sep && isdigit ((unsigned char) *(p + 1))) >> + v3 = strtoul_len (p + 1, &p, &n3); >> + >> + if (is_time_sep (sep)) >> + r = parse_time (state, sep, v1, v2, v3, n1, n2, n3); >> + else >> + r = parse_date (state, sep, v1, v2, v3, n1, n2, n3); >> + >> + if (r) >> + return r; >> + >> + return p - s; >> +} >> + >> +/* >> + * Parse delimiter(s). Throw away all except the last one, which is >> + * stored for parsing the next non-delimiter. Return < 0 on error, >> + * number of parsed chars on success. >> + * >> + * XXX: We might want to be more strict here. >> + */ >> +static ssize_t >> +parse_delim (struct state *state, const char *s) >> +{ >> + const char *p = s; >> + >> + /* >> + * Skip non-alpha and non-digit, and store the last for further >> + * processing. >> + */ >> + while (*p && !isalnum ((unsigned char) *p)) { >> + set_delim (state, *p); >> + p++; >> + } >> + >> + return p - s; >> +} >> + >> +/* >> + * Parse a date/time string. Return < 0 on error, number of parsed >> + * chars on success. 
>> + */ >> +static ssize_t >> +parse_input (struct state *state, const char *s) >> +{ >> + const char *p = s; >> + ssize_t n; >> + int r; >> + >> + while (*p) { >> + if (isalpha ((unsigned char) *p)) { >> + n = parse_keyword (state, p); >> + } else if (isdigit ((unsigned char) *p)) { >> + n = parse_number (state, p); >> + } else { >> + n = parse_delim (state, p); >> + } >> + >> + if (n <= 0) { >> + if (n == 0) >> + n = -PARSE_TIME_ERR; >> + >> + return n; >> + } >> + >> + p += n; >> + } >> + >> + /* Parse postponed number, if any. */ >> + r = handle_postponed_number (state, TM_NONE); >> + if (r < 0) >> + return r; >> + >> + return p - s; >> +} >> + >> +/* >> + * Processing the parsed input. >> + */ >> + >> +/* >> + * Initialize reference time to tm. Use time zone in state if >> + * specified, otherwise local time. Use now for reference time if >> + * non-NULL, otherwise current time. >> + */ >> +static int >> +initialize_now (struct state *state, struct tm *tm, const time_t *now) >> +{ >> + time_t t; >> + >> + if (now) { >> + t = *now; >> + } else { >> + if (time (&t) == (time_t) -1) >> + return -PARSE_TIME_ERR_LIB; >> + } >> + >> + if (is_field_set (state, TM_TZ)) { >> + /* Some other time zone. */ >> + >> + /* Adjust now according to the TZ. */ >> + t += get_field (state, TM_TZ) * 60; >> + >> + /* It's not gm, but this doesn't mess with the TZ. */ >> + if (gmtime_r (&t, tm) == NULL) >> + return -PARSE_TIME_ERR_LIB; >> + } else { >> + /* Local time. */ >> + if (localtime_r (&t, tm) == NULL) >> + return -PARSE_TIME_ERR_LIB; >> + } >> + >> + return 0; >> +} >> + >> +/* >> + * Normalize tm according to mktime(3). Both mktime(3) and >> + * localtime_r(3) use local time, but they cancel each other out here, >> + * making this function agnostic to time zone. 
>> + */ >> +static int >> +normalize_tm (struct tm *tm) >> +{ >> + time_t t = mktime (tm); >> + >> + if (t == (time_t) -1) >> + return -PARSE_TIME_ERR_LIB; >> + >> + if (!localtime_r (&t, tm)) >> + return -PARSE_TIME_ERR_LIB; >> + >> + return 0; >> +} >> + >> +/* Get field out of a struct tm. */ >> +static int >> +tm_get_field (const struct tm *tm, enum field field) >> +{ >> + switch (field) { >> + case TM_ABS_SEC: return tm->tm_sec; >> + case TM_ABS_MIN: return tm->tm_min; >> + case TM_ABS_HOUR: return tm->tm_hour; >> + case TM_ABS_MDAY: return tm->tm_mday; >> + case TM_ABS_MON: return tm->tm_mon + 1; /* 0- to 1-based */ >> + case TM_ABS_YEAR: return 1900 + tm->tm_year; >> + case TM_ABS_WDAY: return tm->tm_wday; >> + case TM_ABS_ISDST: return tm->tm_isdst; >> + default: >> + assert (false); >> + break; >> + } >> + >> + return 0; >> +} >> + >> +/* Modify hour according to am/pm setting. */ >> +static int >> +fixup_ampm (struct state *state) >> +{ >> + int hour, hdiff = 0; >> + >> + if (!is_field_set (state, TM_AMPM)) >> + return 0; >> + >> + if (!is_field_set (state, TM_ABS_HOUR)) >> + return -PARSE_TIME_ERR_TIMEFORMAT; >> + >> + hour = get_field (state, TM_ABS_HOUR); >> + if (!is_valid_12hour (hour)) >> + return -PARSE_TIME_ERR_INVALIDTIME; >> + >> + if (get_field (state, TM_AMPM)) { >> + /* 12pm is noon. */ >> + if (hour != 12) >> + hdiff = 12; >> + } else { >> + /* 12am is midnight, beginning of day. */ >> + if (hour == 12) >> + hdiff = -12; >> + } >> + >> + mod_field (state, TM_REL_HOUR, -hdiff); >> + >> + return 0; >> +} >> + >> +/* Combine absolute and relative fields, and round. 
*/ >> +static int >> +create_output (struct state *state, time_t *t_out, const time_t *tnow, >> + int round) >> +{ >> + struct tm tm = { .tm_isdst = -1 }; >> + struct tm now; >> + time_t t; >> + enum field f; >> + int r; >> + int week_round = PARSE_TIME_NO_ROUND; >> + >> + r = initialize_now (state, &now, tnow); >> + if (r) >> + return r; >> + >> + /* Initialize uninitialized fields to now. */ >> + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { >> + if (state->set[f] == FIELD_NOW) { >> + state->tm[f] = tm_get_field (&now, f); >> + state->set[f] = FIELD_SET; >> + } >> + } >> + >> + /* >> + * If MON is set but YEAR is not, refer to past month. >> + * >> + * XXX: Why are month/week special in this regard? What about >> + * mday, or time. Should refer to past. >> + */ >> + if (is_field_set (state, TM_ABS_MON) && >> + !is_field_set (state, TM_ABS_YEAR)) { >> + if (get_field (state, TM_ABS_MON) >= tm_get_field (&now, TM_ABS_MON)) >> + mod_field (state, TM_REL_YEAR, 1); >> + } >> + >> + /* >> + * If WDAY is set but MDAY is not, we consider WDAY relative >> + * >> + * XXX: This fails on stuff like "two months ago monday" because >> + * two months ago wasn't the same day as today. Postpone until we >> + * know date? >> + */ >> + if (is_field_set (state, TM_ABS_WDAY) && >> + !is_field_set (state, TM_ABS_MDAY)) { >> + int wday = get_field (state, TM_ABS_WDAY); >> + int today = tm_get_field (&now, TM_ABS_WDAY); >> + int rel_days; >> + >> + if (today > wday) >> + rel_days = today - wday; >> + else >> + rel_days = today + 7 - wday; >> + >> + /* This also prevents special week rounding from happening. */ >> + mod_field (state, TM_REL_DAY, rel_days); >> + >> + unset_field (state, TM_ABS_WDAY); >> + } >> + >> + r = fixup_ampm (state); >> + if (r) >> + return r; >> + >> + /* >> + * Iterate fields from most accurate to least accurate, and set >> + * unset fields according to requested rounding. 
>> + */ >> + for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) { >> + if (round != PARSE_TIME_NO_ROUND) { >> + enum field r = abs_to_rel_field (f); >> + >> + if (is_field_set (state, f) || is_field_set (state, r)) { >> + if (round >= PARSE_TIME_ROUND_UP) >> + mod_field (state, r, -1); >> + round = PARSE_TIME_NO_ROUND; /* No more rounding. */ >> + } else { >> + if (f == TM_ABS_MDAY && >> + is_field_set (state, TM_REL_WEEK)) { >> + /* Week is most accurate. */ >> + week_round = round; >> + round = PARSE_TIME_NO_ROUND; >> + } else { >> + set_field (state, f, field_epoch (f)); >> + } >> + } >> + } >> + >> + if (!is_field_set (state, f)) >> + set_field (state, f, tm_get_field (&now, f)); >> + } >> + >> + /* Special case: rounding with week accuracy. */ >> + if (week_round != PARSE_TIME_NO_ROUND) { >> + /* Temporarily set more accurate fields to now. */ >> + set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC)); >> + set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN)); >> + set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR)); >> + set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY)); >> + } >> + >> + /* >> + * Set all fields. They may contain out of range values before >> + * normalization by mktime(3). >> + */ >> + tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC); >> + tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN); >> + tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR); >> + tm.tm_mday = get_field (state, TM_ABS_MDAY) - >> + get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK); >> + tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON); >> + tm.tm_mon--; /* 1- to 0-based */ >> + tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900; >> + >> + /* >> + * It's always normal time. >> + * >> + * XXX: This is probably not a solution that universally >> + * works. 
Just make sure DST is not taken into account. We don't >> + * want rounding to be affected by DST. >> + */ >> + tm.tm_isdst = -1; >> + >> + /* Special case: rounding with week accuracy. */ >> + if (week_round != PARSE_TIME_NO_ROUND) { >> + /* Normalize to get proper tm.wday. */ >> + r = normalize_tm (&tm); >> + if (r < 0) >> + return r; >> + >> + /* Set more accurate fields back to zero. */ >> + tm.tm_sec = 0; >> + tm.tm_min = 0; >> + tm.tm_hour = 0; >> + tm.tm_isdst = -1; >> + >> + /* Monday is the true 1st day of week, but this is easier. */ >> + if (week_round <= PARSE_TIME_ROUND_DOWN) >> + tm.tm_mday -= tm.tm_wday; >> + else >> + tm.tm_mday += 7 - tm.tm_wday; >> + } >> + >> + if (is_field_set (state, TM_TZ)) { >> + /* tm is in specified TZ, convert to UTC for timegm(3). */ >> + tm.tm_min -= get_field (state, TM_TZ); >> + t = timegm (&tm); >> + } else { >> + /* tm is in local time. */ >> + t = mktime (&tm); >> + } >> + >> + if (t == (time_t) -1) >> + return -PARSE_TIME_ERR_LIB; >> + >> + *t_out = t; >> + >> + return 0; >> +} >> + >> +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. 
*/ >> +#define EXTERNAL_ERR(r) (-r) >> + >> +int >> +parse_time_string (const char *s, time_t *t, const time_t *now, int round) >> +{ >> + struct state state = { .last_field = TM_NONE }; >> + int r; >> + >> + if (!s || !t) >> + return EXTERNAL_ERR (-PARSE_TIME_ERR); >> + >> + r = parse_input (&state, s); >> + if (r < 0) >> + return EXTERNAL_ERR (r); >> + >> + r = create_output (&state, t, now, round); >> + if (r < 0) >> + return EXTERNAL_ERR (r); >> + >> + return 0; >> +} >> diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h >> new file mode 100644 >> index 0000000..50b7c6f >> --- /dev/null >> +++ b/parse-time-string/parse-time-string.h >> @@ -0,0 +1,95 @@ >> +/* >> + * parse time string - user friendly date and time parser >> + * Copyright © 2012 Jani Nikula >> + * >> + * This program is free software: you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License as published by >> + * the Free Software Foundation, either version 2 of the License, or >> + * (at your option) any later version. >> + * >> + * This program is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + * You should have received a copy of the GNU General Public License >> + * along with this program. If not, see <http://www.gnu.org/licenses/>. 
>> + * >> + * Author: Jani Nikula <jani@nikula.org> >> + */ >> + >> +#ifndef PARSE_TIME_STRING_H >> +#define PARSE_TIME_STRING_H >> + >> +#ifdef __cplusplus >> +extern "C" { >> +#endif >> + >> +#include <time.h> >> + >> +/* return values for parse_time_string() */ >> +enum { >> + PARSE_TIME_OK = 0, >> + PARSE_TIME_ERR, /* unspecified error */ >> + PARSE_TIME_ERR_LIB, /* library call failed */ >> + PARSE_TIME_ERR_ALREADYSET, /* attempt to set unit twice */ >> + PARSE_TIME_ERR_FORMAT, /* generic date/time format error */ >> + PARSE_TIME_ERR_DATEFORMAT, /* date format error */ >> + PARSE_TIME_ERR_TIMEFORMAT, /* time format error */ >> + PARSE_TIME_ERR_INVALIDDATE, /* date value error */ >> + PARSE_TIME_ERR_INVALIDTIME, /* time value error */ >> + PARSE_TIME_ERR_KEYWORD, /* unknown keyword */ >> +}; >> + >> +/* round values for parse_time_string() */ >> +enum { >> + PARSE_TIME_ROUND_DOWN = -1, >> + PARSE_TIME_NO_ROUND = 0, >> + PARSE_TIME_ROUND_UP = 1, >> +}; >> + >> +/** >> + * parse_time_string() - user friendly date and time parser >> + * @s: string to parse >> + * @t: pointer to time_t to store parsed time in >> + * @now: pointer to time_t containing reference date/time, or NULL >> + * @round: PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN, or >> + * PARSE_TIME_ROUND_UP >> + * >> + * Parse a date/time string 's' and store the parsed date/time result >> + * in 't'. >> + * >> + * A reference date/time is used for determining the "date/time units" >> + * (roughly equivalent to struct tm members) not specified by 's'. If >> + * 'now' is non-NULL, it must contain a pointer to a time_t to be used >> + * as reference date/time. Otherwise, the current time is used. >> + * >> + * If 's' does not specify a full date/time, the 'round' parameter >> + * specifies if and how the result should be rounded as follows: >> + * >> + * PARSE_TIME_NO_ROUND: All date/time units that are not specified >> + * by 's' are set to the corresponding unit derived from the >> + * reference date/time. 
>> + * >> + * PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate >> + * than the most accurate unit specified by 's' are set to the >> + * smallest valid value for that unit. Rest of the unspecified units >> + * are set as in PARSE_TIME_NO_ROUND. >> + * >> + * PARSE_TIME_ROUND_UP: All date/time units that are more accurate >> + * than the most accurate unit specified by 's' are set to the >> + * smallest valid value for that unit. The most accurate unit >> + * specified by 's' is incremented by one (and this is rolled over >> + * to the less accurate units as necessary). Rest of the unspecified >> + * units are set as in PARSE_TIME_NO_ROUND. > > Why do you round down and increase the most accurate unit? If I want to see > emails that were sent yesterday, I do not want to see any email that was > sent the first second of today. (OK, I know that this is slightly easier > to implement) It's easy to agree that yesterday's messages should not include messages from the first second of today. It's not even too difficult to implement that. But doing that in this API would feel like rounding 0.6 up and getting 0.9999... as a result. I'll look at adding a separate rounding mode to keep the API generic while better supporting the sole user of the API. >> + * >> + * Return 0 (PARSE_TIME_OK) for successfully parsed date/time, or one >> + * of PARSE_TIME_ERR_* on error. 't' is not modified on error. >> + */ >> +int parse_time_string (const char *s, time_t *t, const time_t *now, int round); > > now -> ref? Perhaps. Not a big deal IMO. > >> + >> +#ifdef __cplusplus >> +} >> +#endif >> + >> +#endif /* PARSE_TIME_STRING_H */ >> -- >> 1.7.9.5 > > -Michal BR, Jani. ^ permalink raw reply [flat|nested] 30+ messages in thread
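The "incremented by one (and this is rolled over to the less accurate units as necessary)" step in the PARSE_TIME_ROUND_UP description can lean on mktime(3) normalization, much as create_output() in the patch does. A minimal sketch at day accuracy only, assuming local time; round_to_day() is a hypothetical helper written for illustration and is not part of the patch:

```c
#define _POSIX_C_SOURCE 200809L /* for localtime_r(3) */

#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the patch: round t to a day
 * boundary in local time. With up == 0 the result is the first second
 * of the day containing t, otherwise the first second of the following
 * day. Return 0 on success, -1 if a library call fails.
 */
static int
round_to_day (time_t t, int up, time_t *out)
{
    struct tm tm;
    time_t r;

    assert (out);

    if (!localtime_r (&t, &tm))
	return -1;

    /* Units more accurate than the day go to their smallest value. */
    tm.tm_sec = 0;
    tm.tm_min = 0;
    tm.tm_hour = 0;

    /* May be out of range (e.g. January 32); mktime(3) normalizes. */
    if (up)
	tm.tm_mday += 1;

    /* Let mktime(3) work out whether DST applies. */
    tm.tm_isdst = -1;

    r = mktime (&tm);
    if (r == (time_t) -1)
	return -1;

    *out = r;

    return 0;
}
```

With up set, the result is the first second of the following day, that is, an exclusive upper bound. This is the behaviour questioned in the review comment quoted above.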
* Re: [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch 2012-10-03 18:49 ` Jani Nikula @ 2012-10-03 19:02 ` Michal Sojka 0 siblings, 0 replies; 30+ messages in thread From: Michal Sojka @ 2012-10-03 19:02 UTC (permalink / raw) To: Jani Nikula, notmuch, David Bremner On Wed, Oct 03 2012, Jani Nikula wrote: > On Tue, 25 Sep 2012, Michal Sojka <sojkam1@fel.cvut.cz> wrote: >> Hello Jani, >> >> On Wed, Sep 12 2012, Jani Nikula wrote: >>> Add a date/time parser to notmuch, to be used for adding date range >>> query support for notmuch lib later on. Add the parser to a directory >>> of its own to make it independent of the rest of the notmuch code >>> base. >> >> First of all, thank you very much for pushing this towards mainline. >> This is definitely one of the features I miss in notmuch most. >> >> Some comments below. > > Thanks for the comments; sorry about the delay in responding. No problem :) [...] >>> +/** >>> + * parse_time_string() - user friendly date and time parser >>> + * @s: string to parse >>> + * @t: pointer to time_t to store parsed time in >>> + * @now: pointer to time_t containing reference date/time, or NULL >>> + * @round: PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN, or >>> + * PARSE_TIME_ROUND_UP >>> + * >>> + * Parse a date/time string 's' and store the parsed date/time result >>> + * in 't'. >>> + * >>> + * A reference date/time is used for determining the "date/time units" >>> + * (roughly equivalent to struct tm members) not specified by 's'. If >>> + * 'now' is non-NULL, it must contain a pointer to a time_t to be used >>> + * as reference date/time. Otherwise, the current time is used. >>> + * >>> + * If 's' does not specify a full date/time, the 'round' parameter >>> + * specifies if and how the result should be rounded as follows: >>> + * >>> + * PARSE_TIME_NO_ROUND: All date/time units that are not specified >>> + * by 's' are set to the corresponding unit derived from the >>> + * reference date/time. 
>>> + * >>> + * PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate >>> + * than the most accurate unit specified by 's' are set to the >>> + * smallest valid value for that unit. Rest of the unspecified units >>> + * are set as in PARSE_TIME_NO_ROUND. >>> + * >>> + * PARSE_TIME_ROUND_UP: All date/time units that are more accurate >>> + * than the most accurate unit specified by 's' are set to the >>> + * smallest valid value for that unit. The most accurate unit >>> + * specified by 's' is incremented by one (and this is rolled over >>> + * to the less accurate units as necessary). Rest of the unspecified >>> + * units are set as in PARSE_TIME_NO_ROUND. >> >> Why do you round down and increase the most accurate unit? If I want to see >> emails that were sent yesterday, I do not want to see any email that was >> sent the first second of today. (OK, I know that this is slightly easier >> to implement) > > It's easy to agree that yesterday's messages should not include messages > from the first second of today. It's not even too difficult to implement > that. But doing that in this API would feel like rounding 0.6 up and > getting 0.9999... as a result. > > I'll look at adding a separate rounding mode to keep the API generic > while better supporting the sole user of the API. I agree that the operation I want here should not be called rounding. Maybe you can use a term from set theory: supremum or perhaps maximum (seconds are countable). Cheers, -Michal ^ permalink raw reply [flat|nested] 30+ messages in thread
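One way to reconcile the two views, sketched under the assumption that time_t counts whole seconds: keep the rounded-up value as an exclusive bound and either test against a half-open range, or step back one second to get the inclusive "maximum". Both helper names below are hypothetical and do not appear in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the patch: true if t lies in the
 * half-open range [since, until), so the first second of the next
 * period is excluded.
 */
static bool
in_half_open_range (time_t t, time_t since, time_t until)
{
    assert (since <= until);

    return t >= since && t < until;
}

/*
 * Hypothetical helper, not part of the patch: turn an exclusive upper
 * bound into the last second that still belongs to the range. Assumes
 * time_t has one second resolution.
 */
static time_t
inclusive_upper_bound (time_t until)
{
    return until - 1;
}
```

For a range covering one full day starting at second 1000, the value 1000 + 86400 is outside the half-open range, and inclusive_upper_bound() gives 1000 + 86399 as the last matching second.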
* [PATCH v3 3/9] test: add new test tool parse-time for date/time parser 2012-09-12 21:27 [PATCH v3 0/9] notmuch search date:since..until query support Jani Nikula 2012-09-12 21:27 ` [PATCH v3 1/9] build: drop the -Wswitch-enum warning Jani Nikula 2012-09-12 21:27 ` [PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch Jani Nikula @ 2012-09-12 21:27 ` Jani Nikula 2012-09-12 21:27 ` [PATCH v3 4/9] test: add smoke tests for the date/time parser module Jani Nikula ` (5 subsequent siblings) 8 siblings, 0 replies; 30+ messages in thread From: Jani Nikula @ 2012-09-12 21:27 UTC (permalink / raw) To: notmuch, David Bremner Add a smoke testing tool to support testing the date/time parser module directly and independent of the rest of notmuch. --- test/Makefile.local | 7 ++- test/basic | 2 +- test/parse-time.c | 145 +++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 152 insertions(+), 2 deletions(-) create mode 100644 test/parse-time.c diff --git a/test/Makefile.local b/test/Makefile.local index 45df4c7..9ae130a 100644 --- a/test/Makefile.local +++ b/test/Makefile.local @@ -19,9 +19,13 @@ $(dir)/smtp-dummy: $(smtp_dummy_modules) $(dir)/symbol-test: $(dir)/symbol-test.o $(call quiet,CXX) $^ -o $@ -Llib -lnotmuch $(XAPIAN_LDFLAGS) +$(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o + $(call quiet,CC) $^ -o $@ + .PHONY: test check -test-binaries: $(dir)/arg-test $(dir)/smtp-dummy $(dir)/symbol-test +test-binaries: $(dir)/arg-test $(dir)/smtp-dummy $(dir)/symbol-test \ + $(dir)/parse-time test: all test-binaries @${dir}/notmuch-test $(OPTIONS) @@ -32,4 +36,5 @@ SRCS := $(SRCS) $(smtp_dummy_srcs) CLEAN := $(CLEAN) $(dir)/smtp-dummy $(dir)/smtp-dummy.o \ $(dir)/symbol-test $(dir)/symbol-test.o \ $(dir)/arg-test $(dir)/arg-test.o \ + $(dir)/parse-time $(dir)/parse-time.o \ $(dir)/corpus.mail $(dir)/test-results $(dir)/tmp.* diff --git a/test/basic b/test/basic index 3b635c8..c47197c 100755 --- a/test/basic +++ b/test/basic 
@@ -54,7 +54,7 @@ test_begin_subtest 'Ensure that all available tests will be run by notmuch-test' eval $(sed -n -e '/^TESTS="$/,/^"$/p' $TEST_DIRECTORY/notmuch-test) tests_in_suite=$(for i in $TESTS; do echo $i; done | sort) available=$(find "$TEST_DIRECTORY" -maxdepth 1 -type f -perm +111 | \ - sed -r -e "s,.*/,," -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test)$/d" | \ + sed -r -e "s,.*/,," -e "/^(aggregate-results.sh|notmuch-test|smtp-dummy|test-verbose|symbol-test|arg-test|parse-time)$/d" | \ sort) test_expect_equal "$tests_in_suite" "$available" diff --git a/test/parse-time.c b/test/parse-time.c new file mode 100644 index 0000000..b4de76b --- /dev/null +++ b/test/parse-time.c @@ -0,0 +1,145 @@ +/* + * parse time string - user friendly date and time parser + * Copyright © 2012 Jani Nikula + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. 
+ * + * Author: Jani Nikula <jani@nikula.org> + */ + +#include <getopt.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "parse-time-string.h" + +/* + * concat argv[start]...argv[end - 1], separating them by a single + * space, to a malloced string + */ +static char * +concat_args (int start, int end, char *argv[]) +{ + int i; + size_t len = 1; + char *p; + + for (i = start; i < end; i++) + len += strlen (argv[i]) + 1; + + p = malloc (len); + if (!p) + return NULL; + + *p = 0; + + for (i = start; i < end; i++) { + if (i != start) + strcat (p, " "); + strcat (p, argv[i]); + } + + return p; +} + +#define DEFAULT_FORMAT "%a %b %d %T %z %Y" + +static void +usage (const char *name) +{ + printf ("Usage: %s [options ...] <date/time>\n\n", name); + printf ( + "Parse <date/time> and display it in given format.\n\n" + " -f, --format=FMT output format, FMT according to strftime(3)\n" + " (default: \"%s\")\n" + " -n, --now=N use N seconds since epoch as now (default: now)\n" + " -u, --up round result up (default: no rounding)\n" + " -d, --down round result down (default: no rounding)\n" + " -h, --help print this help\n", + DEFAULT_FORMAT); +} + +int +main (int argc, char *argv[]) +{ + int r; + struct tm tm; + time_t result; + time_t now; + time_t *nowp = NULL; + char *argstr; + int round = PARSE_TIME_NO_ROUND; + char buf[1024]; + const char *format = DEFAULT_FORMAT; + struct option options[] = { + { "help", no_argument, NULL, 'h' }, + { "up", no_argument, NULL, 'u' }, + { "down", no_argument, NULL, 'd' }, + { "format", required_argument, NULL, 'f' }, + { "now", required_argument, NULL, 'n' }, + { NULL, 0, NULL, 0 }, + }; + + for (;;) { + int c; + + c = getopt_long (argc, argv, "hudf:n:", options, NULL); + if (c == -1) + break; + + switch (c) { + case 'f': + /* output format */ + format = optarg; + break; + case 'u': + round = PARSE_TIME_ROUND_UP; + break; + case 'd': + round = PARSE_TIME_ROUND_DOWN; + break; + case 'n': + /* specify now in seconds 
since epoch */ + now = (time_t) strtol (optarg, NULL, 10); + if (now >= (time_t) 0) + nowp = &now; + break; + case 'h': + case '?': + default: + usage (argv[0]); + return 1; + } + } + + argstr = concat_args (optind, argc, argv); + if (!argstr) + return 1; + + r = parse_time_string (argstr, &result, nowp, round); + + free (argstr); + + if (r) + return 1; + + if (!localtime_r (&result, &tm)) + return 1; + + strftime (buf, sizeof (buf), format, &tm); + printf ("%s\n", buf); + + return 0; +} -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 30+ messages in thread
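The -n/--now handling in the test tool accepts whatever strtol(3) returns, so a non-numeric argument silently becomes 0 and trailing garbage goes unnoticed. A stricter variant might look roughly like the following; parse_epoch() is a hypothetical helper for illustration, not something the patch contains:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the patch: parse s as non-negative
 * seconds since epoch. Return 0 and store the value in *out on
 * success, -1 on empty input, trailing characters, negative values or
 * overflow. Leading whitespace and a '+' sign are accepted because
 * strtoll(3) accepts them.
 */
static int
parse_epoch (const char *s, time_t *out)
{
    char *end;
    long long v;

    assert (out);

    if (!s || !*s)
	return -1;

    errno = 0;
    v = strtoll (s, &end, 10);
    if (errno || end == s || *end != '\0' || v < 0)
	return -1;

    /* Reject values that do not fit in time_t on this platform. */
    if ((long long) (time_t) v != v)
	return -1;

    *out = (time_t) v;

    return 0;
}
```

In main(), the 'n' case could then leave nowp as NULL and print the usage text when parse_epoch() fails, instead of falling back to the current time without notice.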
* [PATCH v3 4/9] test: add smoke tests for the date/time parser module 2012-09-12 21:27 [PATCH v3 0/9] notmuch search date:since..until query support Jani Nikula ` (2 preceding siblings ...) 2012-09-12 21:27 ` [PATCH v3 3/9] test: add new test tool parse-time for date/time parser Jani Nikula @ 2012-09-12 21:27 ` Jani Nikula 2012-09-25 12:05 ` [PATCH] test: Improve " Michal Sojka 2012-09-12 21:27 ` [PATCH v3 5/9] build: build parse-time-string as part of the notmuch lib and static cli Jani Nikula ` (4 subsequent siblings) 8 siblings, 1 reply; 30+ messages in thread From: Jani Nikula @ 2012-09-12 21:27 UTC (permalink / raw) To: notmuch, David Bremner Test the date/time parser module directly. Just a small sanity test initially. --- test/notmuch-test | 1 + test/parse-time-string | 26 ++++++++++++++++++++++++++ 2 files changed, 27 insertions(+) create mode 100755 test/parse-time-string diff --git a/test/notmuch-test b/test/notmuch-test index cc732c3..7eadfdf 100755 --- a/test/notmuch-test +++ b/test/notmuch-test @@ -60,6 +60,7 @@ TESTS=" emacs-hello emacs-show missing-headers + parse-time-string " TESTS=${NOTMUCH_TESTS:=$TESTS} diff --git a/test/parse-time-string b/test/parse-time-string new file mode 100755 index 0000000..34b80d7 --- /dev/null +++ b/test/parse-time-string @@ -0,0 +1,26 @@ +#!/usr/bin/env bash +test_description="date/time parser module" +. ./test-lib.sh + +# This is currently just a quick sanity/smoke test. 
+ +_date () +{ + date -d "$*" +%s +} + +_parse_time () +{ + ${TEST_DIRECTORY}/parse-time --format=%s "$*" +} + +test_begin_subtest "date(1) default format without TZ code" +test_expect_equal "$(_parse_time Fri Aug 3 23:06:06 2012)" "$(_date Fri Aug 3 23:06:06 2012)" + +test_begin_subtest "date(1) --rfc-2822 format" +test_expect_equal "$(_parse_time Fri, 03 Aug 2012 23:07:46 +0100)" "$(_date Fri, 03 Aug 2012 23:07:46 +0100)" + +test_begin_subtest "date(1) --rfc=3339=seconds format" +test_expect_equal "$(_parse_time 2012-08-03 23:09:37+03:00)" "$(_date 2012-08-03 23:09:37+03:00)" + +test_done -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 30+ messages in thread
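The RFC 2822 and RFC 3339 subtests depend on the offset handling in set_user_tz() and create_output(): the offset is kept as signed minutes and subtracted from the wall clock time to reach UTC. A self-contained sketch of that arithmetic, using hypothetical helper names and plain time_t values in place of struct state:

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the patch: convert sign, hours and
 * minutes to a signed offset in minutes, using the same range checks
 * as set_user_tz(). Return 0 on success, -1 on an invalid offset.
 */
static int
tz_offset_minutes (char sign, int hour, int min, int *tz)
{
    assert (tz);

    if (sign != '+' && sign != '-')
	return -1;

    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)
	return -1;

    *tz = hour * 60 + min;
    if (sign == '-')
	*tz = -*tz;

    return 0;
}

/*
 * Hypothetical helper, not part of the patch: wall is the wall clock
 * time of the given zone, expressed in seconds as if it were UTC. The
 * UTC time is obtained by subtracting the offset.
 */
static time_t
wall_clock_to_utc (time_t wall, int tz)
{
    return wall - (time_t) tz * 60;
}
```

For "23:07:46 +0100" the offset is 60 minutes, so the UTC time is 22:07:46 on the same day.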
* [PATCH] test: Improve tests for the date/time parser module 2012-09-12 21:27 ` [PATCH v3 4/9] test: add smoke tests for the date/time parser module Jani Nikula @ 2012-09-25 12:05 ` Michal Sojka 2012-10-03 20:32 ` Jani Nikula 0 siblings, 1 reply; 30+ messages in thread From: Michal Sojka @ 2012-09-25 12:05 UTC (permalink / raw) To: Jani Nikula, notmuch, David Bremner This patch reworks the date/time parser library test program to make it easier to write the actual tests. It also modifies the notmuch test script and adds several new tests to it. The INPUT file for the test contains both the dates to be parsed and the "expected" results. The test program outputs the results in the same format and replaces expected results with real results. Currently, the "expected" results in the INPUT file correspond to the real results, so the test passes. Some results are, however, different from what I would expect - this is mentioned in the comments after '#'. This patch applies on top of Jani's patchset. --- It can be seen that there are several errors and unexpected results. As I've already written, I'm not sure that the approach taken by this library is the right one. I tend to agree with mina86 that using a more systematic approach (such as bison) would be beneficial. This is, however, not to say that this patchset should be thrown away. Either Jani will be able to fix all the corner cases, or we can work together to develop a better solution - add support for ranges to the bison parser.
-Michal diff --git a/test/Makefile.local b/test/Makefile.local index 9ae130a..b9105c7 100644 --- a/test/Makefile.local +++ b/test/Makefile.local @@ -20,7 +20,7 @@ $(dir)/symbol-test: $(dir)/symbol-test.o $(call quiet,CXX) $^ -o $@ -Llib -lnotmuch $(XAPIAN_LDFLAGS) $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o - $(call quiet,CC) $^ -o $@ + $(call quiet,CC) $^ -o $@ -lrt .PHONY: test check diff --git a/test/parse-time-string b/test/parse-time-string index 34b80d7..265437c 100755 --- a/test/parse-time-string +++ b/test/parse-time-string @@ -14,13 +14,48 @@ _parse_time () ${TEST_DIRECTORY}/parse-time --format=%s "$*" } -test_begin_subtest "date(1) default format without TZ code" -test_expect_equal "$(_parse_time Fri Aug 3 23:06:06 2012)" "$(_date Fri Aug 3 23:06:06 2012)" +test_begin_subtest "Date parser tests" +cat <<EOF > INPUT +now -> Tue Jan 11 11:11:00 +0000 2011 +2010-1-1 -> parse_time_string() error: 5 +Jan 2 -> Sat Jan 02 11:11:00 +0000 2010 # Why 2010? +Mon -> Mon Jan 10 11:11:00 +0000 2011 +last Friday -> parse_time_string() error: 4 +2 hours ago -> parse_time_string() error: 1 +last month -> Sat Dec 11 11:11:00 +0000 2010 +month ago -> parse_time_string() error: 1 +8am -> Tue Jan 11 08:00:00 +0000 2011 +9:15 -> Tue Jan 11 09:15:00 +0000 2011 +12:34 -> Tue Jan 11 12:34:00 +0000 2011 +monday -> Mon Jan 10 11:11:00 +0000 2011 +yesterday -> Mon Jan 10 11:11:00 +0000 2011 +tomorrow -> parse_time_string() error: 1 + -> Tue Jan 11 11:11:00 +0000 2011 # Shouldn't empty string return an error??? 
-test_begin_subtest "date(1) --rfc-2822 format" -test_expect_equal "$(_parse_time Fri, 03 Aug 2012 23:07:46 +0100)" "$(_date Fri, 03 Aug 2012 23:07:46 +0100)" +Aug 3 23:06:06 2012 -> Fri Aug 03 23:06:06 +0000 2012 # date(1) default format without TZ code +Fri, 03 Aug 2012 23:07:46 +0100 -> Fri Aug 03 22:07:46 +0000 2012 # rfc-2822 +2012-08-03 23:09:37+03:00 -> Fri Aug 03 20:09:37 +0000 2012 # rfc-3339 seconds -test_begin_subtest "date(1) --rfc=3339=seconds format" -test_expect_equal "$(_parse_time 2012-08-03 23:09:37+03:00)" "$(_date 2012-08-03 23:09:37+03:00)" +10s -> Tue Jan 11 11:10:50 +0000 2011 +19701223s -> Wed Dec 23 11:10:59 +0000 1970 # Surprising - number is parsed as date and 's' as '1 second' +19701223 -> Wed Dec 23 11:11:00 +0000 1970 + +19701223 +0100 -> Wed Dec 23 11:11:00 +0000 1970 # Timezone is ignored without an error + +today ^-> Wed Jan 12 00:00:00 +0000 2011 # This should be 11 23:59:59 +today v-> Tue Jan 11 00:00:00 +0000 2011 + +thisweek ^-> Sun Jan 16 00:00:00 +0000 2011 # This should be Sunday 23:59:59 +thisweek v-> Sun Jan 09 00:00:00 +0000 2011 # This should be Monday 00:00:00 + +two months ago-> parse_time_string() error: 1 # Comments in the code suggest that this is supported +two months -> Thu Nov 11 11:11:00 +0000 2010 + +1348569850 -> parse_time_string() error: 4 # Seconds since epoch not yet supported? Backward compatibility in notmuch??? +10 -> parse_time_string() error: 4 # Seconds since epoch? 
+EOF + +${TEST_DIRECTORY}/parse-time --now="Tue Jan 11 11:11:00 +0000 2011" < INPUT > OUTPUT +test_expect_equal_file INPUT OUTPUT test_done diff --git a/test/parse-time.c b/test/parse-time.c index b4de76b..0415f49 100644 --- a/test/parse-time.c +++ b/test/parse-time.c @@ -18,59 +18,47 @@ * Author: Jani Nikula <jani@nikula.org> */ + +#define _XOPEN_SOURCE 500 /* for strptime() and snprintf() */ #include <getopt.h> #include <stdio.h> #include <stdlib.h> #include <string.h> +#include <time.h> #include "parse-time-string.h" -/* - * concat argv[start]...argv[end - 1], separating them by a single - * space, to a malloced string - */ -static char * -concat_args (int start, int end, char *argv[]) -{ - int i; - size_t len = 1; - char *p; - - for (i = start; i < end; i++) - len += strlen (argv[i]) + 1; - - p = malloc (len); - if (!p) - return NULL; - - *p = 0; - - for (i = start; i < end; i++) { - if (i != start) - strcat (p, " "); - strcat (p, argv[i]); - } - - return p; -} - #define DEFAULT_FORMAT "%a %b %d %T %z %Y" static void usage (const char *name) { - printf ("Usage: %s [options ...] 
<date/time>\n\n", name); + printf ("Usage: %s [options ...]\n\n", name); printf ( - "Parse <date/time> and display it in given format.\n\n" - " -f, --format=FMT output format, FMT according to strftime(3)\n" - " (default: \"%s\")\n" - " -n, --now=N use N seconds since epoch as now (default: now)\n" - " -u, --up round result up (default: no rounding)\n" - " -d, --down round result down (default: no rounding)\n" - " -h, --help print this help\n", + "Parse date/time read from stdin and display it in given format.\n\n" + " -f, --format=FMT output format for dates and input format for --now,\n" + " FMT according to strftime(3) (default: \"%s\")\n" + " -n, --now=N reference date in FMT (default: now)\n" + " -h, --help print this help\n" + "\n" + "stdin should contain one date/time per line in the following format:\n" + " <date/time> [ <arrow> [ comment ] ]\n" + "where <arrow> determines the operation performed on the <date/time>.\n" + "It can be one of '->', '^->', 'v->' meaning convert, convert and round\n" + "up, convert and round down, respectively.\n", DEFAULT_FORMAT); } +static const char * +get_round_str (int round) +{ + switch (round) { + case PARSE_TIME_ROUND_UP: return "^"; + case PARSE_TIME_ROUND_DOWN: return "v"; + default: return ""; + } +} + int main (int argc, char *argv[]) { @@ -79,14 +67,10 @@ main (int argc, char *argv[]) time_t result; time_t now; time_t *nowp = NULL; - char *argstr; int round = PARSE_TIME_NO_ROUND; - char buf[1024]; const char *format = DEFAULT_FORMAT; struct option options[] = { { "help", no_argument, NULL, 'h' }, - { "up", no_argument, NULL, 'u' }, - { "down", no_argument, NULL, 'd' }, { "format", required_argument, NULL, 'f' }, { "now", required_argument, NULL, 'n' }, { NULL, 0, NULL, 0 }, @@ -111,8 +95,13 @@ main (int argc, char *argv[]) round = PARSE_TIME_ROUND_DOWN; break; case 'n': - /* specify now in seconds since epoch */ - now = (time_t) strtol (optarg, NULL, 10); + memset (&tm, 0, sizeof (tm)); + char *parsed = strptime 
(optarg, format, &tm); + if (!parsed) { + fprintf (stderr, "Cannot parse reference date: %s\n", optarg); + return 1; + } + now = mktime (&tm); if (now >= (time_t) 0) nowp = &now; break; @@ -124,22 +113,47 @@ main (int argc, char *argv[]) } } - argstr = concat_args (optind, argc, argv); - if (!argstr) - return 1; - - r = parse_time_string (argstr, &result, nowp, round); - - free (argstr); - - if (r) - return 1; - - if (!localtime_r (&result, &tm)) - return 1; - - strftime (buf, sizeof (buf), format, &tm); - printf ("%s\n", buf); + char input[BUFSIZ]; + while (fgets (input, BUFSIZ, stdin) && input[0]) { + if (input[0] == '\n') { + printf ("\n"); + continue; + } + char *arrow; + char *comment = strrchr (input, '#'); + arrow = strstr (input, "->"); + round = PARSE_TIME_NO_ROUND; + if (arrow > input) { + switch (arrow[-1]) { + case '^': round = PARSE_TIME_ROUND_UP; arrow--; break; + case 'v': round = PARSE_TIME_ROUND_DOWN; arrow--; break; + default: break; + } + } + if (arrow) + *arrow = 0; + else + arrow = input + strlen (input); /* XXX: comment is not handled */ + while (arrow > input && arrow[-1] == '\n') + arrow--; + *arrow-- = 0; + + r = parse_time_string (input, &result, nowp, round); + char resstr[BUFSIZ]; + if (r) + snprintf (resstr, sizeof(resstr), "parse_time_string() error: %d", r); + else if (!localtime_r (&result, &tm)) + snprintf (resstr, sizeof(resstr), "localtime(result) error"); + else + strftime (resstr, sizeof (resstr), format, &tm); + + char buf[BUFSIZ]; + snprintf (buf, sizeof(buf), "%s%s-> %s", input, get_round_str (round), resstr); + if (!comment) + printf ("%s\n", buf); + else + printf ("%-*s%s", (int)(comment - input), buf, comment); + } return 0; } ^ permalink raw reply related [flat|nested] 30+ messages in thread
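The input line format described in the usage text above, <date/time> [ <arrow> [ comment ] ], can be split along these lines. This is only a minimal sketch and not the code from the patch: split_test_line() and the round_mode enum are hypothetical names standing in for the PARSE_TIME_* constants, and unlike the patch it trims trailing blanks from the expression instead of preserving them for output alignment.

```c
#include <assert.h>
#include <string.h>

/* Rounding modes; hypothetical stand-ins for the PARSE_TIME_* constants. */
enum round_mode { ROUND_NONE, ROUND_UP, ROUND_DOWN };

/*
 * Split "<date/time> [<arrow> ...]" in place: terminate the expression
 * before the arrow (or at the end of the line) and report the rounding
 * mode selected by the character preceding "->".
 */
static enum round_mode
split_test_line (char *line)
{
    enum round_mode mode = ROUND_NONE;
    char *end;
    char *arrow;

    assert (line != NULL);

    arrow = strstr (line, "->");
    if (arrow) {
	end = arrow;
	/* Look at the character before "->" only if there is one. */
	if (arrow > line && arrow[-1] == '^') {
	    mode = ROUND_UP;
	    end--;
	} else if (arrow > line && arrow[-1] == 'v') {
	    mode = ROUND_DOWN;
	    end--;
	}
    } else {
	end = line + strlen (line);
    }

    /* Drop trailing blanks, including the newline left by fgets(). */
    while (end > line &&
	   (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\n'))
	end--;
    *end = '\0';

    return mode;
}
```

Checking for a found arrow before comparing pointers avoids relying on how a NULL pointer compares against the buffer. Note that an expression ending in 'v' directly followed by "->" would still be read as "round down", so the sketch assumes a blank before the arrow, as in the INPUT file.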
* Re: [PATCH] test: Improve tests for the date/time parser module 2012-09-25 12:05 ` [PATCH] test: Improve " Michal Sojka @ 2012-10-03 20:32 ` Jani Nikula 2012-10-04 11:02 ` Michal Sojka 0 siblings, 1 reply; 30+ messages in thread From: Jani Nikula @ 2012-10-03 20:32 UTC (permalink / raw) To: Michal Sojka, notmuch, David Bremner On Tue, 25 Sep 2012, Michal Sojka <sojkam1@fel.cvut.cz> wrote: > This patch reworks date/time parser library test program to make it > easier to to write the actual tests. It also modifies the notmuch test > script and adds several new tests to it. Cool! > The INPUT file for the test contains both the dates to be parsed as well > as the "expected" results. The test program outputs the results in the > same format and replaces expected results with real results. Currently, > the "expected" results in the INPUT file correspond to the real results, > so the test passes. Some results are, however, different from what I > would expect - this is mentioned in the comments after '#'. Please see comments inline next to tests. > > This patch applies on top of Jani's patchset. > --- > It can be seen that there are several errors and unexpected results. > As I've already written, I'm not sure that the approach taken by this > library is the right one. I tend to agree with mina86, that using a > more systematic approach (such as bison) would be beneficial. Then we just have to agree to disagree here. :) > This is however not to say to throw this patchset away. Either Jani > will be able to fix all the corner cases. Or we can work together to > develop a better solution - add support for ranges to the bison > parser. I think I've got the cases pretty much covered, and they're mostly not about syntax and parsing, but rather semantics; what to do with the parsed data. 
> -Michal > > diff --git a/test/Makefile.local b/test/Makefile.local > index 9ae130a..b9105c7 100644 > --- a/test/Makefile.local > +++ b/test/Makefile.local > @@ -20,7 +20,7 @@ $(dir)/symbol-test: $(dir)/symbol-test.o > $(call quiet,CXX) $^ -o $@ -Llib -lnotmuch $(XAPIAN_LDFLAGS) > > $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o > - $(call quiet,CC) $^ -o $@ > + $(call quiet,CC) $^ -o $@ -lrt > > .PHONY: test check > > diff --git a/test/parse-time-string b/test/parse-time-string > index 34b80d7..265437c 100755 > --- a/test/parse-time-string > +++ b/test/parse-time-string > @@ -14,13 +14,48 @@ _parse_time () > ${TEST_DIRECTORY}/parse-time --format=%s "$*" > } > > -test_begin_subtest "date(1) default format without TZ code" > -test_expect_equal "$(_parse_time Fri Aug 3 23:06:06 2012)" "$(_date Fri Aug 3 23:06:06 2012)" > +test_begin_subtest "Date parser tests" > +cat <<EOF > INPUT > +now -> Tue Jan 11 11:11:00 +0000 2011 > +2010-1-1 -> parse_time_string() error: 5 I think that's invalid per ISO 8601. > +Jan 2 -> Sat Jan 02 11:11:00 +0000 2010 # Why 2010? This is an interesting bug. The idea was that specifying a month without year would always refer to past. So when you give Jan 2011 in the reference time, Jan refers to the previous year. Seems simple, but looking closer, also "Jan 2 this year" would end up in 2010. Not good. It would probably be possible to fix this, but per the simplicity of implementation and unambiguity goals, I think I'll just make them refer to current year, at least for now. This may mean having to look into supporting "last {monthname,weekday}" for better usability, but I'll leave that as a future improvement. > +Mon -> Mon Jan 10 11:11:00 +0000 2011 > +last Friday -> parse_time_string() error: 4 "last <weekday>" is not supported. > +2 hours ago -> parse_time_string() error: 1 "ago" is not supported. > +last month -> Sat Dec 11 11:11:00 +0000 2010 > +month ago -> parse_time_string() error: 1 Ditto. 
> +8am -> Tue Jan 11 08:00:00 +0000 2011 > +9:15 -> Tue Jan 11 09:15:00 +0000 2011 > +12:34 -> Tue Jan 11 12:34:00 +0000 2011 > +monday -> Mon Jan 10 11:11:00 +0000 2011 > +yesterday -> Mon Jan 10 11:11:00 +0000 2011 > +tomorrow -> parse_time_string() error: 1 "tomorrow" is not supported (do you get a lot of mail from the future? ;) > + -> Tue Jan 11 11:11:00 +0000 2011 # Shouldn't empty string return an error??? *shrug* It just starts with the reference time, and finds nothing to add or remove. Let's call it a feature. > > -test_begin_subtest "date(1) --rfc-2822 format" > -test_expect_equal "$(_parse_time Fri, 03 Aug 2012 23:07:46 +0100)" "$(_date Fri, 03 Aug 2012 23:07:46 +0100)" > +Aug 3 23:06:06 2012 -> Fri Aug 03 23:06:06 +0000 2012 # date(1) default format without TZ code > +Fri, 03 Aug 2012 23:07:46 +0100 -> Fri Aug 03 22:07:46 +0000 2012 # rfc-2822 > +2012-08-03 23:09:37+03:00 -> Fri Aug 03 20:09:37 +0000 2012 # rfc-3339 seconds > > -test_begin_subtest "date(1) --rfc=3339=seconds format" > -test_expect_equal "$(_parse_time 2012-08-03 23:09:37+03:00)" "$(_date 2012-08-03 23:09:37+03:00)" > +10s -> Tue Jan 11 11:10:50 +0000 2011 > +19701223s -> Wed Dec 23 11:10:59 +0000 1970 # Surprising - number is parsed as date and 's' as '1 second' Will be fixed. > +19701223 -> Wed Dec 23 11:11:00 +0000 1970 > + > +19701223 +0100 -> Wed Dec 23 11:11:00 +0000 1970 # Timezone is ignored without an error It's not ignored. Date is specified, but the time comes from the reference time. It's the same absolute time regardless of the timezone. > + > +today ^-> Wed Jan 12 00:00:00 +0000 2011 # This should be 11 23:59:59 See my previous mail. Can be fixed. > +today v-> Tue Jan 11 00:00:00 +0000 2011 > + > +thisweek ^-> Sun Jan 16 00:00:00 +0000 2011 # This should be Sunday 23:59:59 > +thisweek v-> Sun Jan 09 00:00:00 +0000 2011 # This should be Monday 00:00:00 Implementation simplicity and dodging a localization issue. 
Start of the week is based on the tm_wday field of struct tm, where 0 == Sunday. Even if I personally agree Monday is the 1st day of the week. > + > +two months ago-> parse_time_string() error: 1 # Comments in the code suggest that this is supported When in doubt, trust code over comments. ;) > +two months -> Thu Nov 11 11:11:00 +0000 2010 > + > +1348569850 -> parse_time_string() error: 4 # Seconds since epoch not yet supported? Backward compatibility in notmuch??? > +10 -> parse_time_string() error: 4 # Seconds since epoch? Indeed, seconds since epoch not yet supported. There is no backwards compatibility issue, as you can still use the <initial-timestamp>..<final-timestamp> format as described in notmuch-search-terms(7) man page. The new date:<since>..<until> just doesn't support seconds since epoch yet. And I think I'll make the syntax "@<timestamp>" to let you have "<really-big-number> seconds" without surprises. > +EOF > + > +${TEST_DIRECTORY}/parse-time --now="Tue Jan 11 11:11:00 +0000 2011" < INPUT > OUTPUT > +test_expect_equal_file INPUT OUTPUT > > test_done > diff --git a/test/parse-time.c b/test/parse-time.c > index b4de76b..0415f49 100644 > --- a/test/parse-time.c > +++ b/test/parse-time.c > @@ -18,59 +18,47 @@ > * Author: Jani Nikula <jani@nikula.org> > */ > > + > +#define _XOPEN_SOURCE 500 /* for strptime() and snprintf() */ > #include <getopt.h> > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > +#include <time.h> > > #include "parse-time-string.h" > > -/* > - * concat argv[start]...argv[end - 1], separating them by a single > - * space, to a malloced string > - */ > -static char * > -concat_args (int start, int end, char *argv[]) > -{ > - int i; > - size_t len = 1; > - char *p; > - > - for (i = start; i < end; i++) > - len += strlen (argv[i]) + 1; > - > - p = malloc (len); > - if (!p) > - return NULL; > - > - *p = 0; > - > - for (i = start; i < end; i++) { > - if (i != start) > - strcat (p, " "); > - strcat (p, argv[i]); > - } > - > - 
return p; > -} > - > #define DEFAULT_FORMAT "%a %b %d %T %z %Y" > > static void > usage (const char *name) > { > - printf ("Usage: %s [options ...] <date/time>\n\n", name); > + printf ("Usage: %s [options ...]\n\n", name); > printf ( > - "Parse <date/time> and display it in given format.\n\n" > - " -f, --format=FMT output format, FMT according to strftime(3)\n" > - " (default: \"%s\")\n" > - " -n, --now=N use N seconds since epoch as now (default: now)\n" > - " -u, --up round result up (default: no rounding)\n" > - " -d, --down round result down (default: no rounding)\n" > - " -h, --help print this help\n", > + "Parse date/time read from stdin and display it in given format.\n\n" > + " -f, --format=FMT output format for dates and input format for --now,\n" > + " FMT according to strftime(3) (default: \"%s\")\n" > + " -n, --now=N reference date in FMT (default: now)\n" > + " -h, --help print this help\n" > + "\n" > + "stdin should contain one date/time per line in the following format:\n" > + " <date/time> [ <arrow> [ comment ] ]\n" > + "where <arrow> determines the operation performed on the <date/time>.\n" > + "It can be one of '->', '^->', 'v->' meaning convert, convert and round\n" > + "up, convert and round down, respectively.\n", > DEFAULT_FORMAT); > } > > +static const char * > +get_round_str (int round) > +{ > + switch (round) { > + case PARSE_TIME_ROUND_UP: return "^"; > + case PARSE_TIME_ROUND_DOWN: return "v"; > + default: return ""; > + } > +} > + > int > main (int argc, char *argv[]) > { > @@ -79,14 +67,10 @@ main (int argc, char *argv[]) > time_t result; > time_t now; > time_t *nowp = NULL; > - char *argstr; > int round = PARSE_TIME_NO_ROUND; > - char buf[1024]; > const char *format = DEFAULT_FORMAT; > struct option options[] = { > { "help", no_argument, NULL, 'h' }, > - { "up", no_argument, NULL, 'u' }, > - { "down", no_argument, NULL, 'd' }, > { "format", required_argument, NULL, 'f' }, > { "now", required_argument, NULL, 'n' }, > { NULL, 0, NULL, 0 
}, > @@ -111,8 +95,13 @@ main (int argc, char *argv[]) > round = PARSE_TIME_ROUND_DOWN; > break; > case 'n': > - /* specify now in seconds since epoch */ > - now = (time_t) strtol (optarg, NULL, 10); > + memset (&tm, 0, sizeof (tm)); > + char *parsed = strptime (optarg, format, &tm); One of the problems with strptime is that it doesn't support time zones, which is why I chose not to use it here. (You can specify %z in the format to ignore it, but it looks like it's ignored no matter what. *shrug*) Combined with mktime below, you introduce possible TZ and DST variations in the tests, which can be problematic. So perhaps we should keep the reference time as a timestamp here. I didn't look at this test tool patch very closely yet, but in general I like the very much increased clarity in the tests. I'll look into this more when I have a moment. Thanks for the tests, comments, and corner cases. They've been helpful. BR, Jani. > + if (!parsed) { > + fprintf (stderr, "Cannot parse reference date: %s\n", optarg); > + return 1; > + } > + now = mktime (&tm); > if (now >= (time_t) 0) > nowp = &now; > break; > @@ -124,22 +113,47 @@ main (int argc, char *argv[]) > } > } > > - argstr = concat_args (optind, argc, argv); > - if (!argstr) > - return 1; > - > - r = parse_time_string (argstr, &result, nowp, round); > - > - free (argstr); > - > - if (r) > - return 1; > - > - if (!localtime_r (&result, &tm)) > - return 1; > - > - strftime (buf, sizeof (buf), format, &tm); > - printf ("%s\n", buf); > + char input[BUFSIZ]; > + while (fgets (input, BUFSIZ, stdin) && input[0]) { > + if (input[0] == '\n') { > + printf ("\n"); > + continue; > + } > + char *arrow; > + char *comment = strrchr (input, '#'); > + arrow = strstr (input, "->"); > + round = PARSE_TIME_NO_ROUND; > + if (arrow > input) { > + switch (arrow[-1]) { > + case '^': round = PARSE_TIME_ROUND_UP; arrow--; break; > + case 'v': round = PARSE_TIME_ROUND_DOWN; arrow--; break; > + default: break; > + } > + } > + if (arrow) > + 
*arrow = 0; > + else > + arrow = input + strlen (input); /* XXX: comment is not handled */ > + while (arrow > input && arrow[-1] == '\n') > + arrow--; > + *arrow-- = 0; > + > + r = parse_time_string (input, &result, nowp, round); > + char resstr[BUFSIZ]; > + if (r) > + snprintf (resstr, sizeof(resstr), "parse_time_string() error: %d", r); > + else if (!localtime_r (&result, &tm)) > + snprintf (resstr, sizeof(resstr), "localtime(result) error"); > + else > + strftime (resstr, sizeof (resstr), format, &tm); > + > + char buf[BUFSIZ]; > + snprintf (buf, sizeof(buf), "%s%s-> %s", input, get_round_str (round), resstr); > + if (!comment) > + printf ("%s\n", buf); > + else > + printf ("%-*s%s", (int)(comment - input), buf, comment); > + } > > return 0; > } ^ permalink raw reply [flat|nested] 30+ messages in thread
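The "thisweek" results discussed in this reply (Sunday versus Monday as the start of the week) come down to which weekday number counts as the first day, using the weekday convention of struct tm (tm_wday, where 0 is Sunday). The following is only a sketch of that rounding in UTC and not the parser's actual code: week_start_utc() and its first_wday parameter are hypothetical, and it sidesteps time zones entirely by working on whole days since the epoch.

```c
#include <assert.h>
#include <time.h>

#define SECONDS_PER_DAY 86400L

/*
 * Return the UTC midnight that starts the week containing t.
 * first_wday uses the tm_wday convention: 0 == Sunday, 1 == Monday.
 */
static time_t
week_start_utc (time_t t, int first_wday)
{
    long days, wday;

    assert (t >= 0);
    assert (first_wday >= 0 && first_wday <= 6);

    days = (long) (t / SECONDS_PER_DAY);
    /* 1970-01-01 was a Thursday, i.e. tm_wday == 4. */
    wday = (days + 4) % 7;

    /* Step back to the most recent first_wday, possibly zero days. */
    return (time_t) (days - (wday - first_wday + 7) % 7) * SECONDS_PER_DAY;
}
```

With the reference time from the test script, Tue Jan 11 11:11:00 +0000 2011 (1294744260 seconds since the epoch), first_wday 0 gives Sun Jan 09 00:00:00 and first_wday 1 gives Mon Jan 10 00:00:00, so making the first day configurable would be one way to address the localization concern.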
* Re: [PATCH] test: Improve tests for the date/time parser module 2012-10-03 20:32 ` Jani Nikula @ 2012-10-04 11:02 ` Michal Sojka 0 siblings, 0 replies; 30+ messages in thread From: Michal Sojka @ 2012-10-04 11:02 UTC (permalink / raw) To: Jani Nikula, notmuch, David Bremner On Wed, Oct 03 2012, Jani Nikula wrote: > On Tue, 25 Sep 2012, Michal Sojka <sojkam1@fel.cvut.cz> wrote: >> This patch reworks date/time parser library test program to make it >> easier to to write the actual tests. It also modifies the notmuch test >> script and adds several new tests to it. > [...] >> case 'n': >> - /* specify now in seconds since epoch */ >> - now = (time_t) strtol (optarg, NULL, 10); >> + memset (&tm, 0, sizeof (tm)); >> + char *parsed = strptime (optarg, format, &tm); > > One of the problems with strptime is that it doesn't support time zones, > which is why I chose not to use it here. (You can specify %z in the > format to ignore it, but it looks like it's ignored no matter > what. *shrug*) Combined with mktime below, you introduce possible TZ and > DST variations in the tests, which can be problematic. So perhaps we > should keep the reference time as a timestamp here. I didn't pay much attention to time zone issues when writing this little program, so you may be right. But note that test-lib.sh sets TZ=UTC, and I hope that this eliminates the problems. I think it is better to have human-readable values in test scripts. -Michal
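The point about TZ=UTC can be illustrated with a small sketch: mktime() interprets the broken-down time in the zone named by the TZ environment variable, so pinning TZ before the conversion makes the result independent of the machine's local zone and DST rules. This is an assumption-laden example, not the test tool's code: parse_reference_utc() is a hypothetical helper, and it uses a fixed "%Y-%m-%d %H:%M:%S" format rather than the tool's default format so that no %z handling is involved.

```c
#define _XOPEN_SOURCE 700 /* for strptime(), setenv() and tzset() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Parse a "YYYY-MM-DD HH:MM:SS" reference time as UTC. Returns
 * (time_t) -1 if the string does not match the format.
 */
static time_t
parse_reference_utc (const char *str)
{
    struct tm tm;
    const char *rest;

    assert (str != NULL);

    /* mktime() interprets struct tm in the zone named by TZ. */
    if (setenv ("TZ", "UTC", 1) != 0)
	return (time_t) -1;
    tzset ();

    memset (&tm, 0, sizeof (tm));
    rest = strptime (str, "%Y-%m-%d %H:%M:%S", &tm);
    if (!rest || *rest != '\0')
	return (time_t) -1;

    tm.tm_isdst = 0; /* UTC has no daylight saving time */
    return mktime (&tm);
}
```

Setting TZ inside the helper changes the zone for the whole process, which is acceptable for a small test tool but would be a questionable side effect in library code. Passing the reference time as seconds since the epoch, as the original tool did, avoids the question altogether at the cost of less readable test scripts.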
* [PATCH v3 5/9] build: build parse-time-string as part of the notmuch lib and static cli 2012-09-12 21:27 [PATCH v3 0/9] notmuch search date:since..until query support Jani Nikula ` (3 preceding siblings ...) 2012-09-12 21:27 ` [PATCH v3 4/9] test: add smoke tests for the date/time parser module Jani Nikula @ 2012-09-12 21:27 ` Jani Nikula 2012-09-12 21:27 ` [PATCH v3 6/9] lib: add date range query support Jani Nikula ` (3 subsequent siblings) 8 siblings, 0 replies; 30+ messages in thread From: Jani Nikula @ 2012-09-12 21:27 UTC (permalink / raw) To: notmuch, David Bremner --- Makefile.local | 2 +- lib/Makefile.local | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Makefile.local b/Makefile.local index 7f2d4f1..2b91946 100644 --- a/Makefile.local +++ b/Makefile.local @@ -277,7 +277,7 @@ notmuch_client_srcs = \ notmuch_client_modules = $(notmuch_client_srcs:.c=.o) -notmuch: $(notmuch_client_modules) lib/libnotmuch.a util/libutil.a +notmuch: $(notmuch_client_modules) lib/libnotmuch.a util/libutil.a parse-time-string/libparse-time-string.a $(call quiet,CXX $(CFLAGS)) $^ $(FINAL_LIBNOTMUCH_LDFLAGS) -o $@ notmuch-shared: $(notmuch_client_modules) lib/$(LINKER_NAME) diff --git a/lib/Makefile.local b/lib/Makefile.local index 8a9aa28..d1635cf 100644 --- a/lib/Makefile.local +++ b/lib/Makefile.local @@ -70,7 +70,7 @@ $(dir)/libnotmuch.a: $(libnotmuch_modules) $(call quiet,AR) rcs $@ $^ $(dir)/$(LIBNAME): $(libnotmuch_modules) notmuch.sym - $(call quiet,CXX $(CXXFLAGS)) $(libnotmuch_modules) $(FINAL_LIBNOTMUCH_LDFLAGS) $(LIBRARY_LINK_FLAG) -o $@ util/libutil.a + $(call quiet,CXX $(CXXFLAGS)) $(libnotmuch_modules) $(FINAL_LIBNOTMUCH_LDFLAGS) $(LIBRARY_LINK_FLAG) -o $@ util/libutil.a parse-time-string/libparse-time-string.a notmuch.sym: $(srcdir)/$(dir)/notmuch.h $(libnotmuch_modules) sh $(srcdir)/$(lib)/gen-version-script.sh $< $(libnotmuch_modules) > $@ -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 6/9] lib: add date range query support 2012-09-12 21:27 [PATCH v3 0/9] notmuch search date:since..until query support Jani Nikula ` (4 preceding siblings ...) 2012-09-12 21:27 ` [PATCH v3 5/9] build: build parse-time-string as part of the notmuch lib and static cli Jani Nikula @ 2012-09-12 21:27 ` Jani Nikula 2012-09-13 11:14 ` Michal Nazarewicz 2012-09-12 21:27 ` [PATCH v3 7/9] test: add tests for date:since..until range queries Jani Nikula ` (2 subsequent siblings) 8 siblings, 1 reply; 30+ messages in thread From: Jani Nikula @ 2012-09-12 21:27 UTC (permalink / raw) To: notmuch, David Bremner Add a custom value range processor to enable date and time searches of the form date:since..until, where "since" and "until" are expressions understood by the previously added date/time parser, to restrict the results to messages within a particular time range (based on the Date: header). If "since" or "until" describes date/time at an accuracy of days or less, the values are rounded according to the accuracy, towards past for "since" and towards future for "until". For example, date:november..yesterday would match from the beginning of November until the end of yesterday. An expression such as date:today..today means since the beginning of today until the end of today. Open-ended ranges are supported (since Xapian 1.2.1), i.e. you can specify date:..until or date:since.. to not limit the start or end date, respectively. CAVEATS: Xapian does not support spaces in range expressions. You can replace the spaces with '_', or (in most cases) '-', or (in some cases) leave the spaces out altogether. Entering date:expr without ".." (for example date:yesterday) will not work as you might expect. You can achieve the expected result by duplicating the expr on both sides of ".." (for example date:yesterday..yesterday). Open-ended ranges won't work with pre-1.2.1 Xapian, but they don't produce an error either. 
Signed-off-by: Jani Nikula <jani@nikula.org> --- lib/Makefile.local | 1 + lib/database-private.h | 1 + lib/database.cc | 5 +++++ lib/parse-time-vrp.cc | 40 ++++++++++++++++++++++++++++++++++++++++ lib/parse-time-vrp.h | 19 +++++++++++++++++++ 5 files changed, 66 insertions(+) create mode 100644 lib/parse-time-vrp.cc create mode 100644 lib/parse-time-vrp.h diff --git a/lib/Makefile.local b/lib/Makefile.local index d1635cf..6c0f42f 100644 --- a/lib/Makefile.local +++ b/lib/Makefile.local @@ -58,6 +58,7 @@ libnotmuch_c_srcs = \ libnotmuch_cxx_srcs = \ $(dir)/database.cc \ + $(dir)/parse-time-vrp.cc \ $(dir)/directory.cc \ $(dir)/index.cc \ $(dir)/message.cc \ diff --git a/lib/database-private.h b/lib/database-private.h index 88532d5..d3e65fd 100644 --- a/lib/database-private.h +++ b/lib/database-private.h @@ -52,6 +52,7 @@ struct _notmuch_database { Xapian::QueryParser *query_parser; Xapian::TermGenerator *term_gen; Xapian::ValueRangeProcessor *value_range_processor; + Xapian::ValueRangeProcessor *date_range_processor; }; /* Return the list of terms from the given iterator matching a prefix. 
diff --git a/lib/database.cc b/lib/database.cc index 761dc1a..4df3217 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -19,6 +19,7 @@ */ #include "database-private.h" +#include "parse-time-vrp.h" #include <iostream> @@ -710,12 +711,14 @@ notmuch_database_open (const char *path, notmuch->term_gen = new Xapian::TermGenerator; notmuch->term_gen->set_stemmer (Xapian::Stem ("english")); notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP); + notmuch->date_range_processor = new ParseTimeValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP); notmuch->query_parser->set_default_op (Xapian::Query::OP_AND); notmuch->query_parser->set_database (*notmuch->xapian_db); notmuch->query_parser->set_stemmer (Xapian::Stem ("english")); notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME); notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor); + notmuch->query_parser->add_valuerangeprocessor (notmuch->date_range_processor); for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) { prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i]; @@ -778,6 +781,8 @@ notmuch_database_close (notmuch_database_t *notmuch) notmuch->xapian_db = NULL; delete notmuch->value_range_processor; notmuch->value_range_processor = NULL; + delete notmuch->date_range_processor; + notmuch->date_range_processor = NULL; } void diff --git a/lib/parse-time-vrp.cc b/lib/parse-time-vrp.cc new file mode 100644 index 0000000..148c117 --- /dev/null +++ b/lib/parse-time-vrp.cc @@ -0,0 +1,40 @@ + +#include "database-private.h" +#include "parse-time-vrp.h" +#include "parse-time-string.h" + +#define PREFIX "date:" + +/* See *ValueRangeProcessor in xapian-core/api/valuerangeproc.cc */ +Xapian::valueno +ParseTimeValueRangeProcessor::operator() (std::string &begin, std::string &end) +{ + time_t t, now; + + /* Require date: prefix in start of the range... 
*/ + if (STRNCMP_LITERAL (begin.c_str (), PREFIX)) + return Xapian::BAD_VALUENO; + + /* ...and remove it. */ + begin.erase (0, sizeof (PREFIX) - 1); + + /* Use the same 'now' for begin and end. */ + if (time (&now) == (time_t) -1) + return Xapian::BAD_VALUENO; + + if (!begin.empty ()) { + if (parse_time_string (begin.c_str (), &t, &now, PARSE_TIME_ROUND_DOWN)) + return Xapian::BAD_VALUENO; + + begin.assign (Xapian::sortable_serialise ((double) t)); + } + + if (!end.empty ()) { + if (parse_time_string (end.c_str (), &t, &now, PARSE_TIME_ROUND_UP)) + return Xapian::BAD_VALUENO; + + end.assign (Xapian::sortable_serialise ((double) t)); + } + + return valno; +} diff --git a/lib/parse-time-vrp.h b/lib/parse-time-vrp.h new file mode 100644 index 0000000..526c217 --- /dev/null +++ b/lib/parse-time-vrp.h @@ -0,0 +1,19 @@ + +#ifndef NOTMUCH_PARSE_TIME_VRP_H +#define NOTMUCH_PARSE_TIME_VRP_H + +#include <xapian.h> + +/* see *ValueRangeProcessor in xapian-core/include/xapian/queryparser.h */ +class ParseTimeValueRangeProcessor : public Xapian::ValueRangeProcessor { +protected: + Xapian::valueno valno; + +public: + ParseTimeValueRangeProcessor (Xapian::valueno slot_) + : valno(slot_) { } + + Xapian::valueno operator() (std::string &begin, std::string &end); +}; + +#endif /* NOTMUCH_PARSE_TIME_VRP_H */ -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 30+ messages in thread
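The prefix handling at the top of ParseTimeValueRangeProcessor::operator() relies on sizeof applied to a string literal, which counts the terminating NUL, hence the "- 1". The same check can be sketched in plain C as below. This is an illustration only: skip_date_prefix() is a hypothetical helper, and it assumes that STRNCMP_LITERAL is a thin wrapper around strncmp() with the literal's length, which is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREFIX "date:"

/*
 * Return a pointer just past the "date:" prefix, or NULL if range_begin
 * does not start with it. sizeof (PREFIX) counts the terminating NUL,
 * hence the "- 1".
 */
static const char *
skip_date_prefix (const char *range_begin)
{
    assert (range_begin != NULL);

    if (strncmp (range_begin, PREFIX, sizeof (PREFIX) - 1) != 0)
	return NULL;

    return range_begin + sizeof (PREFIX) - 1;
}
```

A NULL result here corresponds to the processor returning Xapian::BAD_VALUENO, which, as far as I understand the Xapian query parser, leaves the range to the other registered processors such as the existing numeric timestamp one. An empty remainder corresponds to the open-ended date:..until case, where "begin" is left untouched.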
* Re: [PATCH v3 6/9] lib: add date range query support
From: Michal Nazarewicz @ 2012-09-13 11:14 UTC
To: Jani Nikula, notmuch, David Bremner

On Wed, Sep 12 2012, Jani Nikula wrote:
> Add a custom value range processor to enable date and time searches of
> the form date:since..until, where "since" and "until" are expressions
> understood by the previously added date/time parser, to restrict the
> results to messages within a particular time range (based on the Date:
> header).
>
> If "since" or "until" describes date/time at an accuracy of days or
> less, the values are rounded according to the accuracy, towards past
> for "since" and towards future for "until". For example,
> date:november..yesterday would match from the beginning of November
> until the end of yesterday. Expressions such as date:today..today
> means since the beginning of today until the end of today.

IMO this is totally unintuitive and not how the range should work.
date:foo..bar should return messages whose date >= foo and < bar. So
for instance date:november..yesterday should return messages whose date
is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
yesterday's messages one would do: date:yesterday..today.

> Open-ended ranges are supported (since Xapian 1.2.1), i.e. you can
> specify date:..until or date:since.. to not limit the start or end
> date, respectively.
>
> CAVEATS:
>
> Xapian does not support spaces in range expressions. You can replace
> the spaces with '_', or (in most cases) '-', or (in some cases) leave
> the spaces out altogether.

-- 
Best regards,
Michał “mina86” Nazarewicz
* Re: [PATCH v3 6/9] lib: add date range query support
From: Tomi Ollila @ 2012-09-13 11:32 UTC
To: Michal Nazarewicz, Jani Nikula, notmuch, David Bremner

On Thu, Sep 13 2012, Michal Nazarewicz <mina86@mina86.com> wrote:

> On Wed, Sep 12 2012, Jani Nikula wrote:
>> Add a custom value range processor to enable date and time searches of
>> the form date:since..until, where "since" and "until" are expressions
>> understood by the previously added date/time parser, to restrict the
>> results to messages within a particular time range (based on the Date:
>> header).
>>
>> If "since" or "until" describes date/time at an accuracy of days or
>> less, the values are rounded according to the accuracy, towards past
>> for "since" and towards future for "until". For example,
>> date:november..yesterday would match from the beginning of November
>> until the end of yesterday. Expressions such as date:today..today
>> means since the beginning of today until the end of today.
>
> IMO this is totally unintuitive and not how the range should work.
> date:foo..bar should return messages whose date >= foo and < bar. So
> for instance date:november..yesterday should return messages whose date
> is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
> yesterday's messages one would do: date:yesterday..today.

I find yesterday..yesterday returning all of yesterday's messages more
intuitive than it returning zero messages and requiring yesterday..today
to see messages sent yesterday. However, I've noticed that a range
described as -1day..-1day (if that syntax is/were supported)
would be a bit confusing (in yesterday's case I think the length
of 'yesterday' is 24h, but in '-1day' the length is one second
(or something)).

Anyway, this just emphasizes that this is a confusing matter; we need
a good idiom to comprehend this issue...

Tomi

>> Open-ended ranges are supported (since Xapian 1.2.1), i.e. you can
>> specify date:..until or date:since.. to not limit the start or end
>> date, respectively.
>>
>> CAVEATS:
>>
>> Xapian does not support spaces in range expressions. You can replace
>> the spaces with '_', or (in most cases) '-', or (in some cases) leave
>> the spaces out altogether.
* Re: [PATCH v3 6/9] lib: add date range query support
From: Jani Nikula @ 2012-09-13 13:33 UTC
To: Tomi Ollila, Michal Nazarewicz, notmuch, David Bremner

On Thu, 13 Sep 2012, Tomi Ollila <tomi.ollila@iki.fi> wrote:
> On Thu, Sep 13 2012, Michal Nazarewicz <mina86@mina86.com> wrote:
>
>> On Wed, Sep 12 2012, Jani Nikula wrote:
>>> Add a custom value range processor to enable date and time searches of
>>> the form date:since..until, where "since" and "until" are expressions
>>> understood by the previously added date/time parser, to restrict the
>>> results to messages within a particular time range (based on the Date:
>>> header).
>>>
>>> If "since" or "until" describes date/time at an accuracy of days or
>>> less, the values are rounded according to the accuracy, towards past
>>> for "since" and towards future for "until". For example,
>>> date:november..yesterday would match from the beginning of November
>>> until the end of yesterday. Expressions such as date:today..today
>>> means since the beginning of today until the end of today.
>>
>> IMO this is totally unintuitive and not how the range should work.
>> date:foo..bar should return messages whose date >= foo and < bar. So
>> for instance date:november..yesterday should return messages whose date
>> is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
>> yesterday's messages one would do: date:yesterday..today.
>
> I find yesterday..yesterday returning all of yesterday's messages more
> intuitive than it returning zero messages and requiring yesterday..today
> to see messages sent yesterday. However, I've noticed that a range
> described as -1day..-1day (if that syntax is/were supported)
> would be a bit confusing (in yesterday's case I think the length
> of 'yesterday' is 24h, but in '-1day' the length is one second
> (or something)).

"yesterday" equals "1 day", so you can use date:yesterday..yesterday and
date:1d..1d interchangeably.

> Anyway, this just emphasizes that this is a confusing matter; we need
> a good idiom to comprehend this issue...

I find "since" rounding towards past and "until" rounding towards future
a very simple rule. But YMMV.

One technical aspect is preparing for handling date:expr *without*
range, for example date:yesterday, in the future (this is currently not
supported by xapian). Intuitively that should mean all messages received
yesterday. Because the date parser does not see the range (or lack of
it) at all (and this is very much by design), the glue layer in notmuch
lib between the parser and xapian should handle it gracefully, with no
understanding of expr itself. The obvious and simple way to handle that
is to just duplicate expr on both sides of the range, and date:yesterday
would equal date:yesterday..yesterday, in a way that is very simple to
implement and explain to users.

BR,
Jani.
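The rounding rule described in the message above ("since" rounds towards the past, "until" towards the future, and a bare expression is duplicated on both sides of the range) can be sketched as follows. This is an illustration only, not the parser from the patch. It assumes day accuracy, timestamps at or after the epoch, and day boundaries in UTC, whereas a real implementation would normally round in local time. The names day_floor, day_ceil and single_day are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY 86400

/* An inclusive range of seconds since the epoch. */
struct range {
    int64_t since;
    int64_t until;
};

/* "since" rounding: go back to the first second of the day (UTC). */
static int64_t day_floor(int64_t t)
{
    assert(t >= 0); /* the sketch only handles post-1970 timestamps */
    return t - (t % SECS_PER_DAY);
}

/* "until" rounding: go forward to the last second of the same day. */
static int64_t day_ceil(int64_t t)
{
    return day_floor(t) + SECS_PER_DAY - 1;
}

/* date:expr handled as date:expr..expr, with both ends rounded. */
static struct range single_day(int64_t t)
{
    struct range r = { day_floor(t), day_ceil(t) };
    return r;
}
```

Because each end is rounded separately, the glue layer does not need to know what the expression means. It passes the same string to the parser twice, once for each rounding direction.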
* Re: [PATCH v3 6/9] lib: add date range query support
From: Michal Nazarewicz @ 2012-09-17 15:11 UTC
To: Jani Nikula, Tomi Ollila, notmuch, David Bremner

On Thu, Sep 13 2012, Jani Nikula wrote:
> I find "since" rounding towards past and "until" rounding towards future
> a very simple rule. But YMMV.

To implement rounding, each date needs to have a period of time to align
to. I call that a duration. But if you have such a duration, then
I propose a solution where you don't need any kind of rounding.

If “yesterday” has a duration of one day, then “date:yesterday” would be
equivalent to “date:yesterday..yesterday + 1 day” and this works
perfectly well with ranges open on the right side. So to implement date
specifications with a single date, no additional code is really
required.

-- 
Best regards,
Michał “mina86” Nazarewicz
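The duration-based proposal in the message above can be compared with the rounding rule in a few lines of C. This is a sketch under stated assumptions: timestamps are whole seconds, and the struct span type and both function names are hypothetical rather than taken from any patch in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* A parsed expression: where it starts and how long a period it names. */
struct span {
    int64_t start;    /* seconds since the epoch */
    int64_t duration; /* seconds, e.g. 86400 for a whole day */
};

/* Proposed reading of date:expr: start <= t < start + duration. */
static int span_contains(struct span s, int64_t t)
{
    assert(s.duration > 0);
    return t >= s.start && t < s.start + s.duration;
}

/* Rounding rule: rounded-down since <= t <= rounded-up until. */
static int rounded_contains(int64_t since, int64_t until, int64_t t)
{
    return t >= since && t <= until;
}
```

At one-second resolution the two tests select the same messages for a single day, because the inclusive end is start + duration - 1. They differ in what a user writes to get that result with an explicit two-sided range.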
* Re: [PATCH v3 6/9] lib: add date range query support
From: David Bremner @ 2012-09-13 12:32 UTC
To: Michal Nazarewicz, Jani Nikula, notmuch

Michal Nazarewicz <mina86@mina86.com> writes:

> IMO this is totally unintuitive and not how the range should work.
> date:foo..bar should return messages whose date >= foo and < bar. So
> for instance date:november..yesterday should return messages whose date
> is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
> yesterday's messages one would do: date:yesterday..today.

I don't find ranges being half-open by default to be very
intuitive. Perhaps I don't program in python enough.

d
* Re: [PATCH v3 6/9] lib: add date range query support
From: Michal Nazarewicz @ 2012-09-17 15:03 UTC
To: David Bremner, Jani Nikula, notmuch

> Michal Nazarewicz <mina86@mina86.com> writes:
>> IMO this is totally unintuitive and not how the range should work.
>> date:foo..bar should return messages whose date >= foo and < bar. So
>> for instance date:november..yesterday should return messages whose date
>> is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
>> yesterday's messages one would do: date:yesterday..today.

On Thu, Sep 13 2012, David Bremner wrote:
> I don't find ranges being half-open by default to be very
> intuitive. Perhaps I don't program in python enough.

Perhaps C then: “for (i = 0; i < 10; ++i)” is the standard idiom and the
end range is open.

Let's take a look at:

	date:2012/01/01..2012/01/01 + 1 day

in my opinion, that should give results from the first of January only,
since “+ 1 day” indicates in a way how long the user wants the period
to be.

I think it's also easier to programmatically create ranges. For
instance, let's say you want to create ranges for each week, you'd end
up with:

	date:2012/01/02..2012/01/09   ## 2012w01
	date:2012/01/09..2012/01/16   ## 2012w02
	date:2012/01/16..2012/01/23   ## 2012w03

Notice how the opening date of a range matches the closing date of
the previous range.

-- 
Best regards,
Michał “mina86” Nazarewicz
* Re: [PATCH v3 6/9] lib: add date range query support
From: Tomi Ollila @ 2012-09-17 15:35 UTC
To: Michal Nazarewicz, David Bremner, Jani Nikula, notmuch

On Mon, Sep 17 2012, Michal Nazarewicz wrote:

>> Michal Nazarewicz <mina86@mina86.com> writes:
>>> IMO this is totally unintuitive and not how the range should work.
>>> date:foo..bar should return messages whose date >= foo and < bar. So
>>> for instance date:november..yesterday should return messages whose date
>>> is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
>>> yesterday's messages one would do: date:yesterday..today.
>
> On Thu, Sep 13 2012, David Bremner wrote:
>> I don't find ranges being half-open by default to be very
>> intuitive. Perhaps I don't program in python enough.
>
> Perhaps C then: “for (i = 0; i < 10; ++i)” is the standard idiom and the
> end range is open.
>
> Let's take a look at:
>
> 	date:2012/01/01..2012/01/01 + 1 day
>
> in my opinion, that should give results from the first of January only,
> since “+ 1 day” indicates in a way how long the user wants the period
> to be.
>
> I think it's also easier to programmatically create ranges. For
> instance, let's say you want to create ranges for each week, you'd end
> up with:
>
> 	date:2012/01/02..2012/01/09   ## 2012w01
> 	date:2012/01/09..2012/01/16   ## 2012w02
> 	date:2012/01/16..2012/01/23   ## 2012w03

Ok, these match the ISO weeks...

> Notice how the opening date of a range matches the closing date of
> the previous range.

2012/01/02 is a Monday and 2012/01/09 is a Monday. For me

	date:2012/01/02..2012/01/08   ## 2012w01

would be more intuitive in this context.

for (i = 0; i < 10; ++i) loops through 0 - 9.
for (i = 1; i <= 10; ++i) loops through 1 - 10.

python -c 'for f in range(5): print f' prints 0 - 4
perl -le 'for (1..5) { print $_ }' prints 1 - 5

... these do not clarify, but rather confuse, these intuitions :D

Tomi
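The two ways of writing consecutive weekly ranges discussed above differ only in where each range ends. The sketch below makes that concrete, assuming whole-second timestamps and a caller-supplied timestamp for the first Monday at 00:00:00. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY  86400
#define SECS_PER_WEEK (7 * SECS_PER_DAY)

struct range {
    int64_t begin;
    int64_t end;
};

/* Half-open week n: [Monday 00:00:00, next Monday 00:00:00). */
static struct range week_half_open(int64_t first_monday, int n)
{
    struct range r;

    assert(n >= 0);
    r.begin = first_monday + (int64_t) n * SECS_PER_WEEK;
    r.end = r.begin + SECS_PER_WEEK;
    return r;
}

/* Inclusive week n: [Monday 00:00:00, Sunday 23:59:59]. */
static struct range week_inclusive(int64_t first_monday, int n)
{
    struct range r = week_half_open(first_monday, n);

    r.end -= 1; /* last second of Sunday */
    return r;
}
```

With half-open ranges the end of week n equals the start of week n + 1, as in the Monday-to-Monday list. With inclusive ranges the next week starts one second after the previous one ends, which corresponds to writing the range as Monday..Sunday.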
* Re: [PATCH v3 6/9] lib: add date range query support
From: David Bremner @ 2012-09-17 16:10 UTC
To: Michal Nazarewicz, notmuch

Michal Nazarewicz <mina86@mina86.com> writes:

> On Thu, Sep 13 2012, David Bremner wrote:
>> I don't find ranges being half-open by default to be very
>> intuitive. Perhaps I don't program in python enough.

[snip]

> 	date:2012/01/01..2012/01/01 + 1 day
>
> in my opinion, that should give results from the first of January only,
> since “+ 1 day” indicates in a way how long user want the period to be.

Sorry, still not convinced. My comment about python was more or less a
joke; I get the analogy with C, but (1) I don't think C is a reasonable
UI guide, and (2), at least in the C case, the half openness of the
range is explicitly given by the operators used.

d
* Re: [PATCH v3 6/9] lib: add date range query support
From: Michal Sojka @ 2012-09-25 12:15 UTC
To: Michal Nazarewicz, Jani Nikula, notmuch, David Bremner

On Thu, Sep 13 2012, Michal Nazarewicz wrote:
> On Wed, Sep 12 2012, Jani Nikula wrote:
>> Add a custom value range processor to enable date and time searches of
>> the form date:since..until, where "since" and "until" are expressions
>> understood by the previously added date/time parser, to restrict the
>> results to messages within a particular time range (based on the Date:
>> header).
>>
>> If "since" or "until" describes date/time at an accuracy of days or
>> less, the values are rounded according to the accuracy, towards past
>> for "since" and towards future for "until". For example,
>> date:november..yesterday would match from the beginning of November
>> until the end of yesterday. Expressions such as date:today..today
>> means since the beginning of today until the end of today.
>
> IMO this is totally unintuitive and not how the range should work.
> date:foo..bar should return messages whose date >= foo and < bar. So
> for instance date:november..yesterday should return messages whose date
> is > 2012/11/01 00:00:00 and < 2012/09/12 00:00:00. So to get
> yesterday's messages one would do: date:yesterday..today.

For me, date:monday..wednesday means all messages received on Monday,
Tuesday or Wednesday. If I say Wednesday, I'm really interested in
Wednesday and not the day before Wednesday.

I'd also like to allow syntax like date:yesterday with the meaning "all
messages sent yesterday". My idea of how to implement this was described
in id:"87bovryqp0.fsf@steelpick.2x.cz". Unfortunately, I have no time to
implement it myself.

-Michal
* [PATCH v3 7/9] test: add tests for date:since..until range queries
From: Jani Nikula @ 2012-09-12 21:27 UTC
To: notmuch, David Bremner

A brief initial test set.
---
 test/notmuch-test |    1 +
 test/search-date  |   21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)
 create mode 100755 test/search-date

diff --git a/test/notmuch-test b/test/notmuch-test
index 7eadfdf..9a1b375 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -61,6 +61,7 @@ TESTS="
   emacs-show
   missing-headers
   parse-time-string
+  search-date
 "
 
 TESTS=${NOTMUCH_TESTS:=$TESTS}
diff --git a/test/search-date b/test/search-date
new file mode 100755
index 0000000..70bcf34
--- /dev/null
+++ b/test/search-date
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+test_description="date:since..until queries"
+. ./test-lib.sh
+
+add_email_corpus
+
+test_begin_subtest "Absolute date range"
+output=$(notmuch search date:2010-12-16..12/16/2010 | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2010-12-16 [1/1] Olivier Berger; Essai accentué (inbox unread)"
+
+test_begin_subtest "Absolute time range with TZ"
+notmuch search date:18-Nov-2009_02:19:26-0800..2009-11-18_04:49:52-06:00 | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX 2009-11-18 [1/3] Carl Worth| Jan Janak; [notmuch] What a great idea! (inbox unread)
+thread:XXX 2009-11-18 [1/2] Carl Worth| Jan Janak; [notmuch] [PATCH] Older versions of install do not support -C. (inbox unread)
+thread:XXX 2009-11-18 [1/3] Carl Worth| Aron Griffis, Keith Packard; [notmuch] archive (inbox unread)
+thread:XXX 2009-11-18 [1/2] Carl Worth| Keith Packard; [notmuch] [PATCH] Make notmuch-show 'X' (and 'x') commands remove inbox (and unread) tags (inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_done
-- 
1.7.9.5
* [PATCH v3 8/9] man: document the date:since..until range queries
From: Jani Nikula @ 2012-09-12 21:27 UTC
To: notmuch, David Bremner

---
 man/man7/notmuch-search-terms.7 |  147 +++++++++++++++++++++++++++++++++++----
 1 file changed, 135 insertions(+), 12 deletions(-)

diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index 17a109e..fbd3ee7 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -54,6 +54,8 @@ terms to match against specific portions of an email, (where
 
 	folder:<directory-path>
 
+	date:<since>..<until>
+
 The
 .B from:
 prefix is used to match the name or address of the sender of an email
@@ -104,6 +106,26 @@ contained within particular directories within the mail store. Only
 the directory components below the top-level mail database path are
 available to be searched.
 
+The
+.B date:
+prefix can be used to restrict the results to only messages within a
+particular time range (based on the Date: header) with a range syntax
+of:
+
+	date:<since>..<until>
+
+See \fBDATE AND TIME SEARCH\fR below for details on the range
+expression, and supported syntax for <since> and <until> date and time
+expressions.
+
+The time range can also be specified using timestamps with a syntax
+of:
+
+	<initial-timestamp>..<final-timestamp>
+
+Each timestamp is a number representing the number of seconds since
+1970\-01\-01 00:00:00 UTC.
+
 In addition to individual terms, multiple terms can be
 combined with Boolean operators (
 .BR and ", " or ", " not
@@ -117,20 +139,121 @@ operators, but will have to be protected from interpretation by the
 shell, (such as by putting quotation marks around any parenthesized
 expression).
 
-Finally, results can be restricted to only messages within a
-particular time range, (based on the Date: header) with a syntax of:
+.SH DATE AND TIME SEARCH
 
-	<initial-timestamp>..<final-timestamp>
+This is a non-exhaustive description of the date and time search with
+some pseudo notation. Most of the constructs can be mixed freely, and
+in any order, but the same absolute date or time can't be expressed
+twice.
 
-Each timestamp is a number representing the number of seconds since
-1970\-01\-01 00:00:00 UTC. This is not the most convenient means of
-expressing date ranges, but until notmuch is fixed to accept a more
-convenient form, one can use the date program to construct
-timestamps. For example, with the bash shell the following syntax would
-specify a date range to return messages from 2009\-10\-01 until the
-current time:
-
-	$(date +%s \-d 2009\-10\-01)..$(date +%s)
+.RS 4
+.TP 4
+.B The range expression
+
+date:<since>..<until>
+
+The above expression restricts the results to only messages from
+<since> to <until>, based on the Date: header.
+
+If <since> or <until> describes time at an accuracy of days or less,
+the date/time is rounded, towards past for <since> and towards future
+for <until>, to be inclusive. For example, date:january..february
+matches from the beginning of January until the end of
+February. Similarly, date:yesterday..yesterday matches from the
+beginning of yesterday until the end of yesterday.
+
+Open-ended ranges are supported (since Xapian 1.2.1), i.e. it's
+possible to specify date:..<until> or date:<since>.. to not limit the
+start or end time, respectively. Unfortunately, pre-1.2.1 Xapian does
+not report an error on open ended ranges, but it does not work as
+expected either.
+
+Xapian does not support spaces in range expressions. You can replace
+the spaces with '_', or (in most cases) '-', or (in some cases) leave
+the spaces out altogether.
+
+Entering date:expr without ".." (for example date:yesterday) won't
+work, as it's not interpreted as a range expression at all. You can
+achieve the expected result by duplicating the expr both sides of ".."
+(for example date:yesterday..yesterday).
+.RE
+
+.RS 4
+.TP 4
+.B Relative date and time
+[N|number] (years|months|weeks|days|hours|hrs|minutes|mins|seconds|secs) [...]
+
+All refer to past, can be repeated and will be accumulated.
+
+Units can be abbreviated to any length, with the otherwise ambiguous
+single m being m for minutes and M for months.
+
+Number multiplier can also be written out one, two, ..., ten, dozen,
+hundred. As special cases last means one ("last week") and this means
+zero ("this month").
+
+When combined with absolute date and time, the relative date and time
+specification will be relative from the specified absolute date and
+time.
+
+Examples: 5M2d, two weeks
+.RE
+
+.RS 4
+.TP 4
+.B Supported time formats
+H[H]:MM[:SS] [(am|a.m.|pm|p.m.)]
+
+H[H] (am|a.m.|pm|p.m.)
+
+HHMMSS
+
+now
+
+noon
+
+midnight
+
+Examples: 17:05, 5pm
+.RE
+
+.RS 4
+.TP 4
+.B Supported date formats
+YYYY-MM[-DD]
+
+DD-MM[-[YY]YY]
+
+MM-YYYY
+
+M[M]/D[D][/[YY]YY]
+
+M[M]/YYYY
+
+D[D].M[M][.[YY]YY]
+
+D[D][(st|nd|rd|th)] Mon[thname] [YYYY]
+
+Mon[thname] D[D][(st|nd|rd|th)] [YYYY]
+
+Wee[kday]
+
+Month names can be abbreviated at three or more characters.
+
+Weekday names can be abbreviated at three or more characters.
+
+Examples: 2012-07-31, 31-07-2012, 7/31/2012, August 3
+.RE
+
+.RS 4
+.TP 4
+.B Time zones
+(+|-)HH:MM
+
+(+|-)HH[MM]
+
+Some time zone codes, e.g. UTC, EET.
+.RE
 
 .SH SEE ALSO
 
-- 
1.7.9.5
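The manual text above keeps the older <initial-timestamp>..<final-timestamp> form, where both ends are seconds since 1970-01-01 00:00:00 UTC. A program that already has two such values could build the range term as sketched below. The function name and its error convention are hypothetical, and the sketch only formats the string; it does not call into notmuch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<since>..<until>" into buf. Returns 0 on success, or -1 if the
 * result did not fit in len bytes. Hypothetical helper for illustration.
 */
static int format_timestamp_range(char *buf, size_t len,
                                  long long since, long long until)
{
    int n;

    assert(buf != NULL && since <= until);

    n = snprintf(buf, len, "%lld..%lld", since, until);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

The resulting string plays the same role as the shell construct $(date +%s -d 2009-10-01)..$(date +%s) shown in the lines the patch removes.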
* [PATCH v3 9/9] NEWS: date range search support
From: Jani Nikula @ 2012-09-12 21:27 UTC
To: notmuch, David Bremner

---
 NEWS |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/NEWS b/NEWS
index 2b50ba3..5f5b726 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,17 @@
+Notmuch 0.15 (YYYY-MM-DD)
+=========================
+
+Library changes
+---------------
+
+Date range search support
+
+  The `date:` prefix can now be used in queries to restrict the results to only
+  messages within a particular time range (based on the Date: header) with a
+  range syntax of `date:<since>..<until>`. Notmuch supports a wide variety of
+  expressions in `<since>` and `<until>`. Please refer to the
+  `notmuch-search-terms(7)` manual page for details.
+
 Notmuch 0.14 (2012-08-20)
 =========================
 
-- 
1.7.9.5