* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-11-30 20:55 UTC
To: 52209

In the emacs-28 branch at 0dd3883def:

Imagine my surprise when evaluating

    (days-between "2021-10-22" "2020-09-29")

returned zero. The root cause is that passing any date string without a time to date-to-time produces the same return value:

    (date-to-time "2021-10-22") => (14445 17280)
    (date-to-time "2020-09-29") => (14445 17280)

But:

    (date-to-time "2020-09-29 23:15") => (24435 63540)

There are really two bugs here (or maybe three, depending on how you look at it):

  1. If parsing throws an error that is not an overflow, date-to-time passes the date through timezone-make-date-arpa-standard to try to fix some cases that parse-time-string can't handle. But the condition-case is also wrapped around the encode-time call, which gets a wrong-type-argument error when it sees nil time values for HH:MM, so the fallback gets used for something other than a parsing error.

  2. When timezone-make-date-arpa-standard gets something it can't handle, it "canonicalizes" the value to "31 Dec 1999 19:00:00 -0500", which is the source of the constant result. That may be worth another bug report, but I'm not sure of its charter; maybe that's correct behavior in context.

The attached patch adds decoded-time-set-defaults, moves that and the encode-time call outside the condition-case, and disables the fallback to timezone-make-date-arpa-standard if the date appears not to have a time value.

And I can now tell you there are 388 days between 2020-09-29 and 2021-10-22.
-- 
Bob Rogers http://www.rgrjr.com/

[-- Attachment #2: Type: text/x-patch, Size: 1854 bytes --]

diff --git a/lisp/calendar/time-date.el b/lisp/calendar/time-date.el
index 155c34927f..6407138953 100644
--- a/lisp/calendar/time-date.el
+++ b/lisp/calendar/time-date.el
@@ -153,19 +153,25 @@ date-to-time
   "Parse a string DATE that represents a date-time and return a time value.
 DATE should be in one of the forms recognized by `parse-time-string'.
 If DATE lacks timezone information, GMT is assumed."
-  (condition-case err
-      (encode-time (parse-time-string date))
-    (error
-     (let ((overflow-error '(error "Specified time is not representable")))
-       (if (equal err overflow-error)
-           (signal (car err) (cdr err))
-         (condition-case err
-             (encode-time (parse-time-string
-                           (timezone-make-date-arpa-standard date)))
-           (error
-            (if (equal err overflow-error)
-                (signal (car err) (cdr err))
-              (error "Invalid date: %s" date)))))))))
+  ;; Pass the result of parsing through decoded-time-set-defaults
+  ;; because encode-time signals if HH:MM:SS are not filled in.
+  (encode-time
+   (decoded-time-set-defaults
+    (condition-case err
+        (parse-time-string date)
+      (error
+       (let ((overflow-error '(error "Specified time is not representable")))
+         (if (or (equal err overflow-error)
+                 ;; timezone-make-date-arpa-standard misbehaves if
+                 ;; not given at least HH:MM as part of the date.
+                 (not (string-match ":" date)))
+             (signal (car err) (cdr err))
+           (condition-case err
+               (parse-time-string (timezone-make-date-arpa-standard date))
+             (error
+              (if (equal err overflow-error)
+                  (signal (car err) (cdr err))
+                (error "Invalid date: %s" date)))))))))))
 
 ;;;###autoload
 (defalias 'time-to-seconds 'float-time)
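A minimal Emacs Lisp sketch of the restructuring described above, assuming Emacs 27 or later (for `decoded-time-set-defaults' and the single-list form of `encode-time'). The names my-date-to-time-sketch and my-days-between-sketch are hypothetical, the timezone-make-date-arpa-standard fallback is left out, and a missing zone is treated as UTC to match the docstring's "GMT is assumed"; this is an illustration, not the installed code.

```elisp
;; -*- lexical-binding: t; -*-
(require 'parse-time)
(require 'time-date)

(defun my-date-to-time-sketch (date)
  "Hypothetical sketch: convert the string DATE to a Lisp timestamp.
Only errors from `parse-time-string' are caught; filling in
defaults and calling `encode-time' happen outside the
`condition-case', so their errors cannot trigger a fallback."
  (let ((parsed (condition-case nil
                    (parse-time-string date)
                  (error (error "Invalid date: %s" date)))))
    ;; Assumption for this sketch: a missing zone means UTC.
    (unless (decoded-time-zone parsed)
      (setf (decoded-time-zone parsed) 0
            (decoded-time-dst parsed) nil))
    ;; Fill in nil HH:MM:SS (and other nil fields) so that
    ;; `encode-time' does not signal `wrong-type-argument'.
    (encode-time (decoded-time-set-defaults parsed))))

(defun my-days-between-sketch (date1 date2)
  "Whole days from DATE2 to DATE1, using `my-date-to-time-sketch'."
  (/ (round (float-time (time-subtract (my-date-to-time-sketch date1)
                                       (my-date-to-time-sketch date2))))
     86400))
```

With only the parse inside the `condition-case', a pure date such as "2021-10-22" now encodes as midnight of that day instead of falling through to the constant fallback value.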
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-01 4:17 UTC
To: Bob Rogers; +Cc: 52209

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> In the emacs-28 branch at 0dd3883def:
>
> Imagine my surprise when evaluating
>
> (days-between "2021-10-22" "2020-09-29")
>
> returned zero.

Thanks, applied to Emacs 29.

(These functions were never really intended to support parsing dates like that -- only strict RFC822 date strings were originally supported, but it's become more DWIM as time has passed. Especially since it wasn't explicitly stated anywhere that time-date.el was an RFC822 library.)

-- 
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Katsumi Yamaoka @ 2021-12-03 5:19 UTC
To: Lars Ingebrigtsen; +Cc: Bob Rogers, 52209

On Wed, 01 Dec 2021 05:17:30 +0100, Lars Ingebrigtsen wrote:
> Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:
>> In the emacs-28 branch at 0dd3883def:
>> Imagine my surprise when evaluating
>> (days-between "2021-10-22" "2020-09-29")
>> returned zero.

> Thanks, applied to Emacs 29.

This change caused another regression. Please try:

    (current-time-string (date-to-time "Fri, 03-Dec-2021 04:59:52 GMT"))

The function needs to test if `parse-time-string' returns a valid data as the old version did it with the help of `encode-time'. A patch is below (where why I do `(setq time ...)' is to silence the byte compiler).

Thanks.

[-- Attachment #2: Type: text/x-patch, Size: 587 bytes --]

--- time-date.el~ 2021-12-01 22:24:35.006052000 +0000
+++ time-date.el 2021-12-03 05:13:22.832443900 +0000
@@ -158,7 +158,10 @@
   (encode-time
    (decoded-time-set-defaults
     (condition-case err
-        (parse-time-string date)
+        (let ((time (parse-time-string date)))
+          (prog1 time
+            ;; Cause an error if data `parse-time-string' returns is invalid.
+            (setq time (encode-time time))))
       (error
        (let ((overflow-error '(error "Specified time is not representable")))
          (if (or (equal err overflow-error)
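Katsumi's example shows the flip side of filling in defaults unconditionally: if the date token is not recognized, the missing day, month and year get filled in anyway and the result is a wrong but valid-looking timestamp. Besides the `encode-time' check in the patch, a caller could inspect the decoded list directly. This is only a sketch of that alternative; the helper name is hypothetical, and the sample lists in the note below are built by hand rather than captured from `parse-time-string'.

```elisp
;; -*- lexical-binding: t; -*-
(defun my-decoded-date-complete-p (decoded)
  "Return t if DECODED, a `decode-time'-style list, has a day, month and year.
Hypothetical helper: a caller could skip `decoded-time-set-defaults'
(or fall back to another parser) when this returns nil."
  (and (decoded-time-day decoded)
       (decoded-time-month decoded)
       (decoded-time-year decoded)
       t))
```

For example, a decoded time with a time and zone but nil day, month and year, such as (52 59 4 nil nil nil 5 nil 0), fails the check, while (52 59 4 3 12 2021 5 nil 0) passes.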
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-03 16:29 UTC
To: Katsumi Yamaoka; +Cc: Bob Rogers, 52209

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> The function needs to test if `parse-time-string' returns a valid
> data as the old version did it with the help of `encode-time'.
> A patch is below (where why I do `(setq time ...)' is to silence
> the byte compiler).

Thanks; applied to Emacs 29.
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Michael Heerdegen @ 2021-12-03 18:38 UTC
To: Katsumi Yamaoka; +Cc: Bob Rogers, Lars Ingebrigtsen, 52209

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> A patch is below (where why I do `(setq time ...)' is to silence
> the byte compiler).

AFAIU: `ignore' is the solution we choose most of the time for such cases, it makes a bit clearer what happens.

Michael.
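Michael's suggestion applied to the validation step in Katsumi's patch might look like the following sketch. The helper name is hypothetical; it relies only on `encode-time' signalling when a decoded time has nil fields, which is the behavior Bob describes in the original report.

```elisp
;; -*- lexical-binding: t; -*-
(require 'parse-time)

(defun my-parse-time-checked (date)
  "Parse DATE and signal an error unless `encode-time' accepts the result.
Hypothetical helper illustrating the validation in Katsumi's patch,
written with `ignore' instead of a dummy `setq'."
  (let ((time (parse-time-string date)))
    ;; Only the side effect matters: `encode-time' signals if TIME has
    ;; nil fields or is out of range.  `ignore' discards the value and
    ;; tells both the reader and the byte compiler that this is intended.
    (ignore (encode-time time))
    time))
```

A string with both a date and a time comes back as the decoded list, while a pure date such as "2021-12-03" signals, because its hour and minute are nil.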
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Paul Eggert @ 2021-12-04 18:58 UTC
To: Lars Ingebrigtsen; +Cc: Bob Rogers, Katsumi Yamaoka, 52209

Unfortunately the latest change to time-date.el reintroduced Bug#52209. I installed the attached patch to fix this, and to add some test cases mentioned in this bug report to help prevent the problem recurring. Also, this patch documents the new feature, and avoids overenthusiastically guessing the year to be 1970 when the date string lacks a year.

> (These functions were never really intended to support parsing dates
> like that -- only strict RFC822 date strings were originally supported,
> but it's become more DWIM as time has passed.

Yes, date-to-time has definitely ... evolved.

My understanding is that date-to-time's RFC822 parsing is present only for backward compatibility, and that we shouldn't attempt to enhance it (here, the enhancement would be pointless as the RFC822 parsing fills in the blanks anyway). So the patch I just installed adds the new feature only for the normal path taken, when not doing the RFC822 hack.

PS. Internet RFC 822 has been obsolete since 2001, and the Emacs code should be talking about RFC 5322 everywhere except when Emacs is explicitly supporting the obsolete standard instead of the current standard. And we should rename functions like rfc822-goto-eoh to rfc-email-goto-eoh, to help avoid confusion or further function renaming. But I digress....
[-- Attachment #2: 0001-Fix-date-to-time-2021-12-04.patch --]
[-- Type: text/x-patch, Size: 5213 bytes --]

From cb0f4f00b328a561e49538bbf0f90050eac1ba20 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 4 Dec 2021 10:33:32 -0800
Subject: [PATCH] Fix (date-to-time "2021-12-04")

This should complete the fix for Bug#52209.
* lisp/calendar/time-date.el (date-to-time): Apply
decoded-time-set-defaults only to the output of (parse-time-string
date), and only when the output has a year (to avoid confusion when
dates lack years). There is no point applying it after
timezone-make-date-arpa-standard since the latter fills in all the
blanks. And the former code mistakenly called encode-time on an
already-encoded time. This goes back to the code a couple of days
ago, except with changed behavior (to fix Bug#52209) only when
timezone-make-date-arpa-standard is not called.
* test/lisp/calendar/time-date-tests.el (test-date-to-time)
(test-days-between): New tests.
---
 doc/lispref/os.texi                   |  3 ++-
 etc/NEWS                              |  4 +++
 lisp/calendar/time-date.el            | 38 +++++++++++----------------
 test/lisp/calendar/time-date-tests.el |  7 +++++
 4 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/doc/lispref/os.texi b/doc/lispref/os.texi
index e420644cd8..b4efc44b03 100644
--- a/doc/lispref/os.texi
+++ b/doc/lispref/os.texi
@@ -1724,7 +1724,8 @@ Time Parsing
 corresponding Lisp timestamp. The argument @var{string} should represent
 a date-time, and should be in one of the forms recognized by
 @code{parse-time-string} (see below). This function assumes Universal
-Time if @var{string} lacks explicit time zone information.
+Time if @var{string} lacks explicit time zone information,
+and assumes earliest values if @var{string} lacks month, day, or time.
 The operating system limits the range of time and zone values.
 @end defun
 
diff --git a/etc/NEWS b/etc/NEWS
index ac1787d7f8..2b4eaaf8a1 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1084,6 +1084,10 @@ cookies set by web pages on disk.
 ** New variable 'help-buffer-under-preparation'.
 This variable is bound to t during the preparation of a "*Help*" buffer.
 
++++
+** 'date-to-time' now assumes earliest values if its argument lacks
+month, day, or time. For example, (date-to-time "2021-12-04") now
+assumes a time of 00:00 instead of signaling an error.
 \f
 * Changes in Emacs 29.1 on Non-Free Operating Systems
 
diff --git a/lisp/calendar/time-date.el b/lisp/calendar/time-date.el
index 8a6ee0f270..37a16d3b98 100644
--- a/lisp/calendar/time-date.el
+++ b/lisp/calendar/time-date.el
@@ -153,28 +153,22 @@ date-to-time
   "Parse a string DATE that represents a date-time and return a time value.
 DATE should be in one of the forms recognized by `parse-time-string'.
 If DATE lacks timezone information, GMT is assumed."
-  ;; Pass the result of parsing through decoded-time-set-defaults
-  ;; because encode-time signals if HH:MM:SS are not filled in.
-  (encode-time
-   (decoded-time-set-defaults
-    (condition-case err
-        (let ((time (parse-time-string date)))
-          (prog1 time
-            ;; Cause an error if data `parse-time-string' returns is invalid.
-            (setq time (encode-time time))))
-      (error
-       (let ((overflow-error '(error "Specified time is not representable")))
-         (if (or (equal err overflow-error)
-                 ;; timezone-make-date-arpa-standard misbehaves if
-                 ;; not given at least HH:MM as part of the date.
-                 (not (string-match ":" date)))
-             (signal (car err) (cdr err))
-           (condition-case err
-               (parse-time-string (timezone-make-date-arpa-standard date))
-             (error
-              (if (equal err overflow-error)
-                  (signal (car err) (cdr err))
-                (error "Invalid date: %s" date)))))))))))
+  (condition-case err
+      (let ((parsed (parse-time-string date)))
+        (when (decoded-time-year parsed)
+          (decoded-time-set-defaults parsed))
+        (encode-time parsed))
+    (error
+     (let ((overflow-error '(error "Specified time is not representable")))
+       (if (equal err overflow-error)
+           (signal (car err) (cdr err))
+         (condition-case err
+             (encode-time (parse-time-string
+                           (timezone-make-date-arpa-standard date)))
+           (error
+            (if (equal err overflow-error)
+                (signal (car err) (cdr err))
+              (error "Invalid date: %s" date)))))))))
 
 ;;;###autoload
 (defalias 'time-to-seconds 'float-time)
diff --git a/test/lisp/calendar/time-date-tests.el b/test/lisp/calendar/time-date-tests.el
index 4568947c0b..d5269804ad 100644
--- a/test/lisp/calendar/time-date-tests.el
+++ b/test/lisp/calendar/time-date-tests.el
@@ -41,6 +41,13 @@ test-obsolete-encode-time-value
                  (encode-time-value 1 2 3 4 3))
                '(1 2 3 4))))
 
+(ert-deftest test-date-to-time ()
+  (should (equal (format-time-string "%F %T" (date-to-time "2021-12-04"))
+                 "2021-12-04 00:00:00")))
+
+(ert-deftest test-days-between ()
+  (should (equal (days-between "2021-10-22" "2020-09-29") 388)))
+
 (ert-deftest test-leap-year ()
   (should-not (date-leap-year-p 1999))
   (should-not (date-leap-year-p 1900))
-- 
2.32.0
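The key behavioral change in Paul's patch is the year guard in front of `decoded-time-set-defaults'. Pulled out into a hypothetical helper, it might look like the sketch below (assuming Emacs 27 or later): a parse that produced a year gets its missing fields filled in, while a year-less parse is left alone instead of being silently placed in 1970.

```elisp
;; -*- lexical-binding: t; -*-
(require 'parse-time)
(require 'time-date)

(defun my-set-defaults-if-dated (parsed)
  "Fill nil fields of the decoded time PARSED only when it has a year.
Hypothetical helper mirroring the guard in the patch; returns PARSED."
  (when (decoded-time-year parsed)
    ;; Day, month and HH:MM:SS default to their earliest values.
    (decoded-time-set-defaults parsed))
  parsed)
```

For a string like "12:30" the year stays nil, so the caller can still tell that no date was given.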
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-19 21:11 UTC
To: Paul Eggert, Lars Ingebrigtsen; +Cc: Katsumi Yamaoka, 52209

   From: Paul Eggert <eggert@cs.ucla.edu>
   Date: Sat, 4 Dec 2021 10:58:37 -0800

   Unfortunately the latest change to time-date.el reintroduced
   Bug#52209. I installed the attached patch to fix this . . .

I'm sure none of you will be surprised to learn that parse-time-string still doesn't recognize single-digit months or days, with the same fallback-to-the-epoch behavior that threw me for a loop originally.

    (format-time-string "%F %T %z" (date-to-time "2022-1-12"))
      => "1999-12-31 19:00:00 -0500"

And adding a time makes it work again because it seems that timezone-make-date-arpa-standard does accept single-digit months and days. Go figure.

The attached patch extends parse-time-string by using regexps instead of string manipulation of fixed-width fields. This could possibly break interface compatibility, especially if you expect anyone to customize parse-time-rules. So I will not be surprised if you decline to adopt it.

   > (These functions were never really intended to support parsing dates
   > like that -- only strict RFC822 date strings were originally supported,
   > but it's become more DWIM as time has passed.

   Yes, date-to-time has definitely ... evolved.

   My understanding is that date-to-time's RFC822 parsing is present
   only for backward compatibility, and that we shouldn't attempt to
   enhance it (here, the enhancement would be pointless as the RFC822
   parsing fills in the blanks anyway). So the patch I just installed
   adds the new feature only for the normal path taken, when not doing
   the RFC822 hack.

   PS. Internet RFC 822 has been obsolete since 2001, and the Emacs code
   should be talking about RFC 5322 everywhere except when Emacs is
   explicitly supporting the obsolete standard instead of the current
   standard. And we should rename functions like rfc822-goto-eoh to
   rfc-email-goto-eoh, to help avoid confusion or further function
   renaming. But I digress....

Since Emacs time functions have evolved well beyond email, I would argue that even "rfc-email-" is too specific a prefix for them. So if this patch is not suitable, maybe it's (cough, cough) time for a new time and date parsing API that supports a broader range of human-generated dates and times, with better error handling and I18N support. WDYT?

-- 
Bob Rogers http://www.rgrjr.com/

[-- Attachment #2: Type: text/x-patch, Size: 6215 bytes --]

diff --git a/lisp/calendar/parse-time.el b/lisp/calendar/parse-time.el
index 5a3d2706af..4812dcbd1b 100644
--- a/lisp/calendar/parse-time.el
+++ b/lisp/calendar/parse-time.el
@@ -102,45 +102,25 @@ parse-time-rules
     ((3) (1 31))
     ((4) parse-time-months)
     ((5) (100))
-    ((2 1 0)
-     ,(lambda () (and (stringp parse-time-elt)
-                      (= (length parse-time-elt) 8)
-                      (= (aref parse-time-elt 2) ?:)
-                      (= (aref parse-time-elt 5) ?:)))
-     [0 2] [3 5] [6 8])
     ((8 7) parse-time-zoneinfo
      ,(lambda () (car parse-time-val))
      ,(lambda () (cadr parse-time-val)))
     ((8)
+     "^[-+][0-9][0-9][0-9][0-9]$"
      ,(lambda ()
-        (and (stringp parse-time-elt)
-             (= 5 (length parse-time-elt))
-             (or (= (aref parse-time-elt 0) ?+)
-                 (= (aref parse-time-elt 0) ?-))))
-     ,(lambda () (* 60 (+ (cl-parse-integer parse-time-elt :start 3 :end 5)
-                          (* 60 (cl-parse-integer parse-time-elt :start 1 :end 3)))
-                    (if (= (aref parse-time-elt 0) ?-) -1 1))))
+        (* 60
+           (+ (cl-parse-integer parse-time-elt :start 3 :end 5)
+              (* 60 (cl-parse-integer parse-time-elt :start 1 :end 3)))
+           (if (= (aref parse-time-elt 0) ?-) -1 1))))
     ((5 4 3)
-     ,(lambda () (and (stringp parse-time-elt)
-                      (= (length parse-time-elt) 10)
-                      (= (aref parse-time-elt 4) ?-)
-                      (= (aref parse-time-elt 7) ?-)))
-     [0 4] [5 7] [8 10])
-    ((2 1 0)
-     ,(lambda () (and (stringp parse-time-elt)
-                      (= (length parse-time-elt) 5)
-                      (= (aref parse-time-elt 2) ?:)))
-     [0 2] [3 5] ,(lambda () 0))
+     "^\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]?\\)-\\([0-9][0-9]?\\)$"
+     1 2 3)
     ((2 1 0)
-     ,(lambda () (and (stringp parse-time-elt)
-                      (= (length parse-time-elt) 4)
-                      (= (aref parse-time-elt 1) ?:)))
-     [0 1] [2 4] ,(lambda () 0))
+     "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\)$"
+     1 2 ,(lambda () 0))
     ((2 1 0)
-     ,(lambda () (and (stringp parse-time-elt)
-                      (= (length parse-time-elt) 7)
-                      (= (aref parse-time-elt 1) ?:)))
-     [0 1] [2 4] [5 7])
+     "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\):\\([0-9][0-9]\\)$"
+     1 2 3)
     ((5) (50 110) ,(lambda () (+ 1900 parse-time-elt)))
     ((5) (0 49) ,(lambda () (+ 2000 parse-time-elt))))
   "(slots predicate extractor...)")
@@ -173,7 +153,11 @@ parse-time-string
                    (parse-time-val))
               (when (and (not (nth (car slots) time)) ;not already set
                          (setq parse-time-val
-                               (cond ((and (consp predicate)
+                               (cond ((stringp predicate)
+                                      (and (stringp parse-time-elt)
+                                           (string-match predicate
+                                                         parse-time-elt)))
+                                     ((and (consp predicate)
                                            (not (functionp predicate)))
                                       (and (numberp parse-time-elt)
                                            (<= (car predicate) parse-time-elt)
@@ -188,15 +172,15 @@ parse-time-string
                 (setq exit t)
                 (while slots
                   (let ((new-val (if rule
-                                     (let ((this (pop rule)))
-                                       (if (vectorp this)
-                                           (cl-parse-integer
-                                            parse-time-elt
-                                            :start (aref this 0)
-                                            :end (aref this 1))
-                                         (funcall this)))
+                                     (let* ((this (pop rule)))
+                                       (if (integerp this)
+                                           (cl-parse-integer
+                                            (match-string this parse-time-elt))
+                                         (funcall this)))
                                    parse-time-val)))
-                    (setf (nth (pop slots) time) new-val))))))))
+                    (setf (nth (pop slots) time) new-val))))))
+          (unless exit
+            (message "unrecognized token '%s'" parse-time-elt))))
       time))))
 
 (defun parse-iso8601-time-string (date-string)
diff --git a/test/lisp/calendar/parse-time-tests.el b/test/lisp/calendar/parse-time-tests.el
index b706b73570..63b696db1d 100644
--- a/test/lisp/calendar/parse-time-tests.el
+++ b/test/lisp/calendar/parse-time-tests.el
@@ -45,6 +45,36 @@ parse-time-tests
                  '(42 35 19 22 2 2016 1 nil -28800)))
   (should (equal (parse-time-string "Friday, 21 Sep 2018 13:47:58 PDT")
                  '(58 47 13 21 9 2018 5 t -25200)))
+  (should (equal (parse-time-string "Friday, 21 Sep 2018 13:47:58")
+                 '(58 47 13 21 9 2018 5 -1 nil)))
+  (should (equal (parse-time-string "Friday, 21 Sep 2018")
+                 '(nil nil nil 21 9 2018 5 -1 nil)))
+  ;; Date can be numeric if separated by hyphens.
+  (should (equal (parse-time-string "Friday, 2018-09-21")
+                 '(nil nil nil 21 9 2018 5 -1 nil)))
+  ;; Day of week is optional
+  (should (equal (parse-time-string "2018-09-21")
+                 '(nil nil nil 21 9 2018 nil -1 nil)))
+  ;; The order of date, time, etc., does not matter.
+  (should (equal (parse-time-string "13:47:58, +0100, 2018-09-21, Friday")
+                 '(58 47 13 21 9 2018 5 -1 3600)))
+  ;; Month, day, or both, can be a single digit.
+  (should (equal (parse-time-string "Friday, 2018-9-08")
+                 '(nil nil nil 8 9 2018 5 -1 nil)))
+  (should (equal (parse-time-string "Friday, 2018-09-8")
+                 '(nil nil nil 8 9 2018 5 -1 nil)))
+  (should (equal (parse-time-string "Friday, 2018-9-8")
+                 '(nil nil nil 8 9 2018 5 -1 nil)))
+  ;; Time by itself is recognized as such.
+  (should (equal (parse-time-string "03:47:58")
+                 '(58 47 3 nil nil nil nil -1 nil)))
+  ;; A leading zero for hours is optional.
+  (should (equal (parse-time-string "3:47:58")
+                 '(58 47 3 nil nil nil nil -1 nil)))
+  ;; Missing seconds are assumed to be zero.
+  (should (equal (parse-time-string "3:47")
+                 '(0 47 3 nil nil nil nil -1 nil)))
+
   (should (equal (format-time-string "%Y-%m-%d %H:%M:%S"
                                      (parse-iso8601-time-string "1998-09-12T12:21:54-0200") t)
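The core idea of this patch is that a rule's predicate can be a regexp whose match groups feed the extractors. A standalone sketch of two of those patterns (a date with one- or two-digit month and day, and a numeric +HHMM/-HHMM zone) is below; the function names are hypothetical, and it uses whole-string anchors where the patch uses ^ and $, which behaves the same for tokens that contain no newlines.

```elisp
;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defconst my-numeric-date-regexp
  "\\`\\([0-9]\\{4\\}\\)-\\([0-9][0-9]?\\)-\\([0-9][0-9]?\\)\\'"
  "Match YYYY-M-D, where month and day may have one or two digits.")

(defun my-parse-numeric-date (token)
  "Return (DAY MONTH YEAR) for TOKEN such as \"2022-1-12\", or nil."
  (when (string-match my-numeric-date-regexp token)
    ;; Group numbers select the fields, much like the 1 2 3
    ;; extractors in the patch's rules.
    (list (string-to-number (match-string 3 token))
          (string-to-number (match-string 2 token))
          (string-to-number (match-string 1 token)))))

(defun my-parse-numeric-zone (token)
  "Return the UTC offset in seconds for TOKEN like \"+0100\", or nil."
  (when (string-match "\\`[-+][0-9]\\{4\\}\\'" token)
    ;; Same arithmetic as the patch: (HH * 60 + MM) minutes, times 60,
    ;; negated for a leading minus sign.
    (* 60
       (+ (cl-parse-integer token :start 3 :end 5)
          (* 60 (cl-parse-integer token :start 1 :end 3)))
       (if (= (aref token 0) ?-) -1 1))))
```

With the regexp form, "2022-1-12" is recognized directly instead of falling through to the 1999-12-31 fallback value shown above.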
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-20 10:08 UTC
To: Bob Rogers; +Cc: Katsumi Yamaoka, Paul Eggert, 52209

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> Since Emacs time functions have evolved well beyond email, I would argue
> that even "rfc-email-" is too specific a prefix for them. So if this
> patch is not suitable, maybe it's (cough, cough) time for a new time and
> date parsing API that supports a broader range of human-generated dates
> and times, with better error handling and I18N support. WDYT?

Yes, I think we should stop futzing around with `parse-time-string' and instead create a new well-defined library with a signature like:

(parse-time "2020-01-15T16:12:21-08:00" 'iso-8601)
(parse-time "1/4-2" 'us-date)
(parse-time "Wed, 15 Jan 2020 16:12:21 -0800" 'rfc822)

etc. (And yes, I know the latter is a superseded standard, but it's the one people know.)
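A rough sketch of the dispatching interface Lars describes, under heavy assumptions: `my-parse-time' is a hypothetical name, `iso-8601' delegates to `iso8601-parse' (Emacs 27+), `rfc822' simply reuses `parse-time-string', and `us-date' is assumed to mean M/D/YYYY with a four-digit year, so it does not handle the "1/4-2" example. The point is that the caller names the format, and a mismatch signals an error instead of being guessed around.

```elisp
;; -*- lexical-binding: t; -*-
(require 'iso8601)
(require 'parse-time)

(defun my-parse-time (string format)
  "Parse STRING according to FORMAT and return a decoded time.
Hypothetical sketch; FORMAT is one of `iso-8601', `rfc822' or
`us-date'.  Signals an error if STRING does not fit FORMAT."
  (pcase format
    ('iso-8601 (iso8601-parse string))
    ;; Assumption: reuse the existing RFC 822-oriented parser.
    ('rfc822 (parse-time-string string))
    ('us-date
     ;; Assumption: M/D/YYYY with a four-digit year.
     (if (string-match
          "\\`\\([0-9][0-9]?\\)/\\([0-9][0-9]?\\)/\\([0-9]\\{4\\}\\)\\'"
          string)
         (list nil nil nil
               (string-to-number (match-string 2 string))   ; day
               (string-to-number (match-string 1 string))   ; month
               (string-to-number (match-string 3 string))   ; year
               nil -1 nil)
       (error "Not a us-date: %s" string)))
    (_ (error "Unknown date format: %s" format))))
```

Returning the same nine-element list as `parse-time-string' keeps the result usable with `encode-time' and the decoded-time accessors.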
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-20 15:57 UTC
To: Lars Ingebrigtsen; +Cc: Katsumi Yamaoka, Paul Eggert, 52209

   From: Lars Ingebrigtsen <larsi@gnus.org>
   Date: Mon, 20 Dec 2021 11:08:44 +0100

   Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

   > Since Emacs time functions have evolved well beyond email, I would argue
   > that even "rfc-email-" is too specific a prefix for them. So if this
   > patch is not suitable, maybe it's (cough, cough) time for a new time and
   > date parsing API that supports a broader range of human-generated dates
   > and times, with better error handling and I18N support. WDYT?

   Yes, I think we should stop futzing around with `parse-time-string' and
   instead create a new well-defined library with a signature like:

   (parse-time "2020-01-15T16:12:21-08:00" 'iso-8601)
   (parse-time "1/4-2" 'us-date)
   (parse-time "Wed, 15 Jan 2020 16:12:21 -0800" 'rfc822)

   etc. (And yes, I know the latter is a superseded standard, but it's the
   one people know.)

I can see that it's a good idea to have an explicit hint that the date is in rfc822 format since a two-digit year (which parse-time-string still supports) might otherwise be misinterpreted as something else. And perhaps two-digit years this far into the century should otherwise be disallowed.

Otherwise, I think the other date formats would be pretty easy to recognize, with the exception of month-day order in numeric dates, which ought to be possible to disambiguate via locale. (Though I admit I have no experience with locale programming.)

On the other hand, I can imagine the caller might want to insist that the passed string must be in a certain format and force an error if parse-time finds otherwise.

One question: Did you have in mind that parse-time should have the same return value as parse-time-string, in order to feed into the other Emacs time functions?

-- 
Bob
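The two-digit-year ambiguity Bob mentions comes from a fixed window: the `parse-time-rules' entries visible in his earlier patch map (50 110) to 1900 plus the value and (0 49) to 2000 plus the value. A sketch of that split as a hypothetical helper is below; unlike those rules, it rejects anything outside 0-99.

```elisp
;; -*- lexical-binding: t; -*-
(defun my-expand-two-digit-year (yy)
  "Return the four-digit year for a two-digit year YY.
Hypothetical helper using the same split as `parse-time-rules':
50-99 map to 1950-1999 and 0-49 map to 2000-2049."
  (cond ((<= 50 yy 99) (+ 1900 yy))
        ((<= 0 yy 49) (+ 2000 yy))
        (t (error "Not a two-digit year: %s" yy))))
```

Under this window "21" reads as 2021 but "50" reads as 1950, which is why an explicit format hint (or refusing two-digit years) makes the result predictable.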
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-20 16:34 UTC
To: Lars Ingebrigtsen; +Cc: Katsumi Yamaoka, Paul Eggert, 52209

   From: Bob Rogers <rogers-emacs@rgrjr.homedns.org>
   Date: Mon, 20 Dec 2021 10:57:33 -0500

   . . . Otherwise, I think the other date formats would be pretty easy
   to recognize, with the exception of month-day order in numeric dates,
   which ought to be possible to disambiguate via locale. (Though I
   admit I have no experience with locale programming.)

Never mind; through further reading I have realized that the current locale has no necessary bearing on the locale of the date. So I'm not sure what's needed here.

-- 
Bob
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-21 11:01 UTC
To: Bob Rogers; +Cc: Katsumi Yamaoka, Paul Eggert, 52209

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> On the other hand, I can imagine the caller might want to insist that
> the passed string must be in a certain format and force an error if
> parse-time finds otherwise.

Yup. That's one good reason to not have a time parsing function guess at formats, because the input data will be different. In my previous job, we had a library to parse date/time strings, and I think we were up to about 80 distinct formats to handle the different data feeds we were getting. For instance, "01 02 03" may be three different dates depending on where you get the date from.

> One question: Did you have in mind that parse-time should have the
> same return value as parse-time-string, in order to feed into the other
> Emacs time functions?

Yes, I think so.
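Lars's "01 02 03" example can be made concrete. The sketch below reads the same string under three field orders; the function name is hypothetical, and it assumes every year is two digits meaning 20YY, which is only good enough for illustration.

```elisp
;; -*- lexical-binding: t; -*-
(require 'cl-lib)

(defun my-interpret-numeric-date (string order)
  "Read STRING, three space-separated numbers, using ORDER.
ORDER lists the role of each field in turn, e.g. (year month day)
or (month day year).  Return an ISO-style \"YYYY-MM-DD\" string.
Hypothetical helper: two-digit years are assumed to mean 20YY."
  (let* ((fields (mapcar #'string-to-number (split-string string)))
         ;; Pair each role symbol with its number: ((year . 1) ...).
         (roles (cl-mapcar #'cons order fields)))
    (unless (= (length fields) 3)
      (error "Expected three numbers: %s" string))
    (format "%04d-%02d-%02d"
            (+ 2000 (cdr (assq 'year roles)))
            (cdr (assq 'month roles))
            (cdr (assq 'day roles)))))
```

Read as year-month-day, month-day-year and day-month-year, "01 02 03" gives 2001-02-03, 2003-01-02 and 2003-02-01: three different dates, which is why the caller has to say which order its feed uses.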
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-23 19:48 UTC
To: Lars Ingebrigtsen; +Cc: 52209

   From: Lars Ingebrigtsen <larsi@gnus.org>
   Date: Tue, 21 Dec 2021 12:01:07 +0100

   Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

   > On the other hand, I can imagine the caller might want to insist
   > that the passed string must be in a certain format and force an
   > error if parse-time finds otherwise.

   Yup. That's one good reason to not have a time parsing function
   guess at formats, because the input data will be different.

OK, I have proceeded along those lines; WIP attached for feedback. I changed the name to "parse-date" to avoid confusion; I was otherwise stuck when trying to come up with a sensible name for the test file, since parse-time-tests.el was already taken (though I suppose I could have added to the existing file). The docstring of parse-date describes the expected functionality as far as I've planned, with comments in square brackets to note what's missing.

   In my previous job, we had a library to parse date/time strings, and
   I think we were up to about 80 distinct formats to handle the
   different data feeds we were getting. For instance, "01 02 03" may
   be three different dates depending on where you get the date from.

Which (additional) formats would you like? I'm assuming we need iso8601 and rfc822 for compatibility (in which case rfc2822 will be easy to provide in addition), and us-date and euro-date to disambiguate the month/day order. Would the third format correspond to ISO 2001-01-03? Do we want to support that?
And come to think of it, I've been using DD-Mon-YY for my own purposes for so long that I'm not even certain whether Americans use MM-DD-YY or it's the other way around . . . -- Bob [-- Attachment #2: Type: text/x-patch, Size: 20310 bytes --] diff --git a/lisp/calendar/parse-date.el b/lisp/calendar/parse-date.el new file mode 100644 index 0000000000..c4b756cf2e --- /dev/null +++ b/lisp/calendar/parse-date.el @@ -0,0 +1,281 @@ +;;; parse-date.el --- parsing time/date strings -*- lexical-binding: t -*- + +;; Copyright (C) 2021 Free Software Foundation, Inc. + +;; Author: Bob Rogers <rogers@rgrjr.com> +;; Keywords: util + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;; 'parse-date' parses a time and/or date in a string and returns a +;; list of values, just like `decode-time', where unspecified elements +;; in the string are returned as nil (except unspecified DST is +;; returned as -1). `encode-time' may be applied on these values to +;; obtain an internal time value. If left to its own devices, it +;; accepts a wide variety of formats, but can be told to insist on a +;; particular date/time format. + +;; Historically, `parse-time-string' was used for this purpose, but it +;; was focused on email date formats, and gradually but imperfectly +;; extended to handle other formats. 
'parse-date' is compatible in +;; that it parses the same input formats and uses the same return +;; value format, but is stricter in that it signals an error for +;; tokens that `parse-time-string' would simply ignore. + +;;; TODO: +;; +;; * Define and signal a date-error for parsing issues. +;; +;; * Implement rfc2822 and rfc822 independently of parse-time-string. +;; +;; * Add a euro-date format for DD/MM/YYYY ? +;; + +;;; Code: + +(require 'cl-lib) +(require 'iso8601) +(require 'parse-time) + +(defun parse-date-guess-format (time-string) + (cond ((string-match "[0-9]T[0-9]" time-string) 'iso8601) + (t nil))) + +(defun parse-date-ignore-char? (char) + (or (eq char ?\ ) (eq char ?,) (eq char ?,))) + +(defun parse-date-tokenize-string (string) + "Turn STRING into tokens, separated only by whitespace and commas. +Multiple commas are ignored. Pure digit sequences are turned +into integers." + (let ((index 0) + (end (length string)) + (char nil) + (list ())) + ;; Skip leading ignored characters. + (while (and (< index end) + (setq char (aref string index)) + (parse-date-ignore-char? char)) + (cl-incf index)) + (while (< index end) + (let ((start index) + (all-digits (<= ?0 char ?9))) + ;; char is valid; look for more valid characters. + (while (and (< (cl-incf index) end) + (setq char (aref string index)) + (not (parse-date-ignore-char? char))) + (unless (<= ?0 char ?9) + (setq all-digits nil))) + (when (<= index end) + (push (if all-digits + (cl-parse-integer string :start start :end index) + (substring string start index)) + list) + ;; Skip ignored characters. + (while (and (< (cl-incf index) end) + (setq char (aref string index)) + (parse-date-ignore-char? char)) + ()) + ;; Next token. 
+ ))) + (nreverse list))) + +(defconst parse-date-slot-names + '(second minute hour day month year weekday dst zone) + "Names of return value slots, for better error messages. +See the decoded-time defstruct.") + +(defconst parse-date-slot-ranges + '((0 59) (0 59) (0 23) (1 31) (1 12) (1 9999)) + "Numeric slot ranges, for bounds checking.") + +(defun parse-date-default (time-string two-digit-year?) + ;; Do the standard parsing thing. This is mostly free form, in that + ;; tokens may appear in any order, but we expect to introduce some + ;; state dependence. + (let ((tokens (parse-date-tokenize-string (downcase time-string))) + (time (list nil nil nil nil nil nil nil -1 nil))) + (cl-flet ((set-matched-slot (slot index token) + ;; Assign a slot value from match data if index is + ;; non-nil, else from token, signalling an error if + ;; it's already been assigned or is out of range. + (let ((value (if index + (cl-parse-integer (match-string index token)) + token)) + (range (nth slot parse-date-slot-ranges))) + (unless (equal (nth slot time) + (if (= slot 7) -1 nil)) + (error "Duplicate %s slot value '%s'" + (nth slot parse-date-slot-names) token)) + (when (and range + (not (<= (car range) value (cadr range)))) + (error "Value %s is out of range for %s" + token (nth slot parse-date-slot-names))) + (setf (nth slot time) value)))) + (while tokens + (let ((token (pop tokens)) + (match nil)) + (cond ((numberp token) + ;; A bare number could be a month, day, or year. + ;; The order of these tests matters greatly. + (cond ((>= token 1000) + (set-matched-slot 5 nil token)) + ((and (<= 1 token 31) + (not (nth 3 time))) + ;; Assume days come before months or years. + (set-matched-slot 3 nil token)) + ((and (<= 1 token 12) + (not (nth 4 time))) + ;; Assume days come before years. + (set-matched-slot 4 nil token)) + ((or (nth 5 time) + (not two-digit-year?) + (>= token 100)) + (error "Unrecognized numeric value %s" token)) + ;; It's a two-digit year.
+ ((>= token 50) + ;; second half of the 20th century. + (set-matched-slot 5 nil (+ 1900 token))) + (t + ;; first half of the 21st century. + (set-matched-slot 5 nil (+ 2000 token))))) + ((setq match (assoc token parse-time-weekdays)) + (set-matched-slot 6 nil (cdr match))) + ((setq match (assoc token parse-time-months)) + (set-matched-slot 4 nil (cdr match))) + ((setq match (assoc token parse-time-zoneinfo)) + (set-matched-slot 8 nil (cadr match)) + (set-matched-slot 7 nil (caddr match))) + ((string-match "^[-+][0-9][0-9][0-9][0-9]$" token) + ;; Numeric time zone. + (set-matched-slot + 8 nil + (* 60 + (+ (cl-parse-integer token :start 3 :end 5) + (* 60 (cl-parse-integer token :start 1 :end 3))) + (if (= (aref token 0) ?-) -1 1)))) + ((string-match + "^\\([0-9][0-9][0-9][0-9]\\)[-/]\\([0-9][0-9]?\\)[-/]\\([0-9][0-9]?\\)$" + token) + ;; ISO-8601-style date (YYYY-MM-DD). + (set-matched-slot 5 1 token) + (set-matched-slot 4 2 token) + (set-matched-slot 3 3 token)) + ((string-match + "^\\([0-9][0-9]?\\)[-/]\\([0-9][0-9]?\\)[-/]\\([0-9][0-9][0-9][0-9]\\)$" + token) + ;; US date (MM-DD-YYYY), but we insist on four + ;; digits for the year. + (set-matched-slot 4 1 token) + (set-matched-slot 3 2 token) + (set-matched-slot 5 3 token)) + ((string-match + "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\):\\([0-9][0-9]\\)$" + token) + (set-matched-slot 2 1 token) + (set-matched-slot 1 2 token) + (set-matched-slot 0 3 token)) + ((string-match "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\)$" token) + ;; Time without seconds. 
+ (set-matched-slot 2 1 token) + (set-matched-slot 1 2 token) + (set-matched-slot 0 nil 0)) + ((member token '("am" "pm")) + (unless (nth 2 time) + (error "'AM'/'PM' specified before or without time")) + (unless (<= (nth 2 time) 12) + (error "'AM'/'PM' specified for time already past noon")) + (when (equal token "pm") + (cl-incf (nth 2 time) 12))) + (t + (error "Unrecognized time token '%s'" token)))))) + time)) + +;;;###autoload +(defun parse-date (time-string &optional format) + "Parse TIME-STRING according to FORMAT, returning a list. +The FORMAT value is a symbol that may be one of the following: + + iso8601 => parse the string according to the ISO-8601 +standard. See `parse-iso8601-time-string'. + + iso-8601 => synonym for iso8601. + + rfc822 => parse an RFC822 (old email) date, which allows +two-digit years and internal '()' comments. In dates of the form +'11 Jan 12', the 11 is assumed to be the day, and the 12 is +assumed to mean 2012. [not fully implemented.] + + rfc2822 => parse an RFC2822 (new email) date, which allows +only four-digit years. [not implemented.] + + us-date => parse a US-style date, of the form MM/DD/YYYY, but +allowing two-digit years. In dates of the form '01/11/12', the 1 +is the month, 11 is the day, and the 12 is assumed to mean 2012. +[not fully implemented.] + + nil => attempt to guess the format, falling back on us-date +with two-digit years disallowed. + +The default is nil, and anything else is assumed to be us-date +with two-digit years disallowed. + + * For all formats except iso8601, parsing is case-insensitive. + + * Commas and whitespace are ignored. + + * In date specifications, either '/' or '-' may be used to +separate components, but all three components must be given. + + * A date that starts with four digits is YYYY-MM-DD, ISO-8601 +style, but a date that ends with four digits is MM-DD-YYYY [at +least in us-date format]. 
+ + * Two digit years, when allowed, are in the 1900's when +between 50 and 99 and in the 2000's when between 0 and 49. + +Errors are signalled when time values are duplicated, +unrecognized, or out of range. No consistency checks between +fields are done. For instance, the weekday is not checked to see +that it corresponds to the date, and parse-date complains about +the 32nd of March (or any other month) but blithely accepts the +29th of February in non-leap years -- or the 31st of February in +any year. + +The result is a list of (SEC MIN HOUR DAY MON YEAR DOW DST TZ), +which can be accessed as a decoded-time defstruct (q.v.), +e.g. `decoded-time-year' to extract the year, and turned into an +Emacs timestamp by `encode-time'. The values returned are +identical to those of `decode-time', but any unknown values other +than DST are returned as nil, and an unknown DST value is +returned as -1." + (cl-case (or format (parse-date-guess-format time-string)) + ((iso8601 iso-8601) + (parse-iso8601-time-string time-string)) + ((rfc822 rfc2822) + ;; [Placeholder; we eventually want something more strict. -- + ;; rgr, 20-Dec-21.] + (parse-time-string time-string)) + (us-date + (parse-date-default time-string t)) + (t + (parse-date-default time-string nil)))) + +(provide 'parse-date) + +;;; parse-date.el ends here diff --git a/test/lisp/calendar/parse-date-tests.el b/test/lisp/calendar/parse-date-tests.el new file mode 100644 index 0000000000..682365e674 --- /dev/null +++ b/test/lisp/calendar/parse-date-tests.el @@ -0,0 +1,164 @@ +;;; parse-date-tests.el --- Test suite for parse-date.el -*- lexical-binding:t -*- + +;; Copyright (C) 2016-2021 Free Software Foundation, Inc. + +;; Author: Lars Ingebrigtsen <larsi@gnus.org> + +;; This file is part of GNU Emacs. 
+ +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'parse-date) + +(ert-deftest parse-date-tests () + "Test basic parse-date functionality." + + ;; Test tokenization. + (should (equal (parse-date-tokenize-string " ") '())) + (should (equal (parse-date-tokenize-string " a b") '("a" "b"))) + (should (equal (parse-date-tokenize-string "a bbc dde") '("a" "bbc" "dde"))) + (should (equal (parse-date-tokenize-string " , a 27 b,, c 14:32 ") + '("a" 27 "b" "c" "14:32"))) + + ;; Start with some RFC822 dates. 
+ (dolist (format '(nil rfc822)) + (should (equal (parse-date "Mon, 22 Feb 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "22 Feb 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (parse-date "22 Feb 2016 +0100" format) + '(nil nil nil 22 2 2016 nil -1 3600))) + (should (equal (parse-date "Mon, 22 February 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "Mon, 22 feb 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "Monday, 22 february 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "Monday, 22 february 2016 19:35:42 PST" format) + '(42 35 19 22 2 2016 1 nil -28800))) + (should (equal (parse-date "Friday, 21 Sep 2018 13:47:58 PDT" format) + '(58 47 13 21 9 2018 5 t -25200))) + (should (equal (parse-date "Friday, 21 Sep 2018 13:47:58" format) + '(58 47 13 21 9 2018 5 -1 nil))) + (should (equal (parse-date "Friday, 21 Sep 2018" format) + '(nil nil nil 21 9 2018 5 -1 nil)))) + ;; These are not allowed by the default format. + (should (equal (parse-date "22 Feb 16 19:35:42 +0100" 'rfc822) + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (parse-date "22 Feb 96 19:35:42 +0100" 'rfc822) + '(42 35 19 22 2 1996 nil -1 3600))) + + ;; Test the default format with both hyphens and slashes in dates. + (dolist (case '(;; Month can be numeric if date uses hyphens/slashes. + ("Friday, 2018-09-21" (nil nil nil 21 9 2018 5 -1 nil)) + ;; Year can come last if four digits. + ("Friday, 9-21-2018" (nil nil nil 21 9 2018 5 -1 nil)) + ;; Day of week is optional + ("2018-09-21" (nil nil nil 21 9 2018 nil -1 nil)) + ;; The order of date, time, etc., does not matter. + ("13:47:58, +0100, 2018-09-21, Friday" + (58 47 13 21 9 2018 5 -1 3600)) + ;; Month, day, or both, can be a single digit. 
+ ("Friday, 2018-9-08" (nil nil nil 8 9 2018 5 -1 nil)) + ("Friday, 2018-09-8" (nil nil nil 8 9 2018 5 -1 nil)) + ("Friday, 2018-9-8" (nil nil nil 8 9 2018 5 -1 nil)))) + (let ((string (car case)) + (expected (cadr case))) + ;; Test with hyphens. + (should (equal (parse-date string) expected)) + (while (string-match "-" string) + (setq string (replace-match "/" t t string))) + ;; Test with slashes. + (should (equal (parse-date string) expected)))) + + ;; Time by itself is recognized as such. + (should (equal (parse-date "03:47:58") + '(58 47 3 nil nil nil nil -1 nil))) + ;; A leading zero for hours is optional. + (should (equal (parse-date "3:47:58") + '(58 47 3 nil nil nil nil -1 nil))) + ;; Missing seconds are assumed to be zero. + (should (equal (parse-date "3:47") + '(0 47 3 nil nil nil nil -1 nil))) + ;; AM/PM are understood (in any case combination). + (dolist (am '(am AM Am)) + (should (equal (parse-date (format "3:47 %s" am)) + '(0 47 3 nil nil nil nil -1 nil)))) + (dolist (pm '(pm PM Pm)) + (should (equal (parse-date (format "3:47 %s" pm)) + '(0 47 15 nil nil nil nil -1 nil)))) + + ;; Ensure some cases fail. + (should-error (parse-date "22 Feb 196" 'us-date)) ;; bad year + (should-error (parse-date "22 Feb 16 19:35:42")) ;; two-digit year + (should-error (parse-date "22 Feb 96 19:35:42")) ;; two-digit year + (should-error (parse-date "2 Feb 2021 1996")) ;; duplicate year + (should-error (parse-date "2020-1-1 2021")) ;; another duplicate year + (should-error (parse-date "2020-1-1 30")) ;; extra 30 (not a day)) + (should-error (parse-date "2020-1-1 12")) ;; extra 12 (not a month) + (should-error (parse-date "15:47 15:15")) ;; duplicate time + (should-error (parse-date "2020-1-1 +0800 -0800")) ;; duplicate TZ + (should-error (parse-date "15:47 PM")) ;; PM in the afternoon + (should-error (parse-date "2020-1-1 PM")) ;; PM without a time + ;; Range tests. 
+ (should-error (parse-date "2021-12-32")) + (should-error (parse-date "2021-12-0")) + (should-error (parse-date "2021-13-3")) + (should-error (parse-date "0000-12-3")) + (should-error (parse-date "20021 Dec 3")) + (should-error (parse-date "24:21:14")) + (should-error (parse-date "14:60:21")) + (should-error (parse-date "14:21:60")) + + ;; Test ISO-8601 dates. + (dolist (format '(nil iso8601 iso-8601)) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54-0200" format) t) + "1998-09-12 14:21:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54-0230" format) t) + "1998-09-12 14:51:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54-02:00" format) t) + "1998-09-12 14:21:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54-02" format) t) + "1998-09-12 14:21:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54+0230" format) t) + "1998-09-12 09:51:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54+02" format) t) + "1998-09-12 10:21:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (parse-date "1998-09-12T12:21:54Z" format) t) + "1998-09-12 12:21:54")) + (should (equal (parse-date "1998-09-12T12:21:54") + (encode-time 54 21 12 12 9 1998))))) + +(provide 'parse-date-tests) + +;;; parse-date-tests.el ends here
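The default parser in this patch turns a numeric zone token such as "-0230" into seconds east of UTC with a little arithmetic: sign * 60 * (MM + 60 * HH). The sketch below pulls that computation out on its own, as an illustration only. The name `my-parse-numeric-zone' is hypothetical and not part of parse-date.el, and it anchors the regexp with \\` and \\' where the patch uses ^ and $. For "-0230" it gives -(150 * 60) = -9000 seconds. The same offset arithmetic explains why the ISO test above expects 12:21:54-0230 to be 14:51:54 UTC.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helper, not part of parse-date.el: the numeric-zone
;; arithmetic from `parse-date-default', pulled out on its own.
(require 'cl-lib)

(defun my-parse-numeric-zone (zone)
  "Return the UTC offset in seconds for a numeric ZONE such as \"-0230\".
Return nil unless ZONE has exactly the form [+-]HHMM."
  (when (string-match-p "\\`[-+][0-9][0-9][0-9][0-9]\\'" zone)
    (let ((hours (cl-parse-integer zone :start 1 :end 3))
          (minutes (cl-parse-integer zone :start 3 :end 5)))
      ;; Total minutes, converted to seconds, negated for a "-" zone.
      (* 60
         (+ minutes (* 60 hours))
         (if (eq (aref zone 0) ?-) -1 1)))))
```

Like the patch's regexp, this accepts only the four-digit form. ISO-style zones such as "-02:00" are left to the iso8601 library.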
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-24 9:29 UTC
To: Bob Rogers; +Cc: 52209

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> OK, I have proceeded along those lines; WIP attached for feedback. I
> changed the name to "parse-date" to avoid confusion; I was otherwise
> stuck when trying to come up with a sensible name for the test file,
> since parse-time-tests.el was already taken (though I suppose I could
> have added to the existing file).

Sounds good to me.

> Which (additional) formats would you like? I'm assuming we need iso8601
> and rfc822 for compatibility (in which case rfc2822 will be easy to
> provide in addition), and us-date and euro-date to disambiguate the
> month/day order. Would the third format correspond to ISO 2001-01-03?
> Do we want to support that?

Probably not -- you mostly see that in Sweden.

> +(defun parse-date (time-string &optional format)

I think it'd be better if this was a cl-defmethod with an eql specifier
for the format.

> + iso8601 => parse the string according to the ISO-8601
> +standard. See `parse-iso8601-time-string'.
> +
> + iso-8601 => synonym for iso8601.

And synonyms aren't necessary -- they just confuse people reading the
code.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
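Lars's suggestion of a cl-defmethod with an eql specializer for the format can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that was eventually written: the generic function `my-parse-date' is hypothetical, only two formats are shown, and it assumes Emacs 28 or later, where the argument of an (eql ...) specializer is evaluated (hence the quoted symbols). Each format gets its own method, and an unspecialized method catches everything else.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch: `my-parse-date' is not the patch's API.
;; Assumes Emacs 28 or later, where (eql VAL) evaluates VAL.
(require 'cl-lib)
(require 'iso8601)
(require 'parse-time)

(cl-defgeneric my-parse-date (time-string format)
  "Parse TIME-STRING according to the format symbol FORMAT.")

(cl-defmethod my-parse-date (time-string (_format (eql 'iso-8601)))
  "Parse TIME-STRING as ISO 8601, returning a decoded time."
  (iso8601-parse time-string))

(cl-defmethod my-parse-date (time-string (_format (eql 'rfc822)))
  "Parse TIME-STRING as an email date, returning a decoded time."
  (parse-time-string time-string))

(cl-defmethod my-parse-date (_time-string format)
  "Fallback method: signal an error for any other FORMAT."
  (error "Unsupported date format: %S" format))
```

Compared with a single cl-case, this lets another file add a format by defining a new method, without editing the existing dispatch code.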
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-24 15:58 UTC
To: Lars Ingebrigtsen; +Cc: 52209

   From: Lars Ingebrigtsen <larsi@gnus.org>
   Date: Fri, 24 Dec 2021 10:29:29 +0100

   Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

   > Which (additional) formats would you like? I'm assuming we need iso8601
   > and rfc822 for compatibility (in which case rfc2822 will be easy to
   > provide in addition), and us-date and euro-date to disambiguate the
   > month/day order. Would the third format correspond to ISO 2001-01-03?
   > Do we want to support that?

   Probably not -- you mostly see that in Sweden.

OK (<phew> ;-}).

   > +(defun parse-date (time-string &optional format)

   I think it'd be better if this was a cl-defmethod with an eql
   specifier for the format.

OK, good; cl-case was easier to start, but I was also beginning to think
in terms of cl-defmethod.

   > + iso8601 => parse the string according to the ISO-8601
   > +standard. See `parse-iso8601-time-string'.
   > +
   > + iso-8601 => synonym for iso8601.

   And synonyms aren't necessary -- they just confuse people reading
   the code.

OK. I added the synonym because RFCs are always spelled without the
hyphen, but I wasn't sure about the convention for ISO standards. And it
seems that there isn't a well-defined precedent in the Emacs sources; C
programmers mostly avoid the hyphen, but Elisp programmers are more
evenly split:

    rogers@orion> find . -name '*.el' | xargs cat | tr A-Z a-z | grep -c 'iso-[0-9]'
    702
    rogers@orion> find . -name '*.el' | xargs cat | tr A-Z a-z | grep -c 'iso[0-9]'
    798
    rogers@orion> find . -name '*.[ch]' | xargs cat | tr A-Z a-z | grep -c 'iso-[0-9]'
    47
    rogers@orion> find . -name '*.[ch]' | xargs cat | tr A-Z a-z | grep -c 'iso[0-9]'
    148
    rogers@orion>

So which do you prefer?

I'm also looking at defining a date-parse-error condition with a few
error symbol "subclasses," but I'm wondering about the tradeoff between
having enough error symbols for precision in error reporting
vs. cluttering the code with too many. Thoughts?

-- Bob
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-25 11:58 UTC
To: Bob Rogers; +Cc: 52209

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> So which do you prefer?

With a hyphen.

> I'm also looking at defining a date-parse-error condition with a few
> error symbol "subclasses," but I'm wondering about the tradeoff between
> having enough error symbols for precision in error reporting
> vs. cluttering the code with too many. Thoughts?

Having a `date-parse-error' would be fine, but I'm unsure about the
utility of having a bunch of sub-errors, but perhaps you have a use case
in mind?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-25 22:50 UTC
To: Lars Ingebrigtsen; +Cc: 52209

   From: Lars Ingebrigtsen <larsi@gnus.org>
   Date: Sat, 25 Dec 2021 12:58:14 +0100

   Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

   > I'm also looking at defining a date-parse-error condition with a
   > few error symbol "subclasses," but I'm wondering about the tradeoff
   > between having enough error symbols for precision in error
   > reporting vs. cluttering the code with too many. Thoughts?

   Having a `date-parse-error' would be fine, but I'm unsure about the
   utility of having a bunch of sub-errors, but perhaps you have a use
   case in mind?

My only motivation is that I think it would make the resulting error
message clearer. For example, passing a malformed ISO 8601 date to
iso8601-parse just signals wrong-type-argument, which is not very
helpful. Multiple errors would allow me to specify the problem in
detail, while still classifying them as date/time parsing errors. Here
are four that I have in mind:

   Unknown date/time token: X
   Illegal date/time value for field: <field>, X
   Duplicate date/time value for field: <field>, X
   Date/time value for field out of range: <field>, X, <min>, <max>

This doesn't quite cover the 14 calls to `error' that are in the current
version of the code, in that they wouldn't be as precise, but they
should be adequate.

On the other hand, this might be overkill for callers of parse-date,
who, being deep in their own logic, might only care that some date they
have to deal with is invalid. Which is why I wanted an opinion from
someone with the big picture -- I admit I am biased (and a bit annoyed)
from too often having to study the code to figure out why some
perfectly reasonable date I supply is being misinterpreted.

-- Bob
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Lars Ingebrigtsen @ 2021-12-26 11:31 UTC
To: Bob Rogers; +Cc: 52209

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> On the other hand, this might be overkill for callers of parse-date,
> who, being deep in their own logic, might only care that some date they
> have to deal with is invalid. Which is why I wanted an opinion from
> someone with the big picture -- I admit I am biased (and a bit annoyed)
> from too often having to study the code to figure out why some perfectly
> reasonable date I supply is being misinterpreted.

Better error messages are possible without making many specific error
symbols, though.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
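One way to get detailed messages from a single error symbol, as Lars suggests, is to put the details into the signal data and let `error-message-string' format them. The sketch below uses hypothetical names (`my-date-parse-error', `my-date-check-range', `my-date-parse-error-message'). It loosely mirrors the kind of "Slot out of range" data that the later patch in this thread passes to `date-parse-error', but it is an illustration, not that code.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names throughout; this is not the API of parse-date.el.
;; One condition symbol, whose parent defaults to `error'.
(define-error 'my-date-parse-error "Date/time parse error")

(defun my-date-check-range (field value min max)
  "Return VALUE if it is an integer between MIN and MAX inclusive.
Otherwise signal `my-date-parse-error'.  FIELD names the date/time
field and is only carried along in the error data."
  (unless (and (integerp value) (<= min value max))
    (signal 'my-date-parse-error
            (list "Value out of range" field value min max)))
  value)

(defun my-date-parse-error-message (thunk)
  "Call THUNK and return its value.
If THUNK signals `my-date-parse-error', return the formatted error
message as a string instead."
  (condition-case err
      (funcall thunk)
    (my-date-parse-error (error-message-string err))))
```

A caller that only cares whether the date is valid can catch the one condition (or plain `error', its parent). A caller that wants detail can inspect the data list, which here holds the field name, the offending value, and the allowed range.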
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
From: Bob Rogers @ 2021-12-28 15:52 UTC
To: Lars Ingebrigtsen; +Cc: 52209

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 947 bytes --]

   From: Lars Ingebrigtsen <larsi@gnus.org>
   Date: Sun, 26 Dec 2021 12:31:07 +0100

   Better error messages are possible without making many specific
   error symbols, though.

OK, I think I have a good solution that uses a single error symbol; let
me know what you think. (Having never done much with Elisp conditions,
I was still thinking in terms of Common Lisp, so I had to go scratch my
head for a while.)

I am currently working on broadening what the parser will accept,
though I think it is close to a usable state. I am using the
documentation for the Perl Date::Parse module to see what it accepts,
and will then look at the corresponding Python and Ruby modules for
further ideas. I am not planning to adopt everything I see, though; in
particular, I think it's a good idea for new code to stick to insisting
on four-digit years except when the caller has specified a format that
determines the month/day order.

-- Bob

[-- Attachment #2: Type: text/x-patch, Size: 33616 bytes --]
diff --git a/lisp/calendar/parse-date.el b/lisp/calendar/parse-date.el new file mode 100644 index 0000000000..10bd939e91 --- /dev/null +++ b/lisp/calendar/parse-date.el @@ -0,0 +1,472 @@ +;;; parse-date.el --- parsing time/date strings -*- lexical-binding: t -*- + +;; Copyright (C) 2021 Free Software Foundation, Inc. + +;; Author: Bob Rogers <rogers@rgrjr.com> +;; Keywords: util + +;; This file is part of GNU Emacs.
+ +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;; 'parse-date' parses a time and/or date in a string and returns a +;; list of values, just like `decode-time', where unspecified elements +;; in the string are returned as nil (except unspecified DST is +;; returned as -1). `encode-time' may be applied on these values to +;; obtain an internal time value. If left to its own devices, it +;; accepts a wide variety of formats, but can be told to insist on a +;; particular date/time format. + +;; Historically, `parse-time-string' was used for this purpose, but it +;; was focused on email date formats, and gradually but imperfectly +;; extended to handle other formats. 'parse-date' is compatible in +;; that it parses the same input formats and uses the same return +;; value format, but is stricter in that it signals an error for +;; tokens that `parse-time-string' would simply ignore. + +;;; TODO: +;; +;; * Add a euro-date format for DD/MM/YYYY ? 
+;; + +;;; Code: + +(require 'cl-lib) +(require 'iso8601) +(require 'parse-time) + +(define-error 'date-parse-error "Date/time parse error" 'error) + +(defconst parse-date--ends-with-alpha-tz-re + (concat " \\(" (mapconcat #'car parse-time-zoneinfo "\\|") "\\)$") + "Recognize an alphanumeric timezone at the end of the string.") + +(defun parse-date--guess-rfc822-formats (date-string) + (let ((case-fold-search t)) + (cond ((string-match "(" date-string) 'rfc2822) + ((string-match parse-date--ends-with-alpha-tz-re date-string) + ;; Alphabetic timezones are legacy syntax. + 'rfc822) + ((string-match " [-+][0-9][0-9][0-9][0-9][ \t\n]*\\($\\|(\\)" + date-string) + ;; Note that an ISO-8601 timezone has a colon in the middle + ;; and no preceding space. + 'rfc2822) + (t nil)))) + +(defun parse-date--guess-format (date-string) + (cond ((iso8601-valid-p date-string) 'iso-8601) + ((parse-date--guess-rfc822-formats date-string)) + (t nil))) + +(defun parse-date--ignore-char? (char) + ;; Ignore whitespace and commas. + (or (eq char ?\ ) (eq char ?\t) (eq char ?\r) (eq char ?\n) (eq char ?,))) + +(defun parse-date--tokenize-string (string &optional strip-fws?) + "Turn STRING into tokens, separated only by whitespace and commas. +Multiple commas are ignored. Pure digit sequences are turned +into integers. If STRIP-FWS? is true, then folding whitespace as +defined by RFC2822 (strictly, the CFWS production that also +accepts comments) is stripped out by treating it like whitespace; +if its value is the symbol `first', we exit when we see the +first '(' (per RFC2822), else we strip them all (per RFC822)." + (let ((index 0) + (end (length string)) + (fws-eof? (eq strip-fws? 'first)) + (list ())) + (when fws-eof? + ;; In order to stop on the first "(", we need to see it as + ;; non-whitespace. + (setq strip-fws? nil)) + (cl-flet ((skip-ignored () + ;; Skip ignored characters at index (the scan + ;; position). Skip RFC822 comments in matched parens + ;; if strip-fws?
is true, but do not complain about + ;; unterminated comments. + (let ((char nil) + (nest 0)) + (while (and (< index end) + (setq char (aref string index)) + (or (> nest 0) + (parse-date--ignore-char? char) + (and strip-fws? (eql char ?\()))) + (cl-incf index) + ;; FWS bookkeeping. + (cond ((not strip-fws?)) + ((and (eq char ?\\) + (< (1+ index) end)) + ;; Move to the next char but don't check + ;; it to see if it might be a paren. + (cl-incf index)) + ((eq char ?\() (cl-incf nest)) + ((eq char ?\)) (cl-decf nest))))))) + (skip-ignored) ;; Skip leading whitespace. + (while (and (< index end) + (not (and fws-eof? + (eq (aref string index) ?\()))) + (let* ((start index) + (char (aref string index)) + (all-digits (<= ?0 char ?9))) + ;; char is valid; look for more valid characters. + (when (and strip-fws? + (eq char ?\\) + (< (1+ index) end)) + ;; Escaped character, which might be a "(". If so, we are + ;; correct to include it in the token, even though the + ;; caller is sure to barf. If not, we violate RFC2?822 by + ;; not removing the backslash, but no characters in valid + ;; RFC2?822 dates need escaping anyway, so it shouldn't + ;; matter that this is not done strictly correctly. -- + ;; rgr, 24-Dec-21. + (cl-incf index)) + (while (and (< (cl-incf index) end) + (setq char (aref string index)) + (not (or (parse-date--ignore-char? char) + (and strip-fws? + (eq char ?\())))) + (unless (<= ?0 char ?9) + (setq all-digits nil)) + (when (and strip-fws? + (eq char ?\\) + (< (1+ index) end)) + ;; Escaped character, see above. 
+ (cl-incf index))) + (push (if all-digits + (cl-parse-integer string :start start :end index) + (substring string start index)) + list) + (skip-ignored))) + (nreverse list)))) + +(defconst parse-date--slot-names + '(second minute hour day month year weekday dst zone) + "Names of return value slots, for better error messages. +See the decoded-time defstruct.") + +(defconst parse-date--slot-ranges + '((0 60) (0 59) (0 23) (1 31) (1 12) (1 9999)) + "Numeric slot ranges, for bounds checking. +Note that RFC2822 explicitly requires that seconds go up to 60, +to allow for leap seconds (see Mills, D., 'Network Time +Protocol', STD 12, RFC 1119, September 1989).") + +(defun parse-date--x822 (time-string obs-format?) + ;; Parse an RFC2822 or (if obs-format? is true) RFC822 date. The + ;; strict syntax for the former is as follows: + ;; + ;; [ day-of-week "," ] day FWS month-name FWS year FWS time [CFWS] + ;; + ;; where "time" is: + ;; + ;; 2DIGIT ":" 2DIGIT [ ":" 2DIGIT ] FWS ( "+" / "-" ) 4DIGIT + ;; + ;; RFC822 also accepts comments in random places (which is handled + ;; by parse-date--tokenize-string) and two-digit years. We are + ;; somewhat more lax in what we accept (specifically, the hours + ;; don't have to be two digits, and the TZ and the comma after the + ;; DOW are optional), but we do insist that the items that are + ;; present do appear in this order. + (let ((tokens (parse-date--tokenize-string (downcase time-string) + (if obs-format? 'all 'first))) + (time (list nil nil nil nil nil nil nil -1 nil))) + (cl-labels ((set-matched-slot (slot index token) + ;; Assign a slot value from match data if index is + ;; non-nil, else from token, signalling an error if + ;; it's already been assigned or is out of range.
+ (let ((value (if index + (cl-parse-integer (match-string index token)) + token)) + (range (nth slot parse-date--slot-ranges))) + (unless (equal (nth slot time) + (if (= slot 7) -1 nil)) + (signal 'date-parse-error + (list "Duplicate slot value" + (nth slot parse-date--slot-names) token))) + (when (and range + (not (<= (car range) value (cadr range)))) + (signal 'date-parse-error + (list "Slot out of range" + (nth slot parse-date--slot-names) + token (car range) (cadr range)))) + (setf (nth slot time) value))) + (set-numeric (slot token) + (unless (natnump token) + (signal 'date-parse-error + (list "Not a number" + (nth slot parse-date--slot-names) token))) + (set-matched-slot slot nil token))) + ;; Check for weekday. + (let ((dow (assoc (car tokens) parse-time-weekdays))) + (when dow + ;; Day of the week. + (set-matched-slot 6 nil (cdr dow)) + (pop tokens))) + ;; Day. + (set-numeric 3 (pop tokens)) + ;; Alphabetic month. + (let* ((month (pop tokens)) + (match (assoc month parse-time-months))) + (if match + (set-matched-slot 4 nil (cdr match)) + (signal 'date-parse-error + (list "Expected an alphabetic month" month)))) + ;; Year. + (let ((year (pop tokens))) + ;; Check the year for the right number of digits. + (cond ((not (natnump year)) + (signal 'date-parse-error + (list "Expected a year" year))) + ((>= year 1000) + (set-numeric 5 year)) + ((or (not obs-format?) + (>= year 100)) + (signal 'date-parse-error + (list "Four-digit years are required" year))) + ((>= year 50) + ;; second half of the 20th century. + (set-numeric 5 (+ 1900 year))) + (t + ;; first half of the 21st century. + (set-numeric 5 (+ 2000 year))))) + ;; Time. 
+ (let ((time (pop tokens))) + (cond ((or (null time) (natnump time)) + (signal 'date-parse-error + (list "Expected a time" time))) + ((string-match + "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\):\\([0-9][0-9]\\)$" + time) + (set-matched-slot 2 1 time) + (set-matched-slot 1 2 time) + (set-matched-slot 0 3 time)) + ((string-match "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\)$" time) + ;; Time without seconds.
+ (set-matched-slot 2 1 time) + (set-matched-slot 1 2 time) + (set-matched-slot 0 nil 0)) + (t + (signal 'date-parse-error + (list "Expected a time" time))))) + ;; Timezone. + (let* ((zone (pop tokens)) + (match (assoc zone parse-time-zoneinfo))) + (cond (match + (set-matched-slot 8 nil (cadr match)) + (set-matched-slot 7 nil (caddr match))) + ((and (stringp zone) + (string-match "^[-+][0-9][0-9][0-9][0-9]$" zone)) + ;; Numeric time zone. + (set-matched-slot + 8 nil + (* 60 + (+ (cl-parse-integer zone :start 3 :end 5) + (* 60 (cl-parse-integer zone :start 1 :end 3))) + (if (= (aref zone 0) ?-) -1 1)))) + (zone + (signal 'date-parse-error + (list "Expected a timezone" zone))))) + (when tokens + (signal 'date-parse-error + (list "Extra token(s)" (car tokens))))) + time)) + +(defun parse-date--default (time-string two-digit-year?) + ;; Do the standard parsing thing. This is mostly free form, in that + ;; tokens may appear in any order, but we expect to introduce some + ;; state dependence. + (let ((tokens (parse-date--tokenize-string (downcase time-string))) + (time (list nil nil nil nil nil nil nil -1 nil))) + (cl-flet ((set-matched-slot (slot index token) + ;; Assign a slot value from match data if index is + ;; non-nil, else from token, signalling an error if + ;; it's already been assigned or is out of range. + (let ((value (if index + (cl-parse-integer (match-string index token)) + token)) + (range (nth slot parse-date--slot-ranges))) + (unless (equal (nth slot time) + (if (= slot 7) -1 nil)) + (signal 'date-parse-error + (list "Duplicate slot value" + (nth slot parse-date--slot-names) token))) + (when (and range + (not (<= (car range) value (cadr range)))) + (signal 'date-parse-error + (list "Slot out of range" + (nth slot parse-date--slot-names) + token (car range) (cadr range)))) + (setf (nth slot time) value)))) + (while tokens + (let ((token (pop tokens)) + (match nil)) + (cond ((numberp token) + ;; A bare number could be a month, day, or year. 
+ ;; The order of these tests matters greatly. + (cond ((>= token 1000) + (set-matched-slot 5 nil token)) + ((and (<= 1 token 31) + (not (nth 3 time))) + ;; Assume days come before months or years. + (set-matched-slot 3 nil token)) + ((and (<= 1 token 12) + (not (nth 4 time))) + ;; Assume days come before years. + (set-matched-slot 4 nil token)) + ((or (nth 5 time) + (not two-digit-year?) + (> token 100)) + (signal 'date-parse-error + (list "Unrecognized token" token))) + ;; It's a two-digit year. + ((>= token 50) + ;; second half of the 20th century. + (set-matched-slot 5 nil (+ 1900 token))) + (t + ;; first half of the 21st century. + (set-matched-slot 5 nil (+ 2000 token))))) + ((setq match (assoc token parse-time-weekdays)) + (set-matched-slot 6 nil (cdr match))) + ((setq match (assoc token parse-time-months)) + (set-matched-slot 4 nil (cdr match))) + ((setq match (assoc token parse-time-zoneinfo)) + (set-matched-slot 8 nil (cadr match)) + (set-matched-slot 7 nil (caddr match))) + ((string-match "^[-+][0-9][0-9][0-9][0-9]$" token) + ;; Numeric time zone. + (set-matched-slot + 8 nil + (* 60 + (+ (cl-parse-integer token :start 3 :end 5) + (* 60 (cl-parse-integer token :start 1 :end 3))) + (if (= (aref token 0) ?-) -1 1)))) + ((string-match + "^\\([0-9][0-9][0-9][0-9]\\)[-/]\\([0-9][0-9]?\\)[-/]\\([0-9][0-9]?\\)$" + token) + ;; ISO-8601-style date (YYYY-MM-DD). + (set-matched-slot 5 1 token) + (set-matched-slot 4 2 token) + (set-matched-slot 3 3 token)) + ((string-match + "^\\([0-9][0-9]?\\)[-/]\\([0-9][0-9]?\\)[-/]\\([0-9][0-9][0-9][0-9]\\)$" + token) + ;; US date (MM-DD-YYYY), but we insist on four + ;; digits for the year. 
+ (set-matched-slot 4 1 token) + (set-matched-slot 3 2 token) + (set-matched-slot 5 3 token)) + ((string-match + "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\):\\([0-9][0-9]\\)$" + token) + (set-matched-slot 2 1 token) + (set-matched-slot 1 2 token) + (set-matched-slot 0 3 token)) + ((string-match "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\)$" token) + ;; Time without seconds. + (set-matched-slot 2 1 token) + (set-matched-slot 1 2 token) + (set-matched-slot 0 nil 0)) + ((member token '("am" "pm")) + (unless (nth 2 time) + (signal 'date-parse-error + (list "Missing time" token))) + (unless (<= (nth 2 time) 12) + (signal 'date-parse-error + (list "Time already past noon" token))) + (when (equal token "pm") + (cl-incf (nth 2 time) 12))) + (t + (signal 'date-parse-error + (list "Unrecognized token" token))))))) + time)) + +;;;###autoload +(cl-defgeneric parse-date (time-string &optional format) + "Parse TIME-STRING according to FORMAT, returning a list. +The FORMAT value is a symbol that may be one of the following: + + iso-8601 => parse the string according to the ISO-8601 +standard. See `parse-iso8601-time-string'. + + rfc822 => parse an RFC822 (old email) date, which allows +two-digit years and internal '()' comments. In dates of the form +'11 Jan 12', the 11 is assumed to be the day, and the 12 is +assumed to mean 2012. Be sure you really want this; the format +is more limited than most human-supplied dates. + + rfc2822 => parse an RFC2822 (new email) date, which allows +only four-digit years. Again, this is a fairly restricted +format, with fields required to be in a specified order and +representation. + + us-date => parse a US-style date, of the form MM/DD/YYYY, but +allowing two-digit years. In dates of the form '01/11/12', the 1 +is the month, 11 is the day, and the 12 is assumed to mean 2012. + + nil => like us-date with two-digit years disallowed. + +Anything else is treated as iso-8601 if it looks similar, else +us-date with two-digit years disallowed. 
+ + * For all formats except iso-8601, parsing is case-insensitive. + + * Commas and whitespace are ignored. + + * In date specifications, either '/' or '-' may be used to +separate components, but all three components must be given. + + * A date that starts with four digits is YYYY-MM-DD, ISO-8601 +style, but a date that ends with four digits is MM-DD-YYYY [at +least in us-date format]. + + * Two digit years, when allowed, are in the 1900's when +between 50 and 99 inclusive and in the 2000's when between 0 and +49 inclusive. + +A `date-parse-error' is signalled when time values are duplicated, +unrecognized, or out of range. No consistency checks between +fields are done. For instance, the weekday is not checked to see +that it corresponds to the date, and parse-date complains about +the 32nd of March (or any other month) but blithely accepts the +29th of February in non-leap years -- or the 31st of February in +any year. + +The result is a list of (SEC MIN HOUR DAY MON YEAR DOW DST TZ), +which can be accessed as a decoded-time defstruct (q.v.), +e.g. `decoded-time-year' to extract the year, and turned into an +Emacs timestamp by `encode-time'. The values returned are +identical to those of `decode-time', but any unknown values other +than DST are returned as nil, and an unknown DST value is +returned as -1.") + +(cl-defmethod parse-date (time-string (_format (eql iso-8601))) + (iso8601-parse time-string)) + +(cl-defmethod parse-date (time-string (_format (eql rfc2822))) + (parse-date--x822 time-string nil)) + +(cl-defmethod parse-date (time-string (_format (eql rfc822))) + (parse-date--x822 time-string t)) + +(cl-defmethod parse-date (time-string (_format (eql us-date))) + (parse-date--default time-string t)) + +(cl-defmethod parse-date (time-string (_format (eql nil))) + (parse-date--default time-string nil)) + +(cl-defmethod parse-date (time-string _format) + ;; Re-dispatch after guessing the format. 
+ (parse-date time-string (parse-date--guess-format time-string))) + +(provide 'parse-date) + +;;; parse-date.el ends here diff --git a/test/lisp/calendar/parse-date-tests.el b/test/lisp/calendar/parse-date-tests.el new file mode 100644 index 0000000000..bd2b344d71 --- /dev/null +++ b/test/lisp/calendar/parse-date-tests.el @@ -0,0 +1,247 @@ +;;; parse-date-tests.el --- Test suite for parse-date.el -*- lexical-binding:t -*- + +;; Copyright (C) 2016-2021 Free Software Foundation, Inc. + +;; Author: Lars Ingebrigtsen <larsi@gnus.org> + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'parse-date) + +(ert-deftest parse-date-tests () + "Test basic parse-date functionality." + + ;; Test tokenization. + (should (equal (parse-date--tokenize-string " ") '())) + (should (equal (parse-date--tokenize-string " a b") '("a" "b"))) + (should (equal (parse-date--tokenize-string "a bbc dde") '("a" "bbc" "dde"))) + (should (equal (parse-date--tokenize-string " , a 27 b,, c 14:32 ") + '("a" 27 "b" "c" "14:32"))) + ;; Some folding whitespace tests. 
+ (should (equal (parse-date--tokenize-string " a b (end) c" 'first) + '("a" "b"))) + (should (equal (parse-date--tokenize-string "(quux)a (foo (bar)) b(baz)" t) + '("a" "b"))) + (should (equal (parse-date--tokenize-string "a b\\cde" 'all) + ;; Strictly incorrect, but strictly unnecessary syntax. + '("a" "b\\cde"))) + (should (equal (parse-date--tokenize-string "a b\\ de" 'all) + '("a" "b\\ de"))) + (should (equal (parse-date--tokenize-string "a \\de \\(f" 'all) + '("a" "\\de" "\\(f"))) + + ;; Start with some compatible RFC822 dates. + (dolist (format '(nil rfc822 rfc2822)) + (should (equal (parse-date "Mon, 22 Feb 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "22 Feb 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (parse-date "Mon, 22 February 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "Mon, 22 feb 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "Monday, 22 february 2016 19:35:42 +0100" format) + '(42 35 19 22 2 2016 1 -1 3600))) + (should (equal (parse-date "Monday, 22 february 2016 19:35:42 PST" format) + '(42 35 19 22 2 2016 1 nil -28800))) + (should (equal (parse-date "Friday, 21 Sep 2018 13:47:58 PDT" format) + '(58 47 13 21 9 2018 5 t -25200))) + (should (equal (parse-date "Friday, 21 Sep 2018 13:47:58" format) + '(58 47 13 21 9 2018 5 -1 nil)))) + ;; These are not allowed by the default format. + (should (equal (parse-date "22 Feb 16 19:35:42 +0100" 'rfc822) + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (parse-date "22 Feb 96 19:35:42 +0100" 'rfc822) + '(42 35 19 22 2 1996 nil -1 3600))) + ;; Try them again with comments. 
+ (should (equal (parse-date "22 Feb (today) 16 19:35:42 +0100" 'rfc822) + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (parse-date "22 Feb 96 (long ago) 19:35:42 +0100" 'rfc822) + '(42 35 19 22 2 1996 nil -1 3600))) + (should (equal (parse-date + "Friday, 21 Sep(comment \\) with \\( parens)18 19:35:42" + 'rfc822) + '(42 35 19 21 9 2018 5 -1 nil))) + (should (equal (parse-date + "Friday, 21 Sep 18 19:35:42 (unterminated comment" + 'rfc822) + '(42 35 19 21 9 2018 5 -1 nil))) + + ;; Test some RFC822 error cases + (dolist (test '(("33 1 2022" ("Slot out of range" day 33 1 31)) + ("0 1 2022" ("Slot out of range" day 0 1 31)) + ("1 1 2020 2021" ("Expected an alphabetic month" 1)) + ("1 Jan 2020 2021" ("Expected a time" 2021)) + ("1 Jan 2020 20:21 2000" ("Expected a timezone" 2000)) + ("1 Jan 2020 20:21 +0200 33" ("Extra token(s)" 33)))) + (should (equal (condition-case err (parse-date (car test) 'rfc822) + (date-parse-error (cdr err))) + (cadr test)))) + + ;; And these are not allowed by rfc822 because of missing time. + (should (equal (parse-date "Friday, 21 Sep 2018" nil) + '(nil nil nil 21 9 2018 5 -1 nil))) + (should (equal (parse-date "22 Feb 2016 +0100" nil) + '(nil nil nil 22 2 2016 nil -1 3600))) + + ;; Test the default format with both hyphens and slashes in dates. + (dolist (case '(;; Month can be numeric if date uses hyphens/slashes. + ("Friday, 2018-09-21" (nil nil nil 21 9 2018 5 -1 nil)) + ;; Year can come last if four digits. + ("Friday, 9-21-2018" (nil nil nil 21 9 2018 5 -1 nil)) + ;; Day of week is optional + ("2018-09-21" (nil nil nil 21 9 2018 nil -1 nil)) + ;; The order of date, time, etc., does not matter. + ("13:47:58, +0100, 2018-09-21, Friday" + (58 47 13 21 9 2018 5 -1 3600)) + ;; Month, day, or both, can be a single digit. 
+ ("Friday, 2018-9-08" (nil nil nil 8 9 2018 5 -1 nil)) + ("Friday, 2018-09-8" (nil nil nil 8 9 2018 5 -1 nil)) + ("Friday, 2018-9-8" (nil nil nil 8 9 2018 5 -1 nil)))) + (let ((string (car case)) + (expected (cadr case))) + ;; Test with hyphens. + (should (equal (parse-date string nil) expected)) + (while (string-match "-" string) + (setq string (replace-match "/" t t string))) + ;; Test with slashes. + (should (equal (parse-date string nil) expected)))) + + ;; Time by itself is recognized as such. + (should (equal (parse-date "03:47:58" nil) + '(58 47 3 nil nil nil nil -1 nil))) + ;; A leading zero for hours is optional. + (should (equal (parse-date "3:47:58" nil) + '(58 47 3 nil nil nil nil -1 nil))) + ;; Missing seconds are assumed to be zero. + (should (equal (parse-date "3:47" nil) + '(0 47 3 nil nil nil nil -1 nil))) + ;; AM/PM are understood (in any case combination). + (dolist (am '(am AM Am)) + (should (equal (parse-date (format "3:47 %s" am) nil) + '(0 47 3 nil nil nil nil -1 nil)))) + (dolist (pm '(pm PM Pm)) + (should (equal (parse-date (format "3:47 %s" pm) nil) + '(0 47 15 nil nil nil nil -1 nil)))) + + ;; Ensure some cases fail. 
+ (should-error (parse-date "22 Feb 196" 'us-date)) + (should-error (parse-date "22 Feb 16 19:35:42" nil)) + (should-error (parse-date "22 Feb 96 19:35:42" nil)) ;; two-digit year + (should-error (parse-date "2 Feb 2021 1996" nil)) ;; duplicate year + + (dolist (test '(("22 Feb 196" us-date ;; bad year + ("Unrecognized token" 196)) + ("22 Feb 16 19:35:42" nil ;; two-digit year + ("Unrecognized token" 16)) + ("22 Feb 96 19:35:42" nil ;; two-digit year + ("Unrecognized token" 96)) + ("2 Feb 2021 1996" nil + ("Duplicate slot value" year 1996)) + ("2020-1-1 2021" nil + ("Duplicate slot value" year 2021)) + ("22 Feb 196" us-date + ("Unrecognized token" 196)) + ("22 Feb 16 19:35:42" nil + ("Unrecognized token" 16)) + ("22 Feb 96 19:35:42" nil + ("Unrecognized token" 96)) + ("2 Feb 2021 1996" nil + ("Duplicate slot value" year 1996)) + ("2020-1-1 30" nil + ("Unrecognized token" 30)) + ("2020-1-1 12" nil + ("Unrecognized token" 12)) + ("15:47 15:15" nil + ("Duplicate slot value" hour "15:15")) + ("2020-1-1 +0800 -0800" t + ("Duplicate slot value" zone -28800)) + ("15:47 PM" nil + ("Time already past noon" "pm")) + ("15:47 AM" nil + ("Time already past noon" "am")) + ("2020-1-1 PM" nil + ("Missing time" "pm")) + ;; Range tests + ("2021-12-32" nil + ("Slot out of range" day "2021-12-32" 1 31)) + ("2021-12-0" nil + ("Slot out of range" day "2021-12-0" 1 31)) + ("2021-13-3" nil + ("Slot out of range" month "2021-13-3" 1 12)) + ("0000-12-3" nil + ("Slot out of range" year "0000-12-3" 1 9999)) + ("20021 Dec 3" nil + ("Slot out of range" year 20021 1 9999)) + ("24:21:14" nil + ("Slot out of range" hour "24:21:14" 0 23)) + ("14:60:21" nil + ("Slot out of range" minute "14:60:21" 0 59)) + ("14:21:61" nil + ("Slot out of range" second "14:21:61" 0 60)))) + (should (equal (condition-case err (parse-date (car test) (cadr test)) + (date-parse-error (cdr err))) + (caddr test)))) + (should (equal (parse-date "14:21:60" nil) ;; a leap second!
+ '(60 21 14 nil nil nil nil -1 nil))) + + ;; Test ISO-8601 dates. + (dolist (format '(t iso-8601)) + (should (equal (parse-date "1998-09-12T12:21:54-0200" format) + '(54 21 12 12 9 1998 nil nil -7200))) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (encode-time + (parse-date "1998-09-12T12:21:54-0230" format)) + t) + "1998-09-12 14:51:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (encode-time + (parse-date "1998-09-12T12:21:54-02:00" format)) + t) + "1998-09-12 14:21:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (encode-time + (parse-date "1998-09-12T12:21:54-02" format)) + t) + "1998-09-12 14:21:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (encode-time + (parse-date "1998-09-12T12:21:54+0230" format)) + t) + "1998-09-12 09:51:54")) + (should (equal (format-time-string + "%Y-%m-%d %H:%M:%S" + (encode-time + (parse-date "1998-09-12T12:21:54+02" format)) + t) + "1998-09-12 10:21:54")) + (should (equal (parse-date "1998-09-12T12:21:54Z" format) + '(54 21 12 12 9 1998 nil nil 0))) + (should (equal (parse-date "1998-09-12T12:21:54" format) + '(54 21 12 12 9 1998 nil -1 nil))))) + +(provide 'parse-date-tests) + +;;; parse-date-tests.el ends here
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2021-12-28 15:52 ` Bob Rogers @ 2021-12-29 15:19 ` Lars Ingebrigtsen 2021-12-29 19:29 ` Paul Eggert 2021-12-30 21:08 ` Bob Rogers 0 siblings, 2 replies; 40+ messages in thread From: Lars Ingebrigtsen @ 2021-12-29 15:19 UTC (permalink / raw) To: Bob Rogers; +Cc: 52209, Paul Eggert Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > I am currently working on broadening what the parser will accept, > though I think it is close to a usable state. Makes sense to me. Perhaps Paul has some comments; added to the CCs. > +(cl-defmethod parse-date (time-string (_format (eql iso-8601))) By the way, this should be (cl-defmethod parse-date (time-string (_format (eql 'iso-8601))) now -- we're transitioning to eval-ing the eql specifier. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no
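A minimal sketch of the quoted specializer form Lars recommends, assuming Emacs 28 or later, where `cl-defmethod` evaluates the argument of an `eql` specializer. The generic `my-format-kind` and its keyword return values are hypothetical, chosen only to show the dispatch:

```elisp
(require 'cl-lib)

;; Hypothetical generic, used only to illustrate `eql' dispatch.
(cl-defgeneric my-format-kind (format)
  "Return a keyword describing FORMAT.")

;; The specializer argument is evaluated, so the symbol is quoted.
(cl-defmethod my-format-kind ((_format (eql 'iso-8601)))
  :iso)

;; Fallback method for any other FORMAT value.
(cl-defmethod my-format-kind (_format)
  :other)

;; (my-format-kind 'iso-8601) => :iso
;; (my-format-kind 'rfc822)   => :other
```

Quoting makes the intended value explicit; as far as I know, the unquoted `(eql iso-8601)` spelling is accepted in Emacs 28 only as a backward-compatibility fallback for symbols.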
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2021-12-29 15:19 ` Lars Ingebrigtsen @ 2021-12-29 19:29 ` Paul Eggert 2021-12-29 22:01 ` Bob Rogers 2021-12-30 21:08 ` Bob Rogers 1 sibling, 1 reply; 40+ messages in thread From: Paul Eggert @ 2021-12-29 19:29 UTC (permalink / raw) To: Lars Ingebrigtsen, Bob Rogers; +Cc: 52209 On 12/29/21 07:19, Lars Ingebrigtsen wrote: > Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > >> I am currently working on broadening what the parser will accept, >> though I think it is close to a usable state. > > Makes sense to me. Perhaps Paul has some comments; added to the CCs. My first comment is "be careful what you're getting into" :-). I'm trying to retire from date-parsing as its users are never happy and rightly so. But here goes. I took a quick look at <https://bugs.gnu.org/52209#58> and have a few comments. * Calling it parse-date is a bit confusing, as it parses both dates and times. I suggest calling it parse-timestamp or parse-date-time instead. (I know the existing package is called parse-time but we can't fix that.) * If the package is called X, the error should be called X-error. Currently the package is called parse-date and the error is called date-parse-error, which is confusing. * The patch should also modify the comment at the start of parse-time.el to indicate parse-date-time as another possibility. * I suggest preferring the symbol 'rfc-email' for parsing email-related dates, for consistency with the --rfc-email option of GNU 'date'. This should use the current RFC (5322 now, perhaps updated later). I suppose you could also advertise 'rfc-822' for strict RFC 822 conformance, and similarly 'rfc2822' for strict 2822 conformance, but I expect these alternatives would be less useful in practice. > + nil => like us-date with two-digit years disallowed. This doesn't sound like a good default. For example, it completely mishandles dates in Brazil, which use DD/MM/YYYY format. 
> +Anything else is treated as iso-8601 if it looks similar, else > +us-date with two-digit years disallowed. This might be a better default (for nil), but it should have an explicit name other than nil. > + * For all formats except iso-8601, parsing is case-insensitive. It's pretty common for ISO 8601 parsers to be case-insensitive. For example, Java's OffsetDateTime.parse(CharSequence) allows both lower and upper case T and Z. Perhaps some people need strict ISO 8601 parsers, but I imagine a more-generous parser would be more useful. So you could have iso-8601 and iso-8601-strict; or you could have a strictness arg; or something like that. > + * Commas and whitespace are ignored. This is quite wrong for some formats, if you want to be strict. And even if not, commas are part of ISO 8601 format and can't be ignored if I understand what you mean by "ignored". > + * Two digit years, when allowed, are in the 1900's when > +between 50 and 99 inclusive and in the 2000's when between 0 and > +49 inclusive. This disagrees with the POSIX standard for 'date' (supported by GNU 'date'), which says 69-99 are treated as 1969-1999 and 00-68 are treated as 2000-2068. I suggest going with the POSIX heuristic if you're going to use a fixed heuristic for dates at all. Better might be to have an optional argument of context specifying the default time for incomplete timestamps. You can use that context to fill in more-significant parts that are missing. E.g., if the year is missing, you take it from the context; if the century is missing, you take that from the context. The default context would be empty, i.e., missing years or centuries would be an error. For more formats that need parsing, see: https://en.wikipedia.org/wiki/Date_format_by_country https://metacpan.org/search?q=datetime%3A%3Aformat You don't need to support them all now, but you should take a look at what's out there and make sure the API can be extended to handle them.
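To make the difference between the two fixed heuristics concrete, here is a minimal sketch assuming a simple pivot rule. `my-expand-two-digit-year` is a hypothetical helper, not part of Emacs or the patch; a pivot of 50 reproduces the patch docstring's rule, and a pivot of 69 reproduces the POSIX rule cited above:

```elisp
;; Hypothetical helper: expand a two-digit year YY (0..99) using PIVOT.
;; Years at or above PIVOT land in the 1900s, the rest in the 2000s.
(defun my-expand-two-digit-year (yy pivot)
  "Return the four-digit year for two-digit YY, using PIVOT."
  (if (>= yy pivot)
      (+ 1900 yy)
    (+ 2000 yy)))

;; Patch docstring rule: 50-99 => 1950-1999, 00-49 => 2000-2049.
(my-expand-two-digit-year 60 50) ; => 1960
;; POSIX rule: 69-99 => 1969-1999, 00-68 => 2000-2068.
(my-expand-two-digit-year 60 69) ; => 2060
```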
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2021-12-29 19:29 ` Paul Eggert @ 2021-12-29 22:01 ` Bob Rogers 2021-12-30 5:32 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2021-12-29 22:01 UTC (permalink / raw) To: Paul Eggert; +Cc: Lars Ingebrigtsen, 52209 From: Paul Eggert <eggert@cs.ucla.edu> Date: Wed, 29 Dec 2021 11:29:44 -0800 On 12/29/21 07:19, Lars Ingebrigtsen wrote: > Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > >> I am currently working on broadening what the parser will accept, >> though I think it is close to a usable state. > > Makes sense to me. Perhaps Paul has some comments; added to the CCs. My first comment is "be careful what you're getting into" :-). I'm trying to retire from date-parsing as its users are never happy and rightly so. No worries; I have spent more of my career than I like to think about dealing with date/time issues, so I know what a can of worms I am in the process of opening. But here goes. I took a quick look at <https://bugs.gnu.org/52209#58> and have a few comments. They are greatly appreciated; thank you. * Calling it parse-date is a bit confusing, as it parses both dates and times. I suggest calling it parse-timestamp or parse-date-time instead. (I know the existing package is called parse-time but we can't fix that.) Lars originally suggested parse-time, but there's already a parse-time-tests.el, so I switched to parse-date so I could use parse-date-tests.el to correspond. So the namespace is already crowded. But I would be OK with either of those alternatives. Since it will actually give you either date or time, or both, parse-date-time might make more sense. * If the package is called X, the error should be called X-error. Currently the package is called parse-date and the error is called date-parse-error, which is confusing. 
My thought was that for the "parse-date" function, the verb should come before the noun, and in "date-parse-error", the "date" is an adjective further modifying "parse error." But I think I'm way fussier about these things than anybody I know, so your point is well taken. * The patch should also modify the comment at the start of parse-time.el to indicate parse-date-time as another possibility. I took that as a late-stage task, something to do alongside updating Elisp documentation. (Which I haven't even begun to look at.) * I suggest preferring the symbol 'rfc-email' for parsing email-related dates, for consistency with the --rfc-email option of GNU 'date'. This should use the current RFC (5322 now, perhaps updated later). I started with RFC822 and RFC2822 because I had copies of these lying around; you're right that I should have looked for more recent standards. And using rfc-email as a synonym for the latest version is a good idea. I suppose you could also advertise 'rfc-822' for strict RFC 822 conformance, and similarly 'rfc2822' for strict 2822 conformance, but I expect these alternatives would be less useful in practice. Anyone parsing email headers would need their date parser to support RFC822 in case they encountered very old emails, but (since later standards are backward-compatible) it's not clear what supporting intermediate standards would buy. > + nil => like us-date with two-digit years disallowed. This doesn't sound like a good default. For example, it completely mishandles dates in Brazil, which use DD/MM/YYYY format. I subsequently added a euro-date format for DD/MM (with various lengths of years). > +Anything else is treated as iso-8601 if it looks similar, else > +us-date with two-digit years disallowed. This might be a better default (for nil), but it should have an explicit name other than nil. Suggestions? > + * For all formats except iso-8601, parsing is case-insensitive. It's pretty common for ISO 8601 parsers to be case-insensitive. 
For example, Java's OffsetDateTime.parse(CharSequence) allows both lower and upper case T and Z. Perhaps some people need strict ISO 8601 parsers, but I imagine a more-generous parser would be more useful. So you could have iso-8601 and iso-8601-strict; or you could have a strictness arg; or something like that. Actually, I am handing those off to the existing iso8601-parse code, which doesn't like lowercase T (at least). > + * Commas and whitespace are ignored. This is quite wrong for some formats, if you want to be strict. And even if not, commas are part of ISO 8601 format and can't be ignored if I understand what you mean by "ignored". I see I need to clarify the docstring to state that these other bulleted comments also do not apply to ISO-8601 dates. > + * Two digit years, when allowed, are in the 1900's when > +between 50 and 99 inclusive and in the 2000's when between 0 and > +49 inclusive. This disagrees with the POSIX standard for 'date' (supported by GNU 'date'), which says 69-99 are treated as 1969-1999 and 00-68 are treated as 2000-2068. I suggest going with the POSIX heuristic if you're going to use a fixed heuristic for dates at all. I was just following the existing parse-time-string heuristic. So which do you think should rule: POSIX or parse-time-string compatibility? Better might be to have an optional argument of context specifying the default time for incomplete timestamps. You can use that context to fill in more-significant parts that are missing. E.g., if the year is missing, you take it from the context; if the century is missing, you take that from the context. The default context would be empty, i.e., missing years or centuries would be an error. Again, I'm just doing what parse-time-string is doing, namely leaving everything that is not specified nil, and letting the caller decide how to apply defaults.
The only exception is when time is specified without seconds; in that case, the seconds are set to zero (which is also compatible with parse-time-string). And even defaulting from context is not straightforward: If given a date without a year that is not today, should that be in the future or in the past? There's a can of worms I don't need to touch. ;-} For more formats that need parsing, see: https://en.wikipedia.org/wiki/Date_format_by_country https://metacpan.org/search?q=datetime%3A%3Aformat You don't need to support them all now, but you should take a look at what's out there and make sure the API can be extended to handle them. Excellent; thank you! I have been looking at date parsing module documentation but so far the ones I've seen have not been very clear about what they actually accept. -- Bob Rogers http://www.rgrjr.com/
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2021-12-29 22:01 ` Bob Rogers @ 2021-12-30 5:32 ` Bob Rogers 0 siblings, 0 replies; 40+ messages in thread From: Bob Rogers @ 2021-12-30 5:32 UTC (permalink / raw) To: Paul Eggert, Lars Ingebrigtsen, 52209 [-- Attachment #1: message body text --] [-- Type: text/plain, Size: 1002 bytes --] From: Bob Rogers <rogers-emacs@rgrjr.homedns.org> Date: Wed, 29 Dec 2021 17:01:01 -0500 From: Paul Eggert <eggert@cs.ucla.edu> Date: Wed, 29 Dec 2021 11:29:44 -0800 * I suggest preferring the symbol 'rfc-email' for parsing email-related dates, for consistency with the --rfc-email option of GNU 'date'. This should use the current RFC (5322 now, perhaps updated later). The only update I saw at https://www.rfc-editor.org (RFC6854) only affects addressing syntax. I started with RFC822 and RFC2822 because I had copies of these lying around; you're right that I should have looked for more recent standards. And using rfc-email as a synonym for the latest version is a good idea. FYI, there is no substantial difference between RFC2822 and RFC5322 in date/time syntax. They hide the whitespace in different productions, but the end result is the same. So I'll change the format name to rfc5322 and add rfc-email as a synonym. -- Bob [-- Attachment #2: Type: text/x-patch, Size: 4132 bytes --] --- rfc2822-date.text 2021-12-30 00:15:38.588023882 -0500 +++ rfc5322-date.text 2021-12-29 23:41:39.492629354 -0500 @@ -1,15 +1,15 @@ 3.3. Date and Time Specification - Date and time occur in several header fields. This section + Date and time values occur in several header fields. This section specifies the syntax for a full date and time specification. 
Though folding white space is permitted throughout the date-time specification, it is RECOMMENDED that a single space be used in each place that FWS appears (whether it is required or optional); some older - implementations may not interpret other occurrences of folding white + implementations will not interpret longer sequences of folding white space correctly. - date-time = [ day-of-week "," ] date FWS time [CFWS] + date-time = [ day-of-week "," ] date time [CFWS] day-of-week = ([FWS] day-name) / obs-day-of-week @@ -18,17 +18,15 @@ date = day month year - day = ([FWS] 1*2DIGIT) / obs-day + day = ([FWS] 1*2DIGIT FWS) / obs-day - month = (FWS month-name FWS) / obs-month - - month-name = "Jan" / "Feb" / "Mar" / "Apr" / + month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec" - year = 4*DIGIT / obs-year + year = (FWS 4*DIGIT FWS) / obs-year - time = time-of-day FWS zone + time = time-of-day zone time-of-day = hour ":" minute [ ":" second ] @@ -38,7 +36,7 @@ second = 2DIGIT / obs-second - zone = (( "+" / "-" ) 4DIGIT) / obs-zone + zone = (FWS ( "+" / "-" ) 4DIGIT) / obs-zone The day is the numeric day of the month. The year is any numeric year 1900 or later. @@ -54,28 +52,27 @@ day is ahead of (i.e., east of) or behind (i.e., west of) Universal Time. The first two digits indicate the number of hours difference from Universal Time, and the last two digits indicate the number of - minutes difference from Universal Time. (Hence, +hhmm means + additional minutes difference from Universal Time. (Hence, +hhmm means +(hh * 60 + mm) minutes, and -hhmm means -(hh * 60 + mm) minutes). The form "+0000" SHOULD be used to indicate a time zone at Universal Time. 
Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local - time zone other than Universal Time and therefore indicates that the - date-time contains no + time zone other than Universal Time and that the date-time contains no information about the local time zone. A date-time specification MUST be semantically valid. That is, the - day-of-the-week (if included) MUST be the day implied by the date, the + day-of-week (if included) MUST be the day implied by the date, the numeric day-of-month MUST be between 1 and the number of days allowed for the specified month (in the specified year), the time-of-day MUST be in the range 00:00:00 through 23:59:60 (the number of seconds - allowing for a leap second; see [STD12]), and the zone MUST be within - the range -9959 through +9959. + allowing for a leap second; see [RFC1305]), and the last two digits of + the zone MUST be within the range 00 through 59. 4.3. Obsolete Date and Time The syntax for the obsolete date format allows a 2 digit year in the - date field and allows for a list of alphabetic time zone specifications - that were used in earlier versions of this standard. It also + date field and allows for a list of alphabetic time zone specifiers + that were used in earlier versions of this specification. It also permits comments and folding white space between many of the tokens. obs-day-of-week = [CFWS] day-name [CFWS] @@ -138,3 +135,8 @@ and "Z" is equivalent to "+0000". However, because of the error in [RFC0822], they SHOULD all be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning. + + Other multi-character (usually between 3 and 5) alphabetic time zones + have been used in Internet messages. Any such time zone whose meaning + is not known SHOULD be considered equivalent to "-0000" unless there is + out-of-band information confirming their meaning. 
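As a concrete instance of the RFC 5322 `date-time` production quoted above, the sketch below uses one of the date strings from the benchmarking cases later in this thread. "Mon" is the optional day-of-week, "22 Feb 2016" is the date, and "19:35:42 +0100" is the time-of-day followed by the zone. The expected decoded list is the one given in those test cases. The variable name `my-rfc5322-example` is hypothetical.

```elisp
(require 'parse-time)

;; Day-of-week "Mon", then the date "22 Feb 2016", then the
;; time-of-day "19:35:42" and the zone "+0100".
(defconst my-rfc5322-example "Mon, 22 Feb 2016 19:35:42 +0100"
  "An RFC 5322 date-time (hypothetical variable name).")

;; Decoded as (SEC MIN HOUR DAY MON YEAR DOW DST TZ).
(parse-time-string my-rfc5322-example)
;; => (42 35 19 22 2 2016 1 -1 3600)

;; 19:35:42 at +0100 is 18:35:42 UTC on 2016-02-22.
(time-convert (encode-time (parse-time-string my-rfc5322-example)) 'integer)
;; => 1456166142, the same instant as the list timestamp (22219 21758)
```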
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2021-12-29 15:19 ` Lars Ingebrigtsen 2021-12-29 19:29 ` Paul Eggert @ 2021-12-30 21:08 ` Bob Rogers 2022-01-01 14:47 ` Lars Ingebrigtsen 1 sibling, 1 reply; 40+ messages in thread From: Bob Rogers @ 2021-12-30 21:08 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 52209, Paul Eggert From: Lars Ingebrigtsen <larsi@gnus.org> Date: Wed, 29 Dec 2021 16:19:03 +0100 Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: . . . > +(cl-defmethod parse-date (time-string (_format (eql iso-8601))) By the way, this should be (cl-defmethod parse-date (time-string (_format (eql 'iso-8601))) now -- we're transitioning to eval-ing the eql specifier. Thanks for the heads-up; now done. -- Bob ^ permalink raw reply [flat|nested] 40+ messages in thread
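To illustrate Lars's note on the `eql` specializer, here is a minimal sketch of the quoted form. It assumes an Emacs that evaluates the specializer's argument (the transition Lars describes). `my-parse-date` is a hypothetical generic function, not the `parse-date` from the patch. The second method shows one way a symbol such as `rfc-email` could get its own method, delegating to `parse-time-string`.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)
(require 'iso8601)
(require 'parse-time)

;; `my-parse-date' is a hypothetical generic function for illustration.
(cl-defgeneric my-parse-date (time-string format)
  "Parse TIME-STRING according to the symbol FORMAT.")

;; The argument of `eql' is evaluated, so the symbol must be quoted.
(cl-defmethod my-parse-date (time-string (_format (eql 'iso-8601)))
  (iso8601-parse time-string))

(cl-defmethod my-parse-date (time-string (_format (eql 'rfc-email)))
  (parse-time-string time-string))

(decoded-time-year (my-parse-date "2021-10-22" 'iso-8601))
;; => 2021
```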
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2021-12-30 21:08 ` Bob Rogers @ 2022-01-01 14:47 ` Lars Ingebrigtsen 2022-01-01 14:56 ` Andreas Schwab 0 siblings, 1 reply; 40+ messages in thread From: Lars Ingebrigtsen @ 2022-01-01 14:47 UTC (permalink / raw) To: Bob Rogers; +Cc: 52209, Paul Eggert I wonder whether we should look at this another way. We currently have two built-in date parsing functions in Emacs: `iso8601-parse' and `parse-time-string', and both parse strings according to well-defined standards (ISO8601 and RFC822bis, respectively). (But the latter's doc string didn't explicitly say so, so people thought it was a DWIM parser.) DWIM date parsing is impossible, though, because there's an infinite variety of date formats out there, and variants are ambiguous. And adding an infinite number of date parsers to Emacs doesn't seem attractive. So how about just adding something that makes parsing common date formats easier, but without being DWIM or being hard-coded. Like: (parse-time "%Y/%m/%d" "2021/01/01") => (nil nil nil 01 01 2021) or something. It could be regexp-ey (parse-time "%Y.*%m.*%d" "2021 01-01") and basically accept the same things that format-time-string accepts, like: (with-locale-environment "fr_FR" (parse-time "%d +%h" "5 août")) => (nil nil nil 5 8 nil) I think that'd be more generally useful. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 40+ messages in thread
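A rough sketch of the format-directed parsing Lars proposes might look like the following. `my-parse-time` is hypothetical and deliberately tiny. It understands only `%Y`, `%m` and `%d`, requires every other character to match literally (no regexp passthrough), ignores locales, and returns a full nine-element decoded-time list rather than the six-element list in the example above.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-parse-time (format string)
  "Parse STRING according to FORMAT, a strptime-like sketch.
Only the %Y, %m and %d directives are understood; every other
character of FORMAT must match literally.  Return a list in the
format of `decode-time', or nil if STRING does not match."
  (let ((regexp "\\`")
        (directives nil)
        (i 0)
        (len (length format)))
    ;; Translate FORMAT into an anchored regexp with one group per directive.
    (while (< i len)
      (let ((char (aref format i)))
        (if (and (eq char ?%) (< (1+ i) len))
            (let ((directive (aref format (1+ i))))
              (setq regexp
                    (concat regexp
                            (pcase directive
                              (?Y "\\([0-9]\\{4\\}\\)")
                              ((or ?m ?d) "\\([0-9]\\{1,2\\}\\)")
                              (_ (error "Unsupported directive %%%c" directive)))))
              (push directive directives)
              (setq i (+ i 2)))
          (setq regexp (concat regexp (regexp-quote (string char))))
          (setq i (1+ i)))))
    (when (string-match (concat regexp "\\'") string)
      (let ((group 0) year month day)
        ;; Groups are numbered in the order the directives appeared.
        (dolist (directive (nreverse directives))
          (setq group (1+ group))
          (let ((value (string-to-number (match-string group string))))
            (pcase directive
              (?Y (setq year value))
              (?m (setq month value))
              (?d (setq day value)))))
        ;; (SEC MIN HOUR DAY MON YEAR DOW DST TZ), with DST unknown (-1).
        (list nil nil nil day month year nil -1 nil)))))

(my-parse-time "%Y/%m/%d" "2021/01/01")
;; => (nil nil nil 1 1 2021 nil -1 nil)
```

As Lars notes later in the thread, this style only helps when the caller already knows the layout precisely; "2021-01-01" does not match "%Y/%m/%d" and yields nil.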
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-01-01 14:47 ` Lars Ingebrigtsen @ 2022-01-01 14:56 ` Andreas Schwab 2022-01-02 0:41 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread From: Andreas Schwab @ 2022-01-01 14:56 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: Bob Rogers, 52209, Paul Eggert On Jan 01 2022, Lars Ingebrigtsen wrote: > So how about just adding something that makes parsing common date > formats easier, but without being DWIM or being hard-coded. Like: > > (parse-time "%Y/%m/%d" "2021/01/01") > => (nil nil nil 01 01 2021) Aka strptime. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-01-01 14:56 ` Andreas Schwab @ 2022-01-02 0:41 ` Bob Rogers 2022-01-03 11:34 ` Lars Ingebrigtsen 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2022-01-02 0:41 UTC (permalink / raw) To: Lars Ingebrigtsen, Andreas Schwab; +Cc: 52209, Paul Eggert From: Lars Ingebrigtsen <larsi@gnus.org> Date: Sat, 01 Jan 2022 15:47:05 +0100 I wonder whether we should look at this another way. We currently have two built-in date parsing functions in Emacs: `iso8601-parse' and `parse-time-string', and both parse strings according to well-defined standards (ISO8601 and RFC822bis, respectively). (But the latter's doc string didn't explicitly say so, so people thought it was a DWIM parser.) DWIM date parsing is impossible, though, because there's an infinite variety of date formats out there, and variants are ambiguous. And adding an infinite number of date parsers to Emacs doesn't seem attractive. After perusing [1], I had started to think in terms of just three basic formats: dmy (formerly euro-date), ymd, and mdy (formerly us-date), plus possibly adding "." as a date separator. That doesn't cover everything but ought to broaden the set to make most of the world happy, especially if I add a few hacks I have in mind to broaden recognition of four-digit years and alphabetic months. The rest I think could be left to "patches welcome." And in that context, it may make more sense to say, "Use the original parse-time-string if you know you have email dates, or iso8601-parse if you have dates that conform to ISO-8601," rather than having parse-date handle them itself. So how about just adding something that makes parsing common date formats easier, but without being DWIM or being hard-coded . . . I think that'd be more generally useful. Perhaps, but I see that as a different problem: One where you have a date or set of dates in a precise format and just need to knock them out.
I was trying to solve the problem where you have date(s) that you only know the general origin (e.g. North America) and don't know whether they are numeric, alphabetic, or how precise, and just want the parser to do the best it can, and signal a reasonably informative error rather than return an incorrect result. ================ From: Andreas Schwab <schwab@linux-m68k.org> Date: Sat, 01 Jan 2022 15:56:37 +0100 On Jan 01 2022, Lars Ingebrigtsen wrote: > (parse-time "%Y/%m/%d" "2021/01/01") > => (nil nil nil 01 01 2021) Aka strptime. Oh, you're talking about the POSIX strptime, not the Perl Date::Parse strptime, which is free-form. Not being a C programmer, I was not aware of the POSIX version. But now I know where the odd name came from. ;-} -- Bob [1] https://en.wikipedia.org/wiki/Date_format_by_country ^ permalink raw reply [flat|nested] 40+ messages in thread
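The ambiguity behind Bob's three basic orders (dmy, mdy, ymd) can be made concrete with a small sketch. `my-numeric-date` is hypothetical: it handles only all-numeric dates with "-", "/" or "." separators, returns (DAY MONTH YEAR), and does nothing about alphabetic months or two-digit years.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-numeric-date (string order)
  "Split the all-numeric date STRING and return (DAY MONTH YEAR).
ORDER is one of the symbols `dmy', `mdy' or `ymd'.  Fields may be
separated by \"-\", \"/\" or \".\"."
  (let ((fields (mapcar #'string-to-number (split-string string "[-/.]"))))
    (unless (= (length fields) 3)
      (error "Expected three numeric fields in %S" string))
    (pcase-let ((`(,a ,b ,c) fields))
      (pcase order
        ('dmy (list a b c))
        ('mdy (list b a c))
        ('ymd (list c b a))
        (_ (error "Unknown field order: %S" order))))))

;; The same string means two different dates:
(my-numeric-date "01/02/2022" 'mdy)  ; => (2 1 2022), i.e. 2 January
(my-numeric-date "01/02/2022" 'dmy)  ; => (1 2 2022), i.e. 1 February
```

Nothing in the string itself decides between the two readings, which is why the caller has to supply the order.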
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-01-02 0:41 ` Bob Rogers @ 2022-01-03 11:34 ` Lars Ingebrigtsen 2022-01-04 4:45 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread From: Lars Ingebrigtsen @ 2022-01-03 11:34 UTC (permalink / raw) To: Bob Rogers; +Cc: 52209, Andreas Schwab, Paul Eggert Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > And in that context, it may make more sense to say, "Use the original > parse-time-string if you know you have email dates, or iso8601-parse if > you have dates that conform to ISO-8601," rather than having parse-date > handle them itself. Yeah. And rename `parse-time-string' to something less confusing. > So how about just adding something that makes parsing common date > formats easier, but without being DWIM or being hard-coded . . . > > I think that'd be more generally useful. > > Perhaps, but I see that as a different problem: One where you have a > date or set of dates in a precise format and just need to knock them > out. I was trying to solve the problem where you have date(s) that you > only know the general origin (e.g. North America) and don't know whether > they are numeric, alphabetic, or how precise, and just want the parser > to do the best it can, and signal a reasonably informative error rather > than return an incorrect result. Yes, I think a function like that would be welcomed by many... but would then lead to an endless series of patches as it'd be extended because it doesn't work correctly on dates from, say, Iceland. That is, a DWIM function would never be finished. > On Jan 01 2022, Lars Ingebrigtsen wrote: > > > (parse-time "%Y/%m/%d" "2021/01/01") > > => (nil nil nil 01 01 2021) > > Aka strptime. > > Oh, you're talking about the POSIX strptime, not the Perl Date::Parse > strptime, which is free-form. Not being a C programmer, I was not aware > of the POSIX version. But now I know where the odd name came from. 
;-} POSIX strptime isn't very useful, because if you know the format that precisely, you might as well just write a regexp for it yourself. But something like that, but with more sloppiness (i.e., allowing regexp matching for the non-time bits) might be useful. (And I think if we had that, then implementing DWIM-ish parsing of, say, US dates on top of that would be a matter of writing a series of these strings to match them. Probably.) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-01-03 11:34 ` Lars Ingebrigtsen @ 2022-01-04 4:45 ` Bob Rogers 2022-01-05 15:46 ` Lars Ingebrigtsen 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2022-01-04 4:45 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 52209, Andreas Schwab, Paul Eggert From: Lars Ingebrigtsen <larsi@gnus.org> Date: Mon, 03 Jan 2022 12:34:33 +0100 Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > And in that context, it may make more sense to say, "Use the original > parse-time-string if you know you have email dates, or iso8601-parse if > you have dates that conform to ISO-8601," rather than having parse-date > handle them itself. Yeah. And rename `parse-time-string' to something less confusing. I would certainly have found that helpful. FWIW, I did some grepping of the elisp sources to count the callers of parse-time-string to see how much trouble it would be to rename (there are around 60 of them), and found that ietf-drums-parse-date is just encode-time of parse-time-string. Since ietf-drums.el declares itself "Functions for parsing RFC 2822 headers," perhaps parse-date--x822 should find a new home as ietf-drums-parse-date-string, and parse-time-string could then be made obsolescent in its favor. > So how about just adding something that makes parsing common date > formats easier, but without being DWIM or being hard-coded . . . > > I think that'd be more generally useful. > > Perhaps, but I see that as a different problem: One where you have a > date or set of dates in a precise format and just need to knock them > out. I was trying to solve the problem where you have date(s) that you > only know the general origin (e.g. North America) and don't know whether > they are numeric, alphabetic, or how precise, and just want the parser > to do the best it can, and signal a reasonably informative error rather > than return an incorrect result.
Yes, I think a function like that would be welcomed by many... but would then lead to an endless series of patches as it'd be extended because it doesn't work correctly on dates from, say, Iceland. That is, a DWIM function would never be finished. But then, as I think someone on the list might have said very recently, neither is Emacs. ;-} POSIX strptime isn't very useful, because if you know the format that precisely, you might as well just write a regexp for it yourself. Agreed. And even writing and debugging regexps can often be less than straightforward. What you are suggesting is effectively expanding the set of metacharacters with percent-escapes, which could make it easier, or could make it worse. But something like that, but with more sloppiness (i.e., allowing regexp matching for the non-time bits) might be useful. One thing regexps can't do (at least not without adding a fair bit of complexity) is allow components to be in different order or omitted. So it still just takes one approximate date/time, and the caller is back to writing regexps to validate before passing it to the "real" parser. I was thinking that the next dimension in which to extend parse-date would be to add keywords to refine what is accepted, on top of the basic MDY order, e.g.: :date-separators "-/" :time-separators ":." :two-numbers-are :month-year ;; or (e.g.) :day-month :timezone :required ;; could be :optional or :forbidden :timezone-has-colon t ;; RFC5322 forbids, ISO-8601 requires Some keywords could even be regexp-valued. Others could be "umbrella" keywords that change the defaults for subsets of more specific keywords. In any case, that should make patching to add new features easier and (eventually) allow for much more fine tuning by callers. (And I think if we had that, then implementing DWIM-ish parsing of, say, US dates on top of that would be a matter of writing a series of these strings to match them. Probably.)
If I understand you correctly, this parse-date-DWIMishly would go through the string and recognize (say) that it had come to something that matches "%M/%d/%Y", concatenate that to a strptime-like format string it was building, and then call parse-date-strptime-style (or whatever) with that and the original string. But it seems to me that if it could recognize that it had found "%M/%d/%Y" in the string, it would be much easier to just fill in the month, day, and year right then. -- Bob ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-01-04 4:45 ` Bob Rogers @ 2022-01-05 15:46 ` Lars Ingebrigtsen 2022-01-05 22:49 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread From: Lars Ingebrigtsen @ 2022-01-05 15:46 UTC (permalink / raw) To: Bob Rogers; +Cc: 52209, Andreas Schwab, Paul Eggert Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > One thing regexps can't do (at least not without adding a fair bit of > complexity) is allow components to be in different order or omitted. So > it still just takes one approximate date/time, and the caller is back to > writing regexps to validate before passing it to the "real" parser. Yes. But you can't just have a function that you can give any string to and it'll tell you what the date contained in it is. "1.2" is a perfectly normal way to specify "January second" in some countries, but no amount of general DWIM is going to take us there. The caller has to say what they expect the format to be is. > I was thinking that the next dimension in which to extend parse-date > would be to add keywords to refine what is accepted, on top of the basic > MDY order, e.g.: > > :date-separators "-/" > :time-separators ":." > :two-numbers-are :month-year ;; or (e.g.) :day-month > :timezone :required ;; could be :optional or :forbidden > :timezone-has-colon t ;; RFC5322 forbids, ISO-8601 requires > > Some keywords could even be regexp-valued. Others could be "umbrella" > keywords that change the defaults for subsets of more specific keywords. > In any case, that should make patching to add new features easier and > (eventually) allow for much more fine tuning by callers. I think it'd be easier to just write a regexp than to use a date-parsing function like that. 😀 > (And I think if we had that, then implementing DWIM-ish parsing of, > say, US dates on top of that would be a matter of writing a series of > these strings to match them. Probably.) 
> > If I understand you correctly, this parse-date-DWIMishly would go > through the string and recognize (say) that it had come to something > that matches "%M/%d/%Y", concatenate that to a strptime-like format > string it was building, and then call parse-date-strptime-style (or > whatever) with that and the original string. But it seems to me that if > it could recognize that it had found "%M/%d/%Y" in the string, it would > be much easier to just fill in the month, day, and year right then. Well, I was thinking more like looping over a common set of formats and see whether we have a match. For the US, looping over "%M.*%d.*%Y?", "%M.*%b.*%Y?" and "%M.*%B.*%Y?" would probably cover most of the American-language dates. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-01-05 15:46 ` Lars Ingebrigtsen @ 2022-01-05 22:49 ` Bob Rogers [not found] ` <25105.33397.961104.269676@orion.rgrjr.com> 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2022-01-05 22:49 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 52209, Andreas Schwab, Paul Eggert From: Lars Ingebrigtsen <larsi@gnus.org> Date: Wed, 05 Jan 2022 16:46:01 +0100 Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > One thing regexps can't do (at least not without adding a fair bit of > complexity) is allow components to be in different order or omitted. So > it still just takes one approximate date/time, and the caller is back to > writing regexps to validate before passing it to the "real" parser. Yes. But you can't just have a function that you can give any string to and it'll tell you what the date contained in it is. "1.2" is a perfectly normal way to specify "January second" in some countries, but no amount of general DWIM is going to take us there. The caller has to say what they expect the format to be is. Granted, but (as with any API) how much they need to say can be greatly reduced by suitable defaults. > I was thinking that the next dimension in which to extend parse-date > would be to add keywords to refine what is accepted, on top of the basic > MDY order, e.g.: > > :date-separators "-/" > :time-separators ":." > :two-numbers-are :month-year ;; or (e.g.) :day-month > :timezone :required ;; could be :optional or :forbidden > :timezone-has-colon t ;; RFC5322 forbids, ISO-8601 requires > > Some keywords could even be regexp-valued. Others could be "umbrella" > keywords that change the defaults for subsets of more specific keywords. > In any case, that should make patching to add new features easier and > (eventually) allow for much more fine tuning by callers. I think it'd be easier to just write a regexp than to use a date-parsing function like that. 
Again, suitable defaults should take care of that. (And I'm beginning to suspect you're better at writing regexps than I am. ;-) > . . . Well, I was thinking more like looping over a common set of formats and see whether we have a match. For the US, looping over "%M.*%d.*%Y?", "%M.*%b.*%Y?" and "%M.*%B.*%Y?" would probably cover most of the American-language dates. Except that (according to "man strptime" on my system), "%M" is the descriptor for minute, which rather makes the point that composing these is not straightforward. It also occurs to me that using ".*" could be dangerous if it matches into the time or timezone fields. And (also based on my reading of "man strptime") you wouldn't need to specify "%b" and "%B" separately, as they are treated equivalently, but if you wanted to be DWIMmy about two-digit years, you'd have to cover "%y" as well as "%Y": %m[-/]%d[-/]%y %m[-/]%d[-/]%Y %m[-/]%b[-/]%y %m[-/]%b[-/]%Y This does not strike me as an improvement. In any case, I would like to bring parse-date.el to completion soon, so here is what I plan to do: 1. Drop ISO-8601 parsing, and point the documentation to iso8601-parse. 2. Drop email date parsing and use the code to create a patch that updates ietf-drums.el, which could perhaps start the process to replace parse-time-string. 3. Restrict parse-date formats to mdy, dmy, and ymd, with some extra heuristics for four-digit years and alphanumeric months, then call it a day. If you think the resulting parse-date is worth the trouble, then it can become part of Emacs; if not, then I will offer it to ELPA. Either way, parse-date will be off my plate. But I don't think I will take up the mantle of writing a strptime-like date parser, as I don't think it will be very useful. -- Bob ^ permalink raw reply [flat|nested] 40+ messages in thread
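Bob's concern that ".*" can run into the time field is easy to demonstrate. The sketch assumes, hypothetically, that a regexp-ey format such as "%Y.*%m.*%d" would be translated by replacing each directive with a digit group and keeping ".*" as written. `my-greedy-split` is a hypothetical helper. Because Emacs regexps are greedy and backtracking, the first ".*" runs as far right as it can, so the month and day groups end up matching digits from "12:34".

```elisp
;;; -*- lexical-binding: t -*-
(defun my-greedy-split (date)
  "Match DATE against a regexp shaped like \"%Y.*%m.*%d\".
Return the three captured strings (year, month, day), or nil."
  (let ((regexp (concat "\\([0-9]\\{4\\}\\)"       ; %Y
                        ".*\\([0-9]\\{1,2\\}\\)"   ; .*%m
                        ".*\\([0-9]\\{1,2\\}\\)"))) ; .*%d
    (when (string-match regexp date)
      (list (match-string 1 date)
            (match-string 2 date)
            (match-string 3 date)))))

(my-greedy-split "2021-01-01 12:34")
;; => ("2021" "3" "4"), i.e. month and day are taken from "12:34"
```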
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates [not found] ` <25105.33397.961104.269676@orion.rgrjr.com> @ 2022-02-20 12:25 ` Lars Ingebrigtsen 2022-02-20 13:03 ` Andreas Schwab 1 sibling, 0 replies; 40+ messages in thread From: Lars Ingebrigtsen @ 2022-02-20 12:25 UTC (permalink / raw) To: 52209 (Resending because bug report was archived.) Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > Here's what I have for this phase of the plan; let me know what you > think. It took longer than expected because it became a project unto > itself, and so my procrastinator kicked in, making it longer still. :-/ :-) Have you benchmarked your new implementation versus the current one? It's important that the parsing is performant, otherwise it'd slow down many things that parse a large number of date strings. Some minor comments about the code: > +(defsubst ietf-drums-date--ignore-char? (char) > + ;; Ignore whitespace and commas. > + (or (eq char ?\ ) (eq char ?\t) (eq char ?\r) (eq char ?\n) (eq char ?,))) In Emacs Lisp, we don't use Scheme-style predicate names -- we use -p instead. > +(defun ietf-drums-date--tokenize-string (string &optional comment-eof?) And the same with booleans -- we don't use foo? for those, but just foo. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates [not found] ` <25105.33397.961104.269676@orion.rgrjr.com> 2022-02-20 12:25 ` Lars Ingebrigtsen @ 2022-02-20 13:03 ` Andreas Schwab [not found] ` <87ilt9vicd.fsf@gnus.org> 1 sibling, 1 reply; 40+ messages in thread From: Andreas Schwab @ 2022-02-20 13:03 UTC (permalink / raw) To: Bob Rogers; +Cc: Lars Ingebrigtsen, 52209, Paul Eggert On Feb 19 2022, Bob Rogers wrote: > + (or (eq char ?\ ) (eq char ?\t) (eq char ?\r) (eq char ?\n) (eq char ?,))) (memq char '(?\s ?\t ?\r ?\n ?,)) -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 40+ messages in thread
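Andreas's `memq` form is a drop-in replacement for the chain of `eq` tests. The sketch below restates it under the `-p` naming convention Lars mentions. `my-ignore-char-p` is a hypothetical name (the patch calls it `ietf-drums-date--ignore-char-p`), and the helper checks that both versions agree on every ASCII character.

```elisp
;;; -*- lexical-binding: t -*-
(defsubst my-ignore-char-p (char)
  "Return non-nil if CHAR is whitespace or a comma."
  (memq char '(?\s ?\t ?\r ?\n ?,)))

(defun my-ignore-char-equivalent-p ()
  "Check `my-ignore-char-p' against the original `eq' chain on ASCII."
  (let ((same t))
    (dotimes (char 128)
      ;; Compare truth values only: `memq' returns a list tail, not t.
      (unless (eq (not (my-ignore-char-p char))
                  (not (or (eq char ?\s) (eq char ?\t) (eq char ?\r)
                           (eq char ?\n) (eq char ?,))))
        (setq same nil)))
    same))
```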
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates [not found] ` <87ilt9vicd.fsf@gnus.org> @ 2022-02-20 22:14 ` Bob Rogers 2022-02-23 23:15 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2022-02-20 22:14 UTC (permalink / raw) To: Lars Ingebrigtsen, Andreas Schwab; +Cc: 52209, Paul Eggert From: Lars Ingebrigtsen <larsi@gnus.org> Date: Sun, 20 Feb 2022 13:21:54 +0100 Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > Here's what I have for this phase of the plan; let me know what you > think. It took longer than expected because it became a project unto > itself, and so my procrastinator kicked in, making it longer still. :-/ :-) Have you benchmarked your new implementation versus the current one? It's important that the parsing is performant, otherwise it'd slow down many things that parse a large number of date strings. No benchmarking; I will do that presently. Some minor comments about the code: > +(defsubst ietf-drums-date--ignore-char? (char) . . . In Emacs Lisp, we don't use Scheme-style predicate names -- we use -p instead . . . And the same with booleans -- we don't use foo? for those, but just foo. OK. ================ From: Andreas Schwab <schwab@linux-m68k.org> Date: Sun, 20 Feb 2022 14:03:55 +0100 On Feb 19 2022, Bob Rogers wrote: > + (or (eq char ?\ ) (eq char ?\t) (eq char ?\r) (eq char ?\n) (eq char ?,))) (memq char '(?\s ?\t ?\r ?\n ?,)) Good eye; I had been adding to that set incrementally, so I missed the forest for the trees. -- Bob ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-20 22:14 ` Bob Rogers @ 2022-02-23 23:15 ` Bob Rogers 2022-02-24 9:19 ` Lars Ingebrigtsen 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2022-02-23 23:15 UTC (permalink / raw) To: Lars Ingebrigtsen, Andreas Schwab, 52209, Paul Eggert [-- Attachment #1: message body text --] [-- Type: text/plain, Size: 1218 bytes --] From: Bob Rogers <rogers-emacs@rgrjr.homedns.org> Date: Sun, 20 Feb 2022 17:14:36 -0500 From: Lars Ingebrigtsen <larsi@gnus.org> Date: Sun, 20 Feb 2022 13:21:54 +0100 Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > Here's what I have for this phase of the plan; let me know what you > think. It took longer than expected because it became a project unto > itself, and so my procrastinator kicked in, making it longer still. :-/ :-) Have you benchmarked your new implementation versus the current one? It's important that the parsing is performant, otherwise it'd slow down many things that parse a large number of date strings. No benchmarking; I will do that presently. Benchmarking code and results attached. I extracted a handful of non-error cases from the tests as being more representative than any of the error cases; the resulting numbers make it seem like any difference between the two implementations is in the noise. But this is my first foray into elisp benchmarking, so I may have overlooked something. Fortunately, email dates are not that diverse, so I am hoping this sampling may be broad enough. -- Bob [-- Attachment #2: ietf-drums-date-timings.el --] [-- Type: text/x-emacs-lisp, Size: 1938 bytes --] ;;; ietf-drums-date-timings.el --- timing ietf-drums-date.el -*- lexical-binding: t -*- ;; Copyright (C) 2022 Free Software Foundation, Inc. 
;; Author: Bob Rogers <rogers@rgrjr.com> (defun run-timings (parse-fn) (dolist (case '(("Mon, 22 Feb 2016 19:35:42 +0100" (42 35 19 22 2 2016 1 -1 3600) (22219 21758)) ("22 Feb 2016 19:35:42 +0100" (42 35 19 22 2 2016 nil -1 3600) (22219 21758)) ("Mon, 22 February 2016 19:35:42 +0100" (42 35 19 22 2 2016 1 -1 3600) (22219 21758)) ("Mon, 22 feb 2016 19:35:42 +0100" (42 35 19 22 2 2016 1 -1 3600) (22219 21758)) ("Monday, 22 february 2016 19:35:42 +0100" (42 35 19 22 2 2016 1 -1 3600) (22219 21758)) ("Monday, 22 february 2016 19:35:42 PST" (42 35 19 22 2 2016 1 nil -28800) (22219 54158)) ("Friday, 21 Sep 2018 13:47:58 PDT" (58 47 13 21 9 2018 5 t -25200) (23461 22782)) ("Friday, 21 Sep 2018 13:47:58" (58 47 13 21 9 2018 5 -1 nil) (23461 11982)))) (funcall parse-fn (car case)))) (benchmark-run-compiled 10000 (run-timings #'ietf-drums-parse-date)) ;; (7.220905228 83 3.3420971879999968) ;; (7.24936647 83 3.3321491059999993) ;; (7.3240701370000005 84 3.371737411) ;; (/ (+ 7.249 7.324 7.324) 3) 7.299 (defun ietf-drums-old-parse-date (string) "Return an Emacs time spec from STRING." (encode-time (parse-time-string string))) (benchmark-run 10000 (run-timings #'ietf-drums-old-parse-date)) ;; (7.249068317 83 3.3251401939999994) ;; (7.317397244 84 3.3750772899999983) ;; (7.268244294 84 3.3820036280000005) ;; (/ (+ 7.249 7.317 7.268) 3) 7.278 ^ permalink raw reply [flat|nested] 40+ messages in thread
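The averaging comment for the new parser sums 7.249 + 7.324 + 7.324, which counts the third run twice and omits the first (7.221). Averaging the three elapsed times as measured gives about 7.265 s against about 7.278 s for the old parser, a difference of roughly 0.2%, so the "in the noise" conclusion still holds. The two runs also use different macros (`benchmark-run-compiled` for the new parser, `benchmark-run` for the old one). Since each form only calls an already-defined function, this probably matters little, but using the same macro for both would make the comparison cleaner. `my-mean` is a hypothetical helper.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-mean (numbers)
  "Return the arithmetic mean of NUMBERS as a float."
  (/ (float (apply #'+ numbers)) (length numbers)))

;; Elapsed times (first element of each benchmark result) from the runs above.
(my-mean '(7.220905228 7.24936647 7.3240701370000005)) ; new parser, ≈ 7.265
(my-mean '(7.249068317 7.317397244 7.268244294))       ; old parser, ≈ 7.278
```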
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-23 23:15 ` Bob Rogers @ 2022-02-24 9:19 ` Lars Ingebrigtsen 2022-02-25 0:49 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread From: Lars Ingebrigtsen @ 2022-02-24 9:19 UTC (permalink / raw) To: Bob Rogers; +Cc: 52209, Andreas Schwab, Paul Eggert Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > Benchmarking code and results attached. I extracted a handful of > non-error cases from the tests as being more representative than any of > the error cases; the resulting numbers make it seem like any difference > between the two implementations is in the noise. > > But this is my first foray into elisp benchmarking, so I may have > overlooked something. Fortunately, email dates are not that diverse, so > I am hoping this sampling may be broad enough. [...] > (benchmark-run-compiled 10000 (run-timings #'ietf-drums-parse-date)) > ;; (7.220905228 83 3.3420971879999968) > ;; (7.24936647 83 3.3321491059999993) > ;; (7.3240701370000005 84 3.371737411) > ;; (/ (+ 7.249 7.324 7.324) 3) 7.299 [...] > (benchmark-run 10000 (run-timings #'ietf-drums-old-parse-date)) > ;; (7.249068317 83 3.3251401939999994) > ;; (7.317397244 84 3.3750772899999983) > ;; (7.268244294 84 3.3820036280000005) > ;; (/ (+ 7.249 7.317 7.268) 3) 7.278 Thanks; that looks quite promising. Can you send a new version of the patch, and I'll get it pushed? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 40+ messages in thread
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-24 9:19 ` Lars Ingebrigtsen @ 2022-02-25 0:49 ` Bob Rogers 2022-02-25 2:16 ` Lars Ingebrigtsen 0 siblings, 1 reply; 40+ messages in thread From: Bob Rogers @ 2022-02-25 0:49 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 52209, Andreas Schwab, Paul Eggert [-- Attachment #1: message body text --] [-- Type: text/plain, Size: 669 bytes --] From: Lars Ingebrigtsen <larsi@gnus.org> Date: Thu, 24 Feb 2022 10:19:43 +0100 Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes: > Benchmarking code and results attached. I extracted a handful of > non-error cases from the tests as being more representative than any of > the error cases; the resulting numbers make it seem like any difference > between the two implementations is in the noise. Thanks; that looks quite promising. Can you send a new version of the patch, and I'll get it pushed? Here it is; there should be no changes from what I last sent other than from the suggestions you and Andreas made. Thanks, -- Bob [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Enhanced-date-parsing-for-ietf-drums.el.patch --] [-- Type: text/x-patch, Size: 23385 bytes --] From bdf96f4132cb0433dfcd48b862175ef9bbcc41bb Mon Sep 17 00:00:00 2001 From: Bob Rogers <rogers@rgrjr.com> Date: Tue, 1 Feb 2022 14:36:31 -0500 Subject: [PATCH] Enhanced date parsing for ietf-drums.el * lisp/mail/ietf-drums-date.el (added): + (ietf-drums-parse-date-string): parse-time-string replacement which is compatible but can be made stricter if desired. * test/lisp/mail/ietf-drums-date-tests.el (added): + Add tests for ietf-drums-parse-date-string. * lisp/mail/ietf-drums.el: + (ietf-drums-parse-date): Use ietf-drums-parse-date-string. 
--- lisp/mail/ietf-drums-date.el | 274 ++++++++++++++++++++++++ lisp/mail/ietf-drums.el | 6 +- test/lisp/mail/ietf-drums-date-tests.el | 176 +++++++++++++++ 3 files changed, 455 insertions(+), 1 deletion(-) create mode 100644 lisp/mail/ietf-drums-date.el create mode 100644 test/lisp/mail/ietf-drums-date-tests.el diff --git a/lisp/mail/ietf-drums-date.el b/lisp/mail/ietf-drums-date.el new file mode 100644 index 0000000000..6f64ae7337 --- /dev/null +++ b/lisp/mail/ietf-drums-date.el @@ -0,0 +1,274 @@ +;;; ietf-drums-date.el --- parse time/date for ietf-drums.el -*- lexical-binding: t -*- + +;; Copyright (C) 2022 Free Software Foundation, Inc. + +;; Author: Bob Rogers <rogers@rgrjr.com> +;; Keywords: mail, util + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;; 'ietf-drums-parse-date-string' parses a time and/or date in a +;; string and returns a list of values, just like `decode-time', where +;; unspecified elements in the string are returned as nil (except +;; unspecified DST is returned as -1). `encode-time' may be applied +;; on these values to obtain an internal time value. + +;; Historically, `parse-time-string' was used for this purpose, but it +;; was gradually but imperfectly extended to handle other date +;; formats. 
'ietf-drums-parse-date-string' is compatible in that it +;; uses the same return value format and parses the same email date +;; formats by default, but can be made stricter if desired. + +;;; Code: + +(require 'cl-lib) +(require 'parse-time) + +(define-error 'date-parse-error "Date/time parse error" 'error) + +(defconst ietf-drums-date--slot-names + '(second minute hour day month year weekday dst zone) + "Names of return value slots, for better error messages. +See the decoded-time defstruct.") + +(defconst ietf-drums-date--slot-ranges + '((0 60) (0 59) (0 23) (1 31) (1 12) (1 9999)) + "Numeric slot ranges, for bounds checking. +Note that RFC5322 explicitly requires that seconds go up to 60, +to allow for leap seconds (see Mills, D., 'Network Time +Protocol', STD 12, RFC 1119, September 1989).") + +(defsubst ietf-drums-date--ignore-char-p (char) + ;; Ignore whitespace and commas. + (memq char '(?\s ?\t ?\r ?\n ?,))) + +(defun ietf-drums-date--tokenize-string (string &optional comment-eof) + "Turn STRING into tokens, separated only by whitespace and commas. +Multiple commas are ignored. Pure digit sequences are turned +into integers. If COMMENT-EOF is true, then a comment as +defined by RFC5322 (strictly, the CFWS production that also +accepts comments) is treated as an end-of-file, and no further +tokens are recognized; otherwise, we strip out all comments and +treat them as whitespace (per RFC822)." + (let ((index 0) + (end (length string)) + (list ())) + (cl-flet ((skip-ignored () + ;; Skip ignored characters at index (the scan + ;; position). Skip RFC822 comments in matched parens, + ;; but do not complain about unterminated comments. + (let ((char nil) + (nest 0)) + (while (and (< index end) + (setq char (aref string index)) + (or (> nest 0) + (ietf-drums-date--ignore-char-p char) + (and (not comment-eof) (eql char ?\()))) + (cl-incf index) + ;; FWS bookkeeping.
+ (cond ((and (eq char ?\\) + (< (1+ index) end)) + ;; Move to the next char but don't check + ;; it to see if it might be a paren. + (cl-incf index)) + ((eq char ?\() (cl-incf nest)) + ((eq char ?\)) (cl-decf nest))))))) + (skip-ignored) ;; Skip leading whitespace. + (while (and (< index end) + (not (and comment-eof + (eq (aref string index) ?\()))) + (let* ((start index) + (char (aref string index)) + (all-digits (<= ?0 char ?9))) + ;; char is valid; look for more valid characters. + (when (and (eq char ?\\) + (< (1+ index) end)) + ;; Escaped character, which might be a "(". If so, we are + ;; correct to include it in the token, even though the + ;; caller is sure to barf. If not, we violate RFC2?822 by + ;; not removing the backslash, but no characters in valid + ;; RFC2?822 dates need escaping anyway, so it shouldn't + ;; matter that this is not done strictly correctly. -- + ;; rgr, 24-Dec-21. + (cl-incf index)) + (while (and (< (cl-incf index) end) + (setq char (aref string index)) + (not (or (ietf-drums-date--ignore-char-p char) + (eq char ?\()))) + (unless (<= ?0 char ?9) + (setq all-digits nil)) + (when (and (eq char ?\\) + (< (1+ index) end)) + ;; Escaped character, see above. + (cl-incf index))) + (push (if all-digits + (cl-parse-integer string :start start :end index) + (substring string start index)) + list) + (skip-ignored))) + (nreverse list)))) + +(defun ietf-drums-parse-date-string (time-string &optional error no-822) + "Parse an RFC5322 or RFC822 date, passed as TIME-STRING. +The optional ERROR parameter causes syntax errors to be flagged +by signalling an instance of the date-parse-error condition. The +optional NO-822 parameter disables the more lax RFC822 syntax, +which is permitted by default. + +The result is a list of (SEC MIN HOUR DAY MON YEAR DOW DST TZ), +which can be accessed as a decoded-time defstruct (q.v.), +e.g. `decoded-time-year' to extract the year, and turned into an +Emacs timestamp by `encode-time'. 
+ +The strict syntax for RFC5322 is as follows: + + [ day-of-week \",\" ] day FWS month-name FWS year FWS time [CFWS] + +where the \"time\" production is: + + 2DIGIT \":\" 2DIGIT [ \":\" 2DIGIT ] FWS ( \"+\" / \"-\" ) 4DIGIT + +and FWS is \"folding white space,\" and CFWS is \"comments and/or +folding white space\", where comments are included in nesting +parentheses and are equivalent to white space. RFC822 also +accepts comments in random places (all of which is handled by +ietf-drums-date--tokenize-string) and two-digit years. For +two-digit years, 50 and up are interpreted as 1950 through 1999 +and 00 through 49 as 2000 through 2049. + +We are somewhat more lax in what we accept (specifically, the +hours don't have to be two digits, and the TZ and the comma after +the DOW are optional), but we do insist that the items that are +present do appear in this order. Unspecified/unrecognized +elements in the string are returned as nil (except unspecified +DST is returned as -1)." + (let ((tokens (ietf-drums-date--tokenize-string (downcase time-string) + no-822)) + (time (list nil nil nil nil nil nil nil -1 nil))) + (cl-labels ((set-matched-slot (slot index token) + ;; Assign a slot value from match data if index is + ;; non-nil, else from token, signalling an error if + ;; enabled and it's out of range. + (let ((value (if index + (cl-parse-integer (match-string index token)) + token))) + (when error + (let ((range (nth slot ietf-drums-date--slot-ranges))) + (when (and range + (not (<= (car range) value (cadr range)))) + (signal 'date-parse-error + (list "Slot out of range" + (nth slot ietf-drums-date--slot-names) + token (car range) (cadr range)))))) + (setf (nth slot time) value))) + (set-numeric (slot token) + ;; Only assign the slot if the token is a number. + (cond ((natnump token) + (set-matched-slot slot nil token)) + (error + (signal 'date-parse-error + (list "Not a number" + (nth slot ietf-drums-date--slot-names) + token)))))) + ;; Check for weekday.
+ (let ((dow (assoc (car tokens) parse-time-weekdays))) + (when dow + ;; Day of the week. + (set-matched-slot 6 nil (cdr dow)) + (pop tokens))) + ;; Day. + (set-numeric 3 (pop tokens)) + ;; Alphabetic month. + (let* ((month (pop tokens)) + (match (assoc month parse-time-months))) + (cond (match + (set-matched-slot 4 nil (cdr match))) + (error + (signal 'date-parse-error + (list "Expected an alphabetic month" month))) + (t + (push month tokens)))) + ;; Year. + (let ((year (pop tokens))) + ;; Check the year for the right number of digits. + (cond ((not (natnump year)) + (when error + (signal 'date-parse-error + (list "Expected a year" year))) + (push year tokens)) + ((>= year 1000) + (set-numeric 5 year)) + ((or no-822 + (>= year 100)) + (when error + (signal 'date-parse-error + (list "Four-digit years are required" year))) + (push year tokens)) + ((>= year 50) + ;; second half of the 20th century. + (set-numeric 5 (+ 1900 year))) + (t + ;; first half of the 21st century. + (set-numeric 5 (+ 2000 year))))) + ;; Time. + (let ((time (pop tokens))) + (cond ((or (null time) (natnump time)) + (when error + (signal 'date-parse-error + (list "Expected a time" time))) + (push time tokens)) + ((string-match + "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\):\\([0-9][0-9]\\)$" + time) + (set-matched-slot 2 1 time) + (set-matched-slot 1 2 time) + (set-matched-slot 0 3 time)) + ((string-match "^\\([0-9][0-9]?\\):\\([0-9][0-9]\\)$" time) + ;; Time without seconds. + (set-matched-slot 2 1 time) + (set-matched-slot 1 2 time) + (set-matched-slot 0 nil 0)) + (error + (signal 'date-parse-error + (list "Expected a time" time))))) + ;; Timezone. + (let* ((zone (pop tokens)) + (match (assoc zone parse-time-zoneinfo))) + (cond (match + (set-matched-slot 8 nil (cadr match)) + (set-matched-slot 7 nil (caddr match))) + ((and (stringp zone) + (string-match "^[-+][0-9][0-9][0-9][0-9]$" zone)) + ;; Numeric time zone. 
+ (set-matched-slot + 8 nil + (* 60 + (+ (cl-parse-integer zone :start 3 :end 5) + (* 60 (cl-parse-integer zone :start 1 :end 3))) + (if (= (aref zone 0) ?-) -1 1)))) + ((and zone error) + (signal 'date-parse-error + (list "Expected a timezone" zone))))) + (when (and tokens error) + (signal 'date-parse-error + (list "Extra token(s)" (car tokens))))) + time)) + +(provide 'ietf-drums-date) + +;;; ietf-drums-date.el ends here diff --git a/lisp/mail/ietf-drums.el b/lisp/mail/ietf-drums.el index 85aa27235f..d1ad671b16 100644 --- a/lisp/mail/ietf-drums.el +++ b/lisp/mail/ietf-drums.el @@ -294,9 +294,13 @@ ietf-drums-unfold-fws (replace-match " " t t)) (goto-char (point-min))) +(declare-function ietf-drums-parse-date-string "ietf-drums-date" + (time-string &optional error? no-822?)) + (defun ietf-drums-parse-date (string) "Return an Emacs time spec from STRING." - (encode-time (parse-time-string string))) + (require 'ietf-drums-date) + (encode-time (ietf-drums-parse-date-string string))) (defun ietf-drums-narrow-to-header () "Narrow to the header section in the current buffer." diff --git a/test/lisp/mail/ietf-drums-date-tests.el b/test/lisp/mail/ietf-drums-date-tests.el new file mode 100644 index 0000000000..2d4b39dfae --- /dev/null +++ b/test/lisp/mail/ietf-drums-date-tests.el @@ -0,0 +1,176 @@ +;;; ietf-drums-date-tests.el --- Test suite for ietf-drums-date.el -*- lexical-binding:t -*- + +;; Copyright (C) 2022 Free Software Foundation, Inc. + +;; Author: Bob Rogers <rogers@rgrjr.com> + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'ietf-drums) +(require 'ietf-drums-date) + +(ert-deftest ietf-drums-date-tests () + "Test basic ietf-drums-parse-date-string functionality." + + ;; Test tokenization. + (should (equal (ietf-drums-date--tokenize-string " ") '())) + (should (equal (ietf-drums-date--tokenize-string " a b") '("a" "b"))) + (should (equal (ietf-drums-date--tokenize-string "a bbc dde") + '("a" "bbc" "dde"))) + (should (equal (ietf-drums-date--tokenize-string " , a 27 b,, c 14:32 ") + '("a" 27 "b" "c" "14:32"))) + ;; Some folding whitespace tests. + (should (equal (ietf-drums-date--tokenize-string " a b (end) c" t) + '("a" "b"))) + (should (equal (ietf-drums-date--tokenize-string "(quux)a (foo (bar)) b(baz)") + '("a" "b"))) + (should (equal (ietf-drums-date--tokenize-string "a b\\cde") + ;; Strictly incorrect, but strictly unnecessary syntax. + '("a" "b\\cde"))) + (should (equal (ietf-drums-date--tokenize-string "a b\\ de") + '("a" "b\\ de"))) + (should (equal (ietf-drums-date--tokenize-string "a \\de \\(f") + '("a" "\\de" "\\(f"))) + + ;; Start with some compatible RFC822 dates. 
+ (dolist (case '(("Mon, 22 Feb 2016 19:35:42 +0100" + (42 35 19 22 2 2016 1 -1 3600) + (22219 21758)) + ("22 Feb 2016 19:35:42 +0100" + (42 35 19 22 2 2016 nil -1 3600) + (22219 21758)) + ("Mon, 22 February 2016 19:35:42 +0100" + (42 35 19 22 2 2016 1 -1 3600) + (22219 21758)) + ("Mon, 22 feb 2016 19:35:42 +0100" + (42 35 19 22 2 2016 1 -1 3600) + (22219 21758)) + ("Monday, 22 february 2016 19:35:42 +0100" + (42 35 19 22 2 2016 1 -1 3600) + (22219 21758)) + ("Monday, 22 february 2016 19:35:42 PST" + (42 35 19 22 2 2016 1 nil -28800) + (22219 54158)) + ("Friday, 21 Sep 2018 13:47:58 PDT" + (58 47 13 21 9 2018 5 t -25200) + (23461 22782)) + ("Friday, 21 Sep 2018 13:47:58" + (58 47 13 21 9 2018 5 -1 nil) + (23461 11982)))) + (let* ((input (car case)) + (parsed (cadr case)) + (encoded (caddr case))) + ;; The input should parse the same without RFC822. + (should (equal (ietf-drums-parse-date-string input) parsed)) + (should (equal (ietf-drums-parse-date-string input nil t) parsed)) + ;; Check the encoded date (the official output, though + ;; the decoded-time is easier to debug). + (should (equal (ietf-drums-parse-date input) encoded)))) + + ;; Two-digit years are not allowed by the "modern" format. 
+ (should (equal (ietf-drums-parse-date-string "22 Feb 16 19:35:42 +0100") + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (ietf-drums-parse-date-string "22 Feb 16 19:35:42 +0100" nil t) + '(nil nil nil 22 2 nil nil -1 nil))) + (should (equal (should-error (ietf-drums-parse-date-string + "22 Feb 16 19:35:42 +0100" t t)) + '(date-parse-error "Four-digit years are required" 16))) + (should (equal (ietf-drums-parse-date-string "22 Feb 96 19:35:42 +0100") + '(42 35 19 22 2 1996 nil -1 3600))) + (should (equal (ietf-drums-parse-date-string "22 Feb 96 19:35:42 +0100" nil t) + '(nil nil nil 22 2 nil nil -1 nil))) + (should (equal (should-error (ietf-drums-parse-date-string + "22 Feb 96 19:35:42 +0100" t t)) + '(date-parse-error "Four-digit years are required" 96))) + + ;; Try some dates with comments. + (should (equal (ietf-drums-parse-date-string + "22 Feb (today) 16 19:35:42 +0100") + '(42 35 19 22 2 2016 nil -1 3600))) + (should (equal (ietf-drums-parse-date-string + "22 Feb (today) 16 19:35:42 +0100" nil t) + '(nil nil nil 22 2 nil nil -1 nil))) + (should (equal (should-error (ietf-drums-parse-date-string + "22 Feb (today) 16 19:35:42 +0100" t t)) + '(date-parse-error "Expected a year" nil))) + (should (equal (ietf-drums-parse-date-string + "22 Feb 96 (long ago) 19:35:42 +0100") + '(42 35 19 22 2 1996 nil -1 3600))) + (should (equal (ietf-drums-parse-date-string + "Friday, 21 Sep(comment \\) with \\( parens)18 19:35:42") + '(42 35 19 21 9 2018 5 -1 nil))) + (should (equal (ietf-drums-parse-date-string + "Friday, 21 Sep 18 19:35:42 (unterminated comment") + '(42 35 19 21 9 2018 5 -1 nil))) + + ;; Test some RFC822 error cases + (dolist (test '(("33 1 2022" ("Slot out of range" day 33 1 31)) + ("0 1 2022" ("Slot out of range" day 0 1 31)) + ("1 1 2020 2021" ("Expected an alphabetic month" 1)) + ("1 Jan 2020 2021" ("Expected a time" 2021)) + ("1 Jan 2020 20:21 2000" ("Expected a timezone" 2000)) + ("1 Jan 2020 20:21 +0200 33" ("Extra token(s)" 33)))) + (should 
(equal (should-error (ietf-drums-parse-date-string (car test) t)) + (cons 'date-parse-error (cadr test))))) + + (dolist (test '(("22 Feb 196" nil ;; bad year + ("Four-digit years are required" 196)) + ("22 Feb 16 19:35:24" t ;; two-digit year + ("Four-digit years are required" 16)) + ("22 Feb 96 19:35:42" t ;; two-digit year + ("Four-digit years are required" 96)) + ("2 Feb 2021 1996" nil + ("Expected a time" 1996)) + ("22 Fub 1996" nil + ("Expected an alphabetic month" "fub")) + ("1 Jan 2020 30" nil + ("Expected a time" 30)) + ("1 Jan 2020 16:47 15:15" nil + ("Expected a timezone" "15:15")) + ("1 Jan 2020 16:47 +0800 -0800" t + ("Extra token(s)" "-0800")) + ;; Range tests + ("32 Dec 2021" nil + ("Slot out of range" day 32 1 31)) + ("0 Dec 2021" nil + ("Slot out of range" day 0 1 31)) + ("3 13 2021" nil + ("Expected an alphabetic month" 13)) + ("3 Dec 0000" t + ("Four-digit years are required" 0)) + ("3 Dec 20021" nil + ("Slot out of range" year 20021 1 9999)) + ("1 Jan 2020 24:21:14" nil + ("Slot out of range" hour "24:21:14" 0 23)) + ("1 Jan 2020 14:60:21" nil + ("Slot out of range" minute "14:60:21" 0 59)) + ("1 Jan 2020 14:21:61" nil + ("Slot out of range" second "14:21:61" 0 60)))) + (should (equal (should-error + (ietf-drums-parse-date-string (car test) t (cadr test))) + (cons 'date-parse-error (caddr test))))) + (should (equal (ietf-drums-parse-date-string + "1 Jan 2020 14:21:60") ;; a leap second! + '(60 21 14 1 1 2020 nil -1 nil)))) + +(provide 'ietf-drums-date-tests) + +;;; ietf-drums-date-tests.el ends here -- 2.34.1
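As a usage sketch (not part of the submitted patch): for a date-only string, `ietf-drums-parse-date-string` leaves the second, minute, and hour slots nil. Passing that list straight to `encode-time` signals an error, which is the same failure Bob described for date-to-time at the start of this thread. The helper below fills those slots with `decoded-time-set-defaults` and pins the zone to UTC, so the result does not depend on the local TZ. The helper's name, `my/date-only-to-seconds`, is hypothetical. The sketch assumes Emacs 29 or later, where ietf-drums-date.el is included.

```elisp
;;; -*- lexical-binding: t -*-
(require 'ietf-drums-date)  ; added by the patch above (Emacs 29 and later)
(require 'time-date)        ; for `decoded-time-set-defaults'

(defun my/date-only-to-seconds (string)
  "Return seconds since the epoch for date-only STRING, taken as UTC midnight.
Hypothetical helper for illustration."
  (let ((parsed (ietf-drums-parse-date-string string)))
    ;; For "22 Feb 2016", PARSED is (nil nil nil 22 2 2016 nil -1 nil);
    ;; `encode-time' would signal on the nil second/minute/hour slots.
    (setq parsed (decoded-time-set-defaults parsed))
    ;; Use UTC explicitly instead of leaving the zone nil (local time).
    (setf (decoded-time-zone parsed) 0)
    (time-convert (encode-time parsed) 'integer)))

(my/date-only-to-seconds "22 Feb 2016")
;; => 1456099200, i.e. 2016-02-22 00:00:00 UTC
```

The UTC choice is deliberate. As the follow-up in this thread shows, leaving the zone nil makes the encoded value depend on the environment's TZ.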
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-25 0:49 ` Bob Rogers @ 2022-02-25 2:16 ` Lars Ingebrigtsen 2022-02-25 2:32 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread
From: Lars Ingebrigtsen @ 2022-02-25 2:16 UTC
To: Bob Rogers; +Cc: 52209, Andreas Schwab, Paul Eggert

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> Here it is; there should be no changes from what I last sent other than
> from the suggestions you and Andreas made.

Thanks, I got a test error:

Test ietf-drums-date-tests condition:
    (ert-test-failed
     ((should
       (equal
        (ietf-drums-parse-date input)
        encoded))
      :form
      (equal
       (23460 55918)
       (23461 11982))
      :value nil :explanation
      (list-elt 0
                (different-atoms
                 (23460 "#x5ba4" "?室")
                 (23461 "#x5ba5" "?宥")))))
   FAILED 1/1 ietf-drums-date-tests (0.000406 sec) at lisp/mail/ietf-drums-date-tests.el:30

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-25 2:16 ` Lars Ingebrigtsen @ 2022-02-25 2:32 ` Bob Rogers 2022-02-25 2:58 ` Bob Rogers 0 siblings, 1 reply; 40+ messages in thread
From: Bob Rogers @ 2022-02-25 2:32 UTC
To: Lars Ingebrigtsen; +Cc: 52209, Andreas Schwab, Paul Eggert

   From: Lars Ingebrigtsen <larsi@gnus.org>
   Date: Fri, 25 Feb 2022 03:16:56 +0100

   Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

   > Here it is; there should be no changes from what I last sent other than
   > from the suggestions you and Andreas made.

   Thanks, I got a test error:

Hmm. I bet we have a timezone issue . . .

					-- Bob
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-25 2:32 ` Bob Rogers @ 2022-02-25 2:58 ` Bob Rogers 2022-02-25 12:03 ` Lars Ingebrigtsen 0 siblings, 1 reply; 40+ messages in thread
From: Bob Rogers @ 2022-02-25 2:58 UTC
To: Lars Ingebrigtsen, Andreas Schwab, 52209, Paul Eggert

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 531 bytes --]

   From: Bob Rogers <rogers-emacs@rgrjr.homedns.org>
   Date: Thu, 24 Feb 2022 21:32:27 -0500

      From: Lars Ingebrigtsen <larsi@gnus.org>
      Date: Fri, 25 Feb 2022 03:16:56 +0100

      Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

      > Here it is; there should be no changes from what I last sent other than
      > from the suggestions you and Andreas made.

      Thanks, I got a test error:

   Hmm. I bet we have a timezone issue . . .

Yep; here's a fix, to be applied to the previous patch.

					-- Bob

[-- Attachment #2: Type: text/x-patch, Size: 3121 bytes --]

From 93a92360e5ea514236366f978aa5a71e7662ba1a Mon Sep 17 00:00:00 2001
From: Bob Rogers <rogers@rgrjr.com>
Date: Thu, 24 Feb 2022 21:55:30 -0500
Subject: [PATCH] Fix an ietf-drums-parse-date test without TZ

* test/lisp/mail/ietf-drums-date-tests.el:
+ (ietf-drums-date-tests): Bug fix: Input to ietf-drums-parse-date
  must have a timezone, otherwise the output depends on the test
  environment TZ. Also add some tests without TZ, & fix indentation.
--- test/lisp/mail/ietf-drums-date-tests.el | 36 +++++++++++++++++-------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/test/lisp/mail/ietf-drums-date-tests.el b/test/lisp/mail/ietf-drums-date-tests.el index 2d4b39dfae..5b798077ff 100644 --- a/test/lisp/mail/ietf-drums-date-tests.el +++ b/test/lisp/mail/ietf-drums-date-tests.el @@ -72,18 +72,32 @@ ietf-drums-date-tests ("Friday, 21 Sep 2018 13:47:58 PDT" (58 47 13 21 9 2018 5 t -25200) (23461 22782)) - ("Friday, 21 Sep 2018 13:47:58" - (58 47 13 21 9 2018 5 -1 nil) + ("Friday, 21 Sep 2018 13:47:58 EDT" + (58 47 13 21 9 2018 5 t -14400) (23461 11982)))) - (let* ((input (car case)) - (parsed (cadr case)) - (encoded (caddr case))) - ;; The input should parse the same without RFC822. - (should (equal (ietf-drums-parse-date-string input) parsed)) - (should (equal (ietf-drums-parse-date-string input nil t) parsed)) - ;; Check the encoded date (the official output, though - ;; the decoded-time is easier to debug). - (should (equal (ietf-drums-parse-date input) encoded)))) + (let* ((input (car case)) + (parsed (cadr case)) + (encoded (caddr case))) + ;; The input should parse the same without RFC822. + (should (equal (ietf-drums-parse-date-string input) parsed)) + (should (equal (ietf-drums-parse-date-string input nil t) parsed)) + ;; Check the encoded date (the official output, though the + ;; decoded-time is easier to debug). + (should (equal (ietf-drums-parse-date input) encoded)))) + + ;; Test a few without timezones. + (dolist (case '(("Mon, 22 Feb 2016 19:35:42" + (42 35 19 22 2 2016 1 -1 nil)) + ("Friday, 21 Sep 2018 13:47:58" + (58 47 13 21 9 2018 5 -1 nil)))) + (let* ((input (car case)) + (parsed (cadr case))) + ;; The input should parse the same without RFC822. 
+ (should (equal (ietf-drums-parse-date-string input) parsed)) + (should (equal (ietf-drums-parse-date-string input nil t) parsed)) + ;; We can't check the encoded date here because it will differ + ;; depending on the TZ of the test environment. + )) ;; Two-digit years are not allowed by the "modern" format. (should (equal (ietf-drums-parse-date-string "22 Feb 16 19:35:42 +0100") -- 2.34.1
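The timezone diagnosis can be checked with some arithmetic. The expected value (23461 11982) is 23461 * 65536 + 11982 = 1537552078 seconds, i.e. 2018-09-21 17:47:58 UTC. That is 13:47:58 in EDT (UTC-4). The failing result (23460 55918) is exactly 21600 seconds (six hours) earlier. This matches reading the TZ-less input as local time in a UTC+2 zone, such as CEST in September. That zone is inferred from the numbers; the thread does not state it.

The sketch below encodes the same decoded time with two explicit offsets. `my/encode-with-zone` is a hypothetical helper.

```elisp
;;; -*- lexical-binding: t -*-
(defun my/encode-with-zone (decoded zone)
  "Encode DECODED with its zone slot replaced by ZONE, in seconds east of UTC.
Hypothetical helper for illustration."
  (encode-time (append (butlast decoded) (list zone))))

(let* ((no-tz '(58 47 13 21 9 2018 5 -1 nil)) ; "Friday, 21 Sep 2018 13:47:58"
       (edt  (my/encode-with-zone no-tz -14400)) ; UTC-4
       (cest (my/encode-with-zone no-tz 7200)))  ; UTC+2 (assumed)
  (list (time-convert edt 'integer)                        ; => 1537552078
        (time-convert (time-subtract edt cest) 'integer))) ; => 21600
```

Giving the test input an explicit zone, as the patch does with "EDT", removes this dependency on the environment.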
* bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates 2022-02-25 2:58 ` Bob Rogers @ 2022-02-25 12:03 ` Lars Ingebrigtsen 0 siblings, 0 replies; 40+ messages in thread
From: Lars Ingebrigtsen @ 2022-02-25 12:03 UTC
To: Bob Rogers; +Cc: 52209, Andreas Schwab, Paul Eggert

Bob Rogers <rogers-emacs@rgrjr.homedns.org> writes:

> Yep; here's a fix, to be applied to the previous patch.

That fixes the issue here, so I've now pushed the patches to Emacs 29.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
Thread overview: 40+ messages
2021-11-30 20:55 bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates Bob Rogers
2021-12-01 4:17 ` Lars Ingebrigtsen
2021-12-03 5:19 ` Katsumi Yamaoka
2021-12-03 16:29 ` Lars Ingebrigtsen
2021-12-03 18:38 ` Michael Heerdegen
2021-12-04 18:58 ` Paul Eggert
2021-12-19 21:11 ` Bob Rogers
2021-12-20 10:08 ` Lars Ingebrigtsen
2021-12-20 15:57 ` Bob Rogers
2021-12-20 16:34 ` Bob Rogers
2021-12-21 11:01 ` Lars Ingebrigtsen
2021-12-23 19:48 ` Bob Rogers
2021-12-24 9:29 ` Lars Ingebrigtsen
2021-12-24 15:58 ` Bob Rogers
2021-12-25 11:58 ` Lars Ingebrigtsen
2021-12-25 22:50 ` Bob Rogers
2021-12-26 11:31 ` Lars Ingebrigtsen
2021-12-28 15:52 ` Bob Rogers
2021-12-29 15:19 ` Lars Ingebrigtsen
2021-12-29 19:29 ` Paul Eggert
2021-12-29 22:01 ` Bob Rogers
2021-12-30 5:32 ` Bob Rogers
2021-12-30 21:08 ` Bob Rogers
2022-01-01 14:47 ` Lars Ingebrigtsen
2022-01-01 14:56 ` Andreas Schwab
2022-01-02 0:41 ` Bob Rogers
2022-01-03 11:34 ` Lars Ingebrigtsen
2022-01-04 4:45 ` Bob Rogers
2022-01-05 15:46 ` Lars Ingebrigtsen
2022-01-05 22:49 ` Bob Rogers
[not found] ` <25105.33397.961104.269676@orion.rgrjr.com>
2022-02-20 12:25 ` Lars Ingebrigtsen
2022-02-20 13:03 ` Andreas Schwab
[not found] ` <87ilt9vicd.fsf@gnus.org>
2022-02-20 22:14 ` Bob Rogers
2022-02-23 23:15 ` Bob Rogers
2022-02-24 9:19 ` Lars Ingebrigtsen
2022-02-25 0:49 ` Bob Rogers
2022-02-25 2:16 ` Lars Ingebrigtsen
2022-02-25 2:32 ` Bob Rogers
2022-02-25 2:58 ` Bob Rogers
2022-02-25 12:03 ` Lars Ingebrigtsen