From: Bob Rogers
Newsgroups: gmane.emacs.bugs
Subject: bug#52209: 28.0.60; [PATCH] date-to-time fails on pure dates
Date: Wed, 29 Dec 2021 17:01:01 -0500
Message-ID: <25036.55965.149104.938898@orion.rgrjr.com>
To: Paul Eggert
Cc: Lars Ingebrigtsen, 52209@debbugs.gnu.org

   From: Paul Eggert
   Date: Wed, 29 Dec 2021 11:29:44 -0800

   On 12/29/21 07:19, Lars Ingebrigtsen wrote:
   > Bob Rogers writes:
   >
   >> I am currently working on broadening what the parser will accept,
   >> though I think it is close to a usable state.
   >
   > Makes sense to me. Perhaps Paul has some comments; added to the CCs.

   My first comment is "be careful what you're getting into" :-). I'm
   trying to retire from date-parsing as its users are never happy and
   rightly so.

No worries; I have spent more of my career than I like to think about
dealing with date/time issues, so I know what a can of worms I am in the
process of opening. But here goes.

   I took a quick look at and have a few comments.

They are greatly appreciated; thank you.

   * Calling it parse-date is a bit confusing, as it parses both dates
   and times. I suggest calling it parse-timestamp or parse-date-time
   instead. (I know the existing package is called parse-time but we
   can't fix that.)

Lars originally suggested parse-time, but there's already a
parse-time-tests.el, so I switched to parse-date so I could use
parse-date-tests.el to correspond. So the namespace is already crowded.
But I would be OK with either of those alternatives. Since it will
actually give you either date or time, or both, parse-date-time might
make more sense.

   * If the package is called X, the error should be called X-error.
   Currently the package is called parse-date and the error is called
   date-parse-error, which is confusing.

My thought was that for the "parse-date" function, the verb should come
before the noun, and in "date-parse-error", the "date" is an adjective
further modifying "parse error." But I think I'm way fussier about
these things than anybody I know, so your point is well taken.

   * The patch should also modify the comment at the start of
   parse-time.el to indicate parse-date-time as another possibility.

I took that as a late-stage task, something to do alongside updating
Elisp documentation. (Which I haven't even begun to look at.)

   * I suggest preferring the symbol 'rfc-email' for parsing
   email-related dates, for consistency with the --rfc-email option of
   GNU 'date'. This should use the current RFC (5322 now, perhaps
   updated later).

I started with RFC822 and RFC2822 because I had copies of these lying
around; you're right that I should have looked for more recent
standards. And using rfc-email as a synonym for the latest version is a
good idea.

   I suppose you could also advertise 'rfc-822' for strict RFC 822
   conformance, and similarly 'rfc2822' for strict 2822 conformance, but
   I expect these alternatives would be less useful in practice.

Anyone parsing email headers would need their date parser to support
RFC822 in case they encountered very old emails, but (since later
standards are backward-compatible) it's not clear what supporting
intermediate standards would buy.

   > + nil => like us-date with two-digit years disallowed.

   This doesn't sound like a good default. For example, it completely
   mishandles dates in Brazil, which use DD/MM/YYYY format.

I subsequently added a euro-date format for DD/MM (with various lengths
of years).

   > +Anything else is treated as iso-8601 if it looks similar, else
   > +us-date with two-digit years disallowed.

   This might be a better default (for nil), but it should have an
   explicit name other than nil.

Suggestions?

   > + * For all formats except iso-8601, parsing is case-insensitive.

   It's pretty common for ISO 8601 parsers to be case-insensitive. For
   example, Java's OffsetDateTime.parse(CharSequence) allows both lower
   and upper case T and Z. Perhaps some people need strict ISO 8601
   parsers, but I imagine a more-generous parser would be more useful.
   So you could have iso-8601 and iso-8601-strict; or you could have a
   strictness arg; or something like that.

Actually, I am handing those off to the existing iso8601-parse code,
which doesn't like lowercase T (at least).

   > + * Commas and whitespace are ignored.

   This is quite wrong for some formats, if you want to be strict. And
   even if not, commas are part of ISO 8601 format and can't be ignored
   if I understand what you mean by "ignored".

I see I need to clarify the docstring to state that these other bulleted
comments also do not apply to ISO-8601 dates.

   > + * Two digit years, when allowed, are in the 1900's when
   > +between 50 and 99 inclusive and in the 2000's when between 0 and
   > +49 inclusive.

   This disagrees with the POSIX standard for 'date' (supported by GNU
   'date'), which says 69-99 are treated as 1969-1999 and 00-68 are
   treated as 2000-2068. I suggest going with the POSIX heuristic if
   you're going to use a fixed heuristic for dates at all.

I was just following the existing parse-time-string heuristic. So which
do you think should rule: POSIX or parse-time-string compatibility?

   Better might be to have an optional argument of context specifying
   the default time for incomplete timestamps. You can use that context
   to fill in more-significant parts that are missing. E.g., if the
   year is missing, you take it from the context; if the century is
   missing, you take that from the context. The default context would
   be empty, i.e., missing years or centuries would be an error.

Again, I'm just doing what parse-time-string is doing, namely leaving
everything that is not specified nil, and letting the caller decide how
to apply defaults. The only exception is when time is specified without
seconds; in that case, the seconds are set to zero (which is also
compatible with parse-time-string). And even defaulting from context is
not straightforward: If given a date without a year that is not today,
should that be in the future or in the past?
There's a can of worms I don't need to touch. ;-}

   For more formats that need parsing, see:

   https://en.wikipedia.org/wiki/Date_format_by_country
   https://metacpan.org/search?q=datetime%3A%3Aformat

   You don't need to support them all now, but you should take a look at
   what's out there and make sure the API can be extended to handle them.

Excellent; thank you! I have been looking at date parsing module
documentation but so far the ones I've seen have not been very clear
about what they actually accept.

					-- Bob Rogers
					   http://www.rgrjr.com/
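
As an aside, the two fixed two-digit-year heuristics discussed above
differ only in their pivot. A minimal sketch, written in Python rather
than Emacs Lisp purely for illustration, shows where they disagree. The
function and constant names are hypothetical; they are not part of
parse-time-string, the proposed parser, or POSIX 'date'.

```python
def expand_two_digit_year(yy: int, pivot: int) -> int:
    """Map a two-digit year to a four-digit year using a fixed pivot.

    Years at or above the pivot land in the 1900s; years below it
    land in the 2000s.
    """
    if not 0 <= yy <= 99:
        raise ValueError(f"not a two-digit year: {yy}")
    return (1900 if yy >= pivot else 2000) + yy


# Pivots as described in the thread; the constant names are made up here.
PARSE_TIME_STRING_PIVOT = 50  # 50-99 -> 1950-1999, 00-49 -> 2000-2049
POSIX_DATE_PIVOT = 69         # 69-99 -> 1969-1999, 00-68 -> 2000-2068

if __name__ == "__main__":
    # Print both interpretations side by side around the two pivots.
    for yy in (49, 50, 68, 69):
        print(
            yy,
            expand_two_digit_year(yy, PARSE_TIME_STRING_PIVOT),
            expand_two_digit_year(yy, POSIX_DATE_PIVOT),
        )
```

Under these assumptions the two rules agree everywhere except for the
two-digit years 50 through 68, which the 50 pivot places in the 1900s
and the 69 pivot places in the 2000s.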