unofficial mirror of emacs-devel@gnu.org 
* Shorter and more flexible implementation for parse-time.el
@ 2021-07-23 17:03 Guu, Jin-Cheng
  2021-07-24 12:12 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 7+ messages in thread
From: Guu, Jin-Cheng @ 2021-07-23 17:03 UTC (permalink / raw)
  To: emacs-devel

Hello,

(This is my first post. Please correct me if I missed anything. =) )

Currently, the built-in time parsing functions only allow us to parse
timestrings with respect to some given formats (ISO-8601, RFC-822) [1]. I
have a parsing function that allows users to parse with customized formats
which are easy to write. The end result is:

``` emacs-lisp
(my/parse-time "20200718-201504"
               '((:year 4) (:month 2) (:day 2) "-"
                 (:hour 2) (:minute 2) (:second 2)))
;; => ((:second . 4) (:minute . 15) (:hour . 20)
;;     (:day . 18) (:month . 7) (:year . 2020))
```

One can even parse org timestamps easily with the format

``` emacs-lisp
'("[" (:year 4) "-" (:month 2) "-" (:day 2) "]")
```

The code is ~25 lines long and easy to extend (see below). I
wonder if there's any interest in adding this to Emacs. I can write tests
and benchmarks to make sure that it doesn't change the user space.

``` emacs-lisp
;;; Actual code

(require 'cl-lib)

(defun my/parse-time (str format)
  "Parse time string STR with customized FORMAT, and return an alist.
FORMAT is a list of directives.  A directive is either a string
or a list (A B), where A is a keyword, and B is an integer."
  (cl-flet ((parse-step
             (directive str)
             (if (atom directive)
                 ;; Literal directive: compare it as plain text, so
                 ;; that strings like "[" are not read as regexps.
                 (if (and (stringp directive)
                          (string-prefix-p directive str))
                     (list (substring str (length directive)))
                   (error "Parsing failure: directive: %s; str: %s"
                          directive str))
               (let* ((key (car directive))
                      (int (car (cdr directive)))
                      (to-parse (substring str 0 int))
                      ;; TODO For natural lang, replace
                      ;; parse-integer by any customized
                      ;; transformers.
                      (value (cl-parse-integer to-parse)))
                 (list (substring str int) key value)))))
    (let (result)
      (while (not (equal str ""))
        (let* ((return (parse-step (pop format) str))
               (new-str (car return))
               (key (car (cdr return)))
               (value (car (cddr return))))
          (setf str new-str)
          (when key (setf (alist-get key result) value))))
      ;; Return the accumulated alist (most recently parsed field first).
      result)))
```
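
As a rough sketch of how the resulting alist might be consumed, the
following turns it into an Emacs time value. The helper name
`my/alist-to-time` is hypothetical and not part of the proposal; it
assumes Emacs 27 or later (list form of `encode-time`) and interprets
the fields as local time.

``` emacs-lisp
(defun my/alist-to-time (alist)
  "Convert ALIST, as returned by a parser like `my/parse-time', to a time value.
Hypothetical helper.  Missing time-of-day fields default to 0, missing
day or month to 1; the fields are interpreted as local time."
  (encode-time
   (list (alist-get :second alist 0)
         (alist-get :minute alist 0)
         (alist-get :hour alist 0)
         (alist-get :day alist 1)
         (alist-get :month alist 1)
         (or (alist-get :year alist)
             (error "Missing :year in %S" alist))
         ;; IGNORED, DST (-1 = let Emacs guess), ZONE (nil = local time).
         nil -1 nil)))

(format-time-string
 "%Y-%m-%d %H:%M:%S"
 (my/alist-to-time '((:second . 4) (:minute . 15) (:hour . 20)
                     (:day . 18) (:month . 7) (:year . 2020))))
;; => "2020-07-18 20:15:04"
```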

[1]
https://github.com/emacs-mirror/emacs/blob/master/lisp/calendar/parse-time.el


* Re: Shorter and more flexible implementation for parse-time.el
  2021-07-23 17:03 Shorter and more flexible implementation for parse-time.el Guu, Jin-Cheng
@ 2021-07-24 12:12 ` Lars Ingebrigtsen
  2021-07-24 15:03   ` Guu, Jin-Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-24 12:12 UTC (permalink / raw)
  To: Guu, Jin-Cheng; +Cc: emacs-devel

"Guu, Jin-Cheng" <jcguu95@gmail.com> writes:

> Currently, the built-in time parsing functions only allow us to parse
> timestrings with respect to some given formats (ISO-8601, RFC-822)
> [1]. I have a parsing function that allows users to parse with
> customized formats which are easy to write. The end result is:
>
> ``` emacs-lisp
> (my/parse-time "20200718-201504"
>                '((:year 4) (:month 2) (:day 2) "-"
>                  (:hour 2) (:minute 2) (:second 2)))
> ;; => ((:second . 4) (:minute . 15) (:hour . 20)
> ;;     (:day . 18) (:month . 7) (:year . 2020))
> ```

It's an interesting approach, but in my experience (having written
parsers for probably more than a hundred different time formats), this
doesn't really get you very far -- the formats often vary in length, use
month names, and so on, so you end up having to write tiny functions for
every format.

So I'm not sure having something like this in Emacs would help that much.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Shorter and more flexible implementation for parse-time.el
  2021-07-24 12:12 ` Lars Ingebrigtsen
@ 2021-07-24 15:03   ` Guu, Jin-Cheng
  2021-07-25  6:34     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 7+ messages in thread
From: Guu, Jin-Cheng @ 2021-07-24 15:03 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Hi Lars,

Thanks for your response! I've done some research, and the formats I have
seen are all rigid (with a fixed length in particular)
[1][2][3]. Would you mind pointing me to some exceptions? I can try to
accommodate them.

That said, I'd argue it is still useful. It is much shorter, is flexible
and customizable, and uses lisp sexprs instead of a DSL. Please feel free
to let me know what it lacks; I will try to work on it and present a
final result. =)

Cheers,
Jin

[1]
https://common-lisp.net/project/local-time/manual.html#Parsing-and-Formatting
[2] https://en.wikipedia.org/wiki/ISO_8601
[3]
https://www.ibm.com/docs/en/spss-statistics/23.0.0?topic=formats-date-time



* Re: Shorter and more flexible implementation for parse-time.el
  2021-07-24 15:03   ` Guu, Jin-Cheng
@ 2021-07-25  6:34     ` Lars Ingebrigtsen
  2021-07-25  9:38       ` Guu, Jin-Cheng
  0 siblings, 1 reply; 7+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-25  6:34 UTC (permalink / raw)
  To: Guu, Jin-Cheng; +Cc: emacs-devel

"Guu, Jin-Cheng" <jcguu95@gmail.com> writes:

> Thanks for your response! I've done some research, and the formats I
> have seen all have their rigid formats (with a fixed length in
> particular) [1][2][3]. Would you mind pointing me to some exceptions?
> I can try to accommodate.

There's a million formats out there, like "3-NOV-94" etc in various
permutations.
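
To illustrate the kind of tiny per-format function this tends to
require, here is a rough sketch for the "3-NOV-94" shape. The function
name `my/parse-d-mon-yy` is hypothetical, the month lookup relies on
the `parse-time-months` alist from parse-time.el, and reading a
two-digit year as 19YY is purely an assumption made for the example.

``` emacs-lisp
(require 'parse-time)

(defun my/parse-d-mon-yy (string)
  "Parse STRING such as \"3-NOV-94\" into a list (DAY MONTH YEAR).
Hypothetical helper; two-digit years are assumed to mean 19YY."
  (unless (string-match
           "\\`\\([0-9]\\{1,2\\}\\)-\\([[:alpha:]]+\\)-\\([0-9]\\{2\\}\\)\\'"
           string)
    (error "Not a D-MON-YY date: %s" string))
  (let* ((day (string-to-number (match-string 1 string)))
         (name (downcase (match-string 2 string)))
         (year (string-to-number (match-string 3 string)))
         ;; `parse-time-months' maps lower-case month names to numbers.
         (month (cdr (assoc name parse-time-months))))
    (unless month
      (error "Unknown month name: %s" name))
    (list day month (+ 1900 year))))

(my/parse-d-mon-yy "3-NOV-94")   ;; => (3 11 1994)
(my/parse-d-mon-yy "03-NOV-94")  ;; => (3 11 1994)
```

The day width is handled by the regexp rather than by a fixed count,
which is the part a fixed-width directive list cannot express.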

> That said, I'd argue it is still useful. It is much shorter, is
> flexible and customizable, and uses lisp sexprs instead of a
> DSL. Please feel free to let me know what it lacks of, I will try to
> work on it and present a final result. =)

My experience is that there is no DSL that can cover date parsing that's
handier than just writing some code, unfortunately.  Your library covers
the 

(encode-time
 (mapcar (lambda (bit)
	   (if bit
	       (string-to-number bit)
	     0))
	 (list nil nil nil (substring string 0 2)
               (substring string 2 4)
               (substring string 4 8))))

etc case, and that's not where the hard thing in date parsing is.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: Shorter and more flexible implementation for parse-time.el
  2021-07-25  6:34     ` Lars Ingebrigtsen
@ 2021-07-25  9:38       ` Guu, Jin-Cheng
  2021-07-25  9:58         ` Yuri Khan
  0 siblings, 1 reply; 7+ messages in thread
From: Guu, Jin-Cheng @ 2021-07-25  9:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

Thanks for sharing your food for thought!

If a project wants to use a fixed format with a fixed length for each
entry, I believe that short function is useful. I do not know if this
is often wanted by others, but I constantly run into this
problem myself. If there's no need, I won't push either. I just want to
offer some help :)

So yeah, my library won't deal with "3-NOV-94", but it can deal with
"03-NOV-94" or even "003-NOV-94" if the user decides to fix the format
to be "dd-mmm-YY" or "0dd-mmm-YY".
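
A minimal sketch of how the fixed-width idea could handle month names:
each directive takes an optional third element, a transformer function
applied to the fixed-width slice. This is a hypothetical variant
written for illustration, not the posted code; the names
`my/parse-fixed` and `my/month-name-to-number` are made up, and the
lookup uses `parse-time-months` from parse-time.el.

``` emacs-lisp
(require 'cl-lib)
(require 'parse-time)

(defun my/month-name-to-number (name)
  "Return the month number for NAME, e.g. \"NOV\" => 11."
  (or (cdr (assoc (downcase name) parse-time-months))
      (error "Unknown month name: %s" name)))

(defun my/parse-fixed (str format)
  "Parse STR according to FORMAT and return an alist.
Each directive in FORMAT is a literal string or a list
\(KEY WIDTH [FN]); FN defaults to `cl-parse-integer'."
  (let (result)
    (dolist (directive format)
      (if (stringp directive)
          ;; Literal text: must match the start of the remaining input.
          (if (string-prefix-p directive str)
              (setq str (substring str (length directive)))
            (error "Expected %S at %S" directive str))
        (cl-destructuring-bind
            (key width &optional (fn #'cl-parse-integer)) directive
          ;; Fixed-width field: slice it off and transform it.
          (push (cons key (funcall fn (substring str 0 width))) result)
          (setq str (substring str width)))))
    (unless (equal str "")
      (error "Trailing input: %S" str))
    result))

(my/parse-fixed "03-NOV-94"
                '((:day 2) "-" (:month 3 my/month-name-to-number)
                  "-" (:year 2)))
;; => ((:year . 94) (:month . 11) (:day . 3))
```

The field widths stay fixed, so "3-NOV-94" is still rejected; only the
interpretation of each slice becomes pluggable.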

And you're totally right, date parsing as a whole won't be easy. In
general we would hope for a parser that is smart enough to deal with
flexible formats. However, some specification needs to be given at
some point - for example "01/01/01". That is why I came up with an
assumption that I think is general enough but also can have a definite
output.

Cheers,
Jin




* Re: Shorter and more flexible implementation for parse-time.el
  2021-07-25  9:38       ` Guu, Jin-Cheng
@ 2021-07-25  9:58         ` Yuri Khan
  2021-07-25 10:14           ` Arthur Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Yuri Khan @ 2021-07-25  9:58 UTC (permalink / raw)
  To: Guu, Jin-Cheng; +Cc: Lars Ingebrigtsen, Emacs developers

On Sun, 2021-07-25T16:38, Guu, Jin-Cheng <jcguu95@gmail.com> wrote:

> And you're totally right, date parsing as a whole won't be easy. In
> general we would hope for a parser that is smart enough to deal with
> flexible formats. However, some specification needs to be given at
> some point - for example "01/01/01". That is why I came up with an
> assumption that I think is general enough but also can have a definite
> output.

The right way to solve the date parsing problem is by getting everyone
on ISO-8601, not by learning to parse every existing format.

https://xkcd.com/1179/
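
For reference, ISO-8601 strings need no custom format description in
Emacs 27 or later, since iso8601.el already parses them. A small
sketch, where the wrapper name `my/iso-date-fields` is hypothetical
and only there to make the result easy to inspect:

``` emacs-lisp
(require 'iso8601)

(defun my/iso-date-fields (string)
  "Return (YEAR MONTH DAY HOUR MINUTE) parsed from ISO-8601 STRING.
Hypothetical wrapper around `iso8601-parse'."
  (let ((decoded (iso8601-parse string)))
    (list (decoded-time-year decoded)
          (decoded-time-month decoded)
          (decoded-time-day decoded)
          (decoded-time-hour decoded)
          (decoded-time-minute decoded))))

(my/iso-date-fields "2021-07-25T16:38:00Z")
;; => (2021 7 25 16 38)
```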




* Re: Shorter and more flexible implementation for parse-time.el
  2021-07-25  9:58         ` Yuri Khan
@ 2021-07-25 10:14           ` Arthur Miller
  0 siblings, 0 replies; 7+ messages in thread
From: Arthur Miller @ 2021-07-25 10:14 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Lars Ingebrigtsen, Guu, Jin-Cheng, Emacs developers

Yuri Khan <yuri.v.khan@gmail.com> writes:

> On Sun, 2021-07-25T16:38, Guu, Jin-Cheng <jcguu95@gmail.com> wrote:
>
>> And you're totally right, date parsing as a whole won't be easy. In
>> general we would hope for a parser that is smart enough to deal with
>> flexible formats. However, some specification needs to be given at
>> some point - for example "01/01/01". That is why I came up with an
>> assumption that I think is general enough but also can have a definite
>> output.
>
> The right way to solve the date parsing problem is by getting everyone
> on ISO-8601, not by learning to parse every existing format.
>
> https://xkcd.com/1179/

Indeed, but as we have seen with other standards, that will probably
not happen in our lifetimes. Americans are still measuring in inches and
Englishmen in yards, despite both having signed the SI convention a long,
long time ago ... :).

How interesting/promising would it be to get an (optional) Elisp binding
for the ICU library?

https://github.com/unicode-org/icu

http://site.icu-project.org/

It offers lots of translations, date formats, and other locale things.

ICU is likely to be installed on most desktops, since lots of other
software uses it, and it supports bindings from C.

Alternatively, one could transform its database into something usable
from Emacs, but that is probably more work and would duplicate a
database which is probably already installed somewhere on the user's
computer anyway.




Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
