Hello,

(This is my first post. Please correct me if I missed anything. =) )

Currently, the built-in time parsing functions only allow us to parse
timestrings with respect to some given formats (ISO-8601, RFC-822)
[1]. I have a parsing function that allows users to parse with
customized formats which are easy to write. The end result is:

``` emacs-lisp
(my/parse-time "20200718-201504"
               '((:year 4) (:month 2) (:day 2) "-"
                 (:hour 2) (:minute 2) (:second 2)))
;; => ((:second . 4) (:minute . 15) (:hour . 20)
;;     (:day . 18) (:month . 7) (:year . 2020))
```

One can even parse org timestamps easily with the format

``` emacs-lisp
'("[" (:year 4) "-" (:month 2) "-" (:day 2) "]")
```

The code is ~25 lines long and easily extendable (see below). I
wonder if there's any interest in adding this to Emacs. I can write
tests and benchmarks to make sure that it doesn't change the user
space.

``` emacs-lisp
;;; Actual code
(require 'cl-lib)

(defun my/parse-time (str format)
  "Parse time string STR with a customized FORMAT, and return an alist.
FORMAT is a list of directives.  A directive is either a string
or a list (A B), where A is a keyword, and B is an integer."
  (cl-flet ((parse-step (directive str)
              (if (atom directive)
                  ;; Literal directive: STR must start with it.
                  ;; `string-prefix-p' compares literally, so a
                  ;; directive such as "[" needs no regexp quoting.
                  (if (string-prefix-p directive str)
                      (list (substring str (length directive)))
                    (error "Parsing failure: directive: %s; str: %s"
                           directive str))
                (let* ((key (car directive))
                       (int (car (cdr directive)))
                       (to-parse (substring str 0 int))
                       ;; TODO For natural lang, replace
                       ;; parse-integer by any customized
                       ;; transformers.
                       (value (cl-parse-integer to-parse)))
                  (list (substring str int) key value)))))
    (let (result)
      ;; Consume one directive per step until STR is exhausted.
      (while (not (equal str ""))
        (let* ((return (parse-step (pop format) str))
               (new-str (car return))
               (key (car (cdr return)))
               (value (car (cddr return))))
          (setf str new-str)
          (when key
            (setf (alist-get key result) value))))
      result)))
```

[1] https://github.com/emacs-mirror/emacs/blob/master/lisp/calendar/parse-time.el
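To illustrate how the returned alist might be consumed, here is a minimal sketch that turns it into an Emacs time value. The helper name `my/alist-to-time` is hypothetical and not part of the posted code. It assumes Emacs 27 or later (for the single-list form of `encode-time`) and reads the fields as UTC purely so that the result is reproducible.

``` emacs-lisp
(defun my/alist-to-time (fields)
  "Convert FIELDS, an alist as returned by `my/parse-time', to a time value.
Missing :second, :minute and :hour default to 0, missing :day and
:month default to 1.  :year is required.  Fields are read as UTC."
  (encode-time
   (list (alist-get :second fields 0)
         (alist-get :minute fields 0)
         (alist-get :hour fields 0)
         (alist-get :day fields 1)
         (alist-get :month fields 1)
         (alist-get :year fields)
         nil    ; ignored slot
         nil    ; no daylight saving time (irrelevant for UTC)
         t)))   ; t means Universal Time

(format-time-string
 "%Y-%m-%d %H:%M:%S"
 (my/alist-to-time '((:second . 4) (:minute . 15) (:hour . 20)
                     (:day . 18) (:month . 7) (:year . 2020)))
 t)
;; => "2020-07-18 20:15:04"
```

Whether a parsed timestamp should be treated as UTC or local time is a separate decision that the format list above does not express.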
"Guu, Jin-Cheng" <jcguu95@gmail.com> writes: > Currently, the built-in time parsing functions only allow us to parse > timestrings with respect to some given formats (ISO-8601, RFC-822) > [1]. I have a parsing function that allows users to parse with > customized formats which are easy to write. The end result is: > > ``` emacs-lisp > (my/parse-time "20200718-201504" > '((:year 4) (:month 2) (:day 2) "-" > (:hour 2) (:minute 2) (:second 2))) > ;; => ((:second . 4) (:minute . 15) (:hour . 20) > ;; (:day . 18) (:month . 7) (:year . 2020)) > ``` It's an interesting approach, but in my experience (having written parsers for probably more than a hundred different time formats), this doesn't really get you very far -- the formats often vary in length, use month names, and so on, so you end up having to write tiny functions for every format. So I'm not sure having something like this in Emacs would help that much. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no
Hi Lars,

Thanks for your response! I've done some research, and the formats I
have seen all have their rigid formats (with a fixed length in
particular) [1][2][3]. Would you mind pointing me to some exceptions?
I can try to accommodate.

That said, I'd argue it is still useful. It is much shorter, is
flexible and customizable, and uses lisp sexprs instead of a
DSL. Please feel free to let me know what it lacks; I will try to
work on it and present a final result. =)

Cheers,
Jin

[1] https://common-lisp.net/project/local-time/manual.html#Parsing-and-Formatting
[2] https://en.wikipedia.org/wiki/ISO_8601
[3] https://www.ibm.com/docs/en/spss-statistics/23.0.0?topic=formats-date-time

On Sat, Jul 24, 2021 at 7:12 AM Lars Ingebrigtsen <larsi@gnus.org> wrote:

> "Guu, Jin-Cheng" <jcguu95@gmail.com> writes:
>
> > Currently, the built-in time parsing functions only allow us to parse
> > timestrings with respect to some given formats (ISO-8601, RFC-822)
> > [1]. I have a parsing function that allows users to parse with
> > customized formats which are easy to write. The end result is:
> >
> > ``` emacs-lisp
> > (my/parse-time "20200718-201504"
> >                '((:year 4) (:month 2) (:day 2) "-"
> >                  (:hour 2) (:minute 2) (:second 2)))
> > ;; => ((:second . 4) (:minute . 15) (:hour . 20)
> > ;;     (:day . 18) (:month . 7) (:year . 2020))
> > ```
>
> It's an interesting approach, but in my experience (having written
> parsers for probably more than a hundred different time formats), this
> doesn't really get you very far -- the formats often vary in length, use
> month names, and so on, so you end up having to write tiny functions for
> every format.
>
> So I'm not sure having something like this in Emacs would help that much.
"Guu, Jin-Cheng" <jcguu95@gmail.com> writes: > Thanks for your response! I've done some research, and the formats I > have seen all have their rigid formats (with a fixed length in > particular) [1][2][3]. Would you mind pointing me to some exceptions? > I can try to accommodate. There's a million formats out there, like "3-NOV-94" etc in various permutations. > That said, I'd argue it is still useful. It is much shorter, is > flexible and customizable, and uses lisp sexprs instead of a > DSL. Please feel free to let me know what it lacks of, I will try to > work on it and present a final result. =) My experience is that there is no DSL that can cover date parsing that's handier than just writing some code, unfortunately. Your library covers the (encode-time (mapcar (lambda (bit) (if bit (string-to-number bit) 0)) (list nil nil nil (substring string 0 2) (substring string 2 4) (substring string 4 8)))) etc case, and that's not where the hard thing in date parsing is. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no
Thanks for sharing your food for thought!

If a project wants to use a fixed format with a fixed length for each
entry, I believe that short function is useful. I do not know whether
others often want this, but I constantly run into this problem
myself. If there's no need, I won't push either. I just want to offer
some help :)

So yeah, my library won't deal with "3-NOV-94", but it can deal with
"03-NOV-94" or even "003-NOV-94" if the user decides to fix the format
to be "dd-mmm-YY" or "0dd-mmm-YY".

And you're totally right, date parsing as a whole won't be easy. In
general we would hope for a parser that is smart enough to deal with
flexible formats. However, some specification needs to be given at
some point - for example "01/01/01". That is why I came up with an
assumption that I think is general enough but also can have a definite
output.

Cheers,
Jin
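A sketch of how a fixed-width format such as "dd-mmm-YY" could be handled if each directive accepted an optional conversion function, along the lines of the TODO in the posted code. This is a hypothetical variant, not the posted function: the names `my/parse-fixed` and `my/month-number`, the three-element directive `(KEY WIDTH FUNCTION)`, and returning the alist in format order are all assumptions of this example.

``` emacs-lisp
(require 'cl-lib)

(defun my/month-number (name)
  "Return the month number (1-12) for the three-letter English NAME."
  (let ((index (cl-position (downcase name)
                            '("jan" "feb" "mar" "apr" "may" "jun"
                              "jul" "aug" "sep" "oct" "nov" "dec")
                            :test #'string=)))
    (if index
        (1+ index)
      (error "Unknown month name: %s" name))))

(defun my/parse-fixed (str directives)
  "Parse STR according to DIRECTIVES and return an alist in format order.
A directive is either a literal string or a list (KEY WIDTH [FUNCTION]).
FUNCTION converts the WIDTH-character field and defaults to
`cl-parse-integer'."
  (let ((pos 0) result)
    (dolist (directive directives)
      (if (stringp directive)
          ;; Literal text must appear verbatim at the current position.
          (if (string-prefix-p directive (substring str pos))
              (setq pos (+ pos (length directive)))
            (error "Expected %S at position %d of %S" directive pos str))
        (let ((key (nth 0 directive))
              (end (+ pos (nth 1 directive)))
              (fn (or (nth 2 directive) #'cl-parse-integer)))
          (when (> end (length str))
            (error "Input %S is too short for %S" str directive))
          (push (cons key (funcall fn (substring str pos end))) result)
          (setq pos end))))
    (unless (= pos (length str))
      (error "Trailing text in %S after position %d" str pos))
    (nreverse result)))

(my/parse-fixed "03-NOV-94"
                '((:day 2) "-" (:month 3 my/month-number) "-" (:year 2)))
;; => ((:day . 3) (:month . 11) (:year . 94))

;; The same digits mean different dates under different formats, so
;; the caller has to state which one is intended.
(my/parse-fixed "01/02/03" '((:month 2) "/" (:day 2) "/" (:year 2)))
;; => ((:month . 1) (:day . 2) (:year . 3))
(my/parse-fixed "01/02/03" '((:day 2) "/" (:month 2) "/" (:year 2)))
;; => ((:day . 1) (:month . 2) (:year . 3))
```

The parser never guesses: an ambiguous string gets a definite reading only because the format list fixes the order and width of every field.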
On Sun, 2021-07-25T16:38, Guu, Jin-Cheng <jcguu95@gmail.com> wrote:

> And you're totally right, date parsing as a whole won't be easy. In
> general we would hope for a parser that is smart enough to deal with
> flexible formats. However, some specification needs to be given at
> some point - for example "01/01/01". That is why I came up with an
> assumption that I think is general enough but also can have a definite
> output.

The right way to solve the date parsing problem is by getting everyone
on ISO-8601, not by learning to parse every existing format.

https://xkcd.com/1179/
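For strings that already follow ISO-8601, no custom format is needed. A small sketch, assuming Emacs 27 or later, where the bundled `iso8601` library provides `iso8601-parse`:

``` emacs-lisp
(require 'iso8601)
(require 'seq)

;; `iso8601-parse' returns a decoded-time list of the form
;; (SECOND MINUTE HOUR DAY MONTH YEAR DOW DST UTC-OFFSET).
;; Only the first six fields are shown here.
(seq-take (iso8601-parse "2020-07-18T20:15:04Z") 6)
;; => (4 15 20 18 7 2020)
```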
Yuri Khan <yuri.v.khan@gmail.com> writes:

> On Sun, 2021-07-25T16:38, Guu, Jin-Cheng <jcguu95@gmail.com> wrote:
>
>> And you're totally right, date parsing as a whole won't be easy. In
>> general we would hope for a parser that is smart enough to deal with
>> flexible formats. However, some specification needs to be given at
>> some point - for example "01/01/01". That is why I came up with an
>> assumption that I think is general enough but also can have a definite
>> output.
>
> The right way to solve the date parsing problem is by getting everyone
> on ISO-8601, not by learning to parse every existing format.
>
> https://xkcd.com/1179/

Indeed, but as we have seen with other standards, that will probably not
happen in our lifetimes. Americans are still measuring in inches and
Englishmen in yards, despite both having signed the SI convention a
long, long time ago ... :).

How interesting or promising would it be to get an (optional) elisp
binding for the ICU library?

https://github.com/unicode-org/icu
http://site.icu-project.org/

It offers lots of translations, date formats, and other locale things.
ICU is likely to be installed on most desktops, since lots of other
software uses it, and it supports bindings from C.

Alternatively, one could transform its database into a form usable from
Emacs, but that is probably more work and would duplicate a database
that is probably installed somewhere on the user's computer anyway.