From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: rx.el sexp regexp syntax
Date: Mon, 04 Jun 2018 09:56:56 -0400
References: <87h8mw3yoc.fsf@gmail.com> <20180525155126.GA4096@ACM>
 <87lgc7hebk.fsf@gmail.com> <87r2lzd375.fsf@ericabrahamsen.net>
 <87lgbx2dc1.fsf@ericabrahamsen.net> <87in6z6gwa.fsf@ericabrahamsen.net>
To: emacs-devel@gnu.org

> Even after removing "extra" backslashes, it's still a bear:
>
> "([0-9][BkKMGTPEZY]?
> (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ T][ 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]?
> ((([A-Za-z']|[^\0-])([A-Za-z']|[^\0-])+\\.? +[ 0-3][0-9]|[ 0-3][0-9]\\.?
> ([A-Za-z']|[^\0-])([A-Za-z']|[^\0-])+\\.?)
> +([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-])([A-Za-z']|[^\0-])+\\.?
> +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-])?
> [ 0-3][0-9]([A-Za-z]|[^\0-])? +|[ 0-3][0-9] [ 0-1]?[0-9]
> +)([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-])?))) +"

For such regexps, the exact syntax in use (PCRE, BRE, ERE, RX, ...)
matters fairly little: if written "raw" as above, it will be
indecipherable in any case.  To make it readable, you need to add
human-level explanations, e.g. by adding comments and naming
sub-elements.

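To illustrate that the naming matters more than the syntax, here is a
minimal sketch of the same approach in rx form.  It is not taken from
files.el: the variable name `my-iso-timestamp-regexp' and the
simplified timestamp format are assumptions made up for illustration.

```elisp
;; Hypothetical sketch, not from files.el: name the sub-elements first,
;; then assemble them, here using rx forms instead of raw strings.
(require 'rx)

(defvar my-iso-timestamp-regexp
  (let* ((yyyy  '(= 4 digit))                      ; four-digit year
         (mm-dd '(seq (any "01") digit "-" (any "0-3") digit))
         (hh-mm '(seq (any " 0-2") digit (any ":.") (any "0-5") digit)))
    ;; The second argument asks rx-to-string not to wrap the result in
    ;; an extra shy group.
    (rx-to-string `(seq ,yyyy "-" ,mm-dd (any " T") ,hh-mm) t))
  "Hypothetical regexp matching timestamps such as \"2005-10-22 21:25\".")

;; Returns 0: the timestamp matches at the start of the string.
(string-match-p my-iso-timestamp-regexp "2005-10-22 21:25")
```

The named pieces carry the explanation whichever notation is used to
spell them.
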
Adding comments and naming sub-elements is indeed what is done in the
source code:

    (defvar directory-listing-before-filename-regexp
      (let* ((l "\\([A-Za-z]\\|[^\0-\177]\\)")
             (l-or-quote "\\([A-Za-z']\\|[^\0-\177]\\)")
             ;; In some locales, month abbreviations are as short as 2 letters,
             ;; and they can be followed by ".".
             ;; In Breton, a month name can include a quote character.
             (month (concat l-or-quote l-or-quote "+\\.?"))
             (s " ")
             (yyyy "[0-9][0-9][0-9][0-9]")
             (dd "[ 0-3][0-9]")
             (HH:MM "[ 0-2][0-9][:.][0-5][0-9]")
             (seconds "[0-6][0-9]\\([.,][0-9]+\\)?")
             (zone "[-+][0-2][0-9][0-5][0-9]")
             (iso-mm-dd "[01][0-9]-[0-3][0-9]")
             (iso-time (concat HH:MM "\\(:" seconds "\\( ?" zone "\\)?\\)?"))
             (iso (concat "\\(\\(" yyyy "-\\)?" iso-mm-dd "[ T]" iso-time
                          "\\|" yyyy "-" iso-mm-dd "\\)"))
             (western (concat "\\(" month s "+" dd "\\|" dd "\\.?" s month "\\)"
                              s "+"
                              "\\(" HH:MM "\\|" yyyy "\\)"))
             (western-comma (concat month s "+" dd "," s "+" yyyy))
             ;; Japanese MS-Windows ls-lisp has one-digit months, and
             ;; omits the Kanji characters after month and day-of-month.
             ;; On Mac OS X 10.3, the date format in East Asian locales is
             ;; day-of-month digits followed by month digits.
             (mm "[ 0-1]?[0-9]")
             (east-asian
              (concat "\\(" mm l "?" s dd l "?" s "+"
                      "\\|" dd s mm s "+" "\\)"
                      "\\(" HH:MM "\\|" yyyy l "?" "\\)")))
        ;; The "[0-9]" below requires the previous column to end in a digit.
        ;; This avoids recognizing `1 may 1997' as a date in the line:
        ;; -r--r--r-- 1 may 1997 1168 Oct 19 16:49 README

        ;; The "[BkKMGTPEZY]?" below supports "ls -alh" output.

        ;; For non-iso date formats, we add the ".*" in order to find
        ;; the last possible match.  This avoids recognizing
        ;; `jservice 10 1024' as a date in the line:
        ;; drwxr-xr-x 3 jservice 10 1024 Jul 2 1997 esg-host

        ;; vc dired listings provide the state or blanks between file
        ;; permissions and date.  The state is always surrounded by
        ;; parentheses:
        ;; -rw-r--r-- (modified) 2005-10-22 21:25 files.el
        ;; This is not supported yet.
        (purecopy (concat "\\([0-9][BkKMGTPEZY]? " iso
                          "\\|.*[0-9][BkKMGTPEZY]? "
                          "\\(" western "\\|" western-comma "\\|" east-asian "\\)"
                          "\\) +")))
      "Regular expression to match up to the file name in a directory listing.
    The default value is designed to recognize dates and times
    regardless of the language.")

-- Stefan