unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: "Basil L. Contovounesios" <contovob@tcd.ie>,
	Ag Ibragimov <agzam.ibragimov@gmail.com>,
	Emacs developers <emacs-devel@gnu.org>
Subject: Re: Pattern matching on match-string groups #elisp #question
Date: Sat, 27 Feb 2021 19:10:59 +0100	[thread overview]
Message-ID: <F3C55521-14AC-44C3-936F-714841F5735A@acm.org> (raw)
In-Reply-To: <jwvy2f98unk.fsf-monnier+emacs@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

27 feb. 2021 kl. 15.39 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> Nevertheless, I went ahead with this change (after remembering that
> wrapping the code in `ignore` should eliminate the extra warnings).

So where does that leave us with the rx pattern? There's still the interleaved match data problem, which I've tried to address below.

> It's clearly The Right Thing™.

Perhaps it is; a proposed diff is attached below which treats zero and one variable specially and uses a list followed by immediate destructuring for >1 variables. (By the way, using a backquote form to generate backquote forms is annoying.)

My guess is that a vector may be faster than a list if there are more than N elements, for some N.

>> My guess is that a vector may be faster than a list if there are more than N elements, for some N.
> 
> I'll let you benchmark it to determine the N.

I now have, and am sad to say that a list is always faster for any practical number of N (I didn't bother trying more than 30) although the difference narrows as N grows. This is despite the destructuring code becoming considerably bigger for lists (as we get a long chain of tests and branches) than for vectors. It all boils down to vector construction being more expensive than lists.

Maybe we should pack N>1 variables into N-1 cons cells by using the last cdr (improper list), but list* is no primitive so it may be a loss for N>M, for some M>2.

> currently `string-match-p` is ever so slightly
> slower than `string-match` and since we clobber the match data in other
> cases, we might as well clobber the match data in this case as well: any
> code which presumes the match data isn't affected by some other code
> which uses regular expressions is quite confused.

Right; I'm sticking to string-match for the time being.

> I don't think it's much more complicated than your current constant
> folding: when you see a let-binding of a variable to a *constructor*,
> stash that expression in your context as a "partially known constant"
> and then do the constant folding when you see a matching *destructor*.

Doable, but definitely not low-hanging fruit. Since pcase has made a dog's breakfast of the destructuring code it's not straightforward to recognise it as such in the optimiser. Efforts needed elsewhere first!

> go back to the last option it tried and accept it even though it failed
> to match.  It still sucks, but maybe it'll give someone else a better idea?

Sounds like pcase--dead-end would fit then, at least as an internal name.

Or pcase--Sherlock-Holmes.


[-- Attachment #2: rx-pcase-list-destructure.diff --]
[-- Type: application/octet-stream, Size: 2401 bytes --]

diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index ffc21951b6..736758d01f 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -1436,17 +1436,31 @@ rx
                    introduced by a previous (let REF ...)
                    construct."
   (let* ((rx--pcase-vars nil)
-         (regexp (rx--to-expr (rx--pcase-transform (cons 'seq regexps)))))
+         (regexp (rx--to-expr (rx--pcase-transform (cons 'seq regexps))))
+         (nvars (length rx--pcase-vars)))
     `(and (pred stringp)
-          ;; `pcase-let' takes a match for granted and discards all unnecessary
-          ;; conditions, which means that a `pred' clause cannot be used for
-          ;; the match condition.  The following construct seems to survive.
-          (app (lambda (s) (string-match ,regexp s)) (pred identity))
-          ,@(let ((i 0))
-              (mapcar (lambda (name)
-                        (setq i (1+ i))
-                        `(app (match-string ,i) ,name))
-                      (reverse rx--pcase-vars))))))
+          ,(cond ((= nvars 0)
+                  ;; No variables bound: a single predicate suffices.
+                  `(pred (string-match ,regexp)))
+                 ((= nvars 1)
+                  ;; Single variable: bind it to the result of the
+                  ;; lambda function below.
+                  `(app (lambda (s)
+                          (and (string-match ,regexp s)
+                               (match-string 1 s)))
+                        ,(car rx--pcase-vars)))
+                 (t
+                  ;; Multiple variables: pack the submatches into a list
+                  ;; which is then immediately destructured into individual
+                  ;; variables again.  This is of course slightly inefficient.
+                  `(app (lambda (s)
+                          (and (string-match ,regexp s)
+                               (list
+                                ,@(mapcar (lambda (i) `(match-string ,i s))
+                                          (number-sequence 1 nvars)))))
+                        ,(list '\`
+                               (mapcar (lambda (name) (list '\, name))
+                                       (reverse rx--pcase-vars)))))))))
 
 ;; Obsolete internal symbol, used in old versions of the `flycheck' package.
 (define-obsolete-function-alias 'rx-submatch-n 'rx-to-string "27.1")

  reply	other threads:[~2021-02-27 18:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-25  5:11 Pattern matching on match-string groups #elisp #question Ag Ibragimov
2021-02-25 14:55 ` Basil L. Contovounesios
2021-02-25 15:32   ` Stefan Monnier
2021-02-25 18:28     ` Mattias Engdegård
2021-02-26  4:31       ` Stefan Monnier
2021-02-26 10:24         ` Mattias Engdegård
2021-02-26 19:38           ` Stefan Monnier
2021-02-27 10:17             ` Mattias Engdegård
2021-02-27 14:39               ` Stefan Monnier
2021-02-27 18:10                 ` Mattias Engdegård [this message]
2021-02-27 20:32                   ` Stefan Monnier
2021-02-28 13:46                     ` Mattias Engdegård
2021-02-28 15:37                       ` Stefan Monnier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F3C55521-14AC-44C3-936F-714841F5735A@acm.org \
    --to=mattiase@acm.org \
    --cc=agzam.ibragimov@gmail.com \
    --cc=contovob@tcd.ie \
    --cc=emacs-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).