From: Stefan Monnier <monnier@iro.umontreal.ca>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: "Basil L. Contovounesios" <contovob@tcd.ie>,
	Ag Ibragimov <agzam.ibragimov@gmail.com>,
	emacs-devel@gnu.org
Subject: Re: Pattern matching on match-string groups #elisp #question
Date: Sat, 27 Feb 2021 09:39:27 -0500
Message-ID: <jwvy2f98unk.fsf-monnier+emacs@gnu.org>
In-Reply-To: <288FFC66-E3BE-4E5F-AAD5-309A632F8058@acm.org> ("Mattias Engdegård"'s message of "Sat, 27 Feb 2021 11:17:55 +0100")

>> BTW, I was thinking about making the optimization more conservative, so
>> it only throws away the actual `if` but keeps the computation of the test:
> [...]
>> and it does fix the `pcase-let` problem with your original code.
> Given the trouble I think we can defend not respecting side-effects in
> something as functional as pcase!

Nevertheless, I went ahead with this change (after remembering that
wrapping the code in `ignore` should eliminate the extra warnings).

>> It should macroexpand to something morally equivalent to:
>> 
>>    (cond ((not (stringp STR)) nil)
>>          ((not (string-match "\\(?1:a*\\)" STR)) nil)
>>          ((looking-at "^\"")
>>           (let* ((x1464 (match-string 1 STR)))
>>             (let ((FOO x1464)) FOO))))
>
> Oh dear... perhaps we should just go with the intermediate list (or vector)
> and suffer the small allocation penalty? (At least we should treat the case
> of a single variable specially, since no consing would then be necessary.)

It's clearly The Right Thing™.
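
To make the "intermediate list" idea concrete, here is a minimal sketch
of its shape.  It is not what pcase or rx generates: `demo-match-groups`
is a hypothetical helper invented for this illustration, and the regexp
and group numbers are arbitrary.

```elisp
;; Hypothetical helper, not part of pcase or rx: copy the requested
;; groups into a list right after a successful match, before any other
;; code gets a chance to clobber the match data.
(defun demo-match-groups (regexp string groups)
  "Return the GROUPS of REGEXP matched against STRING as a list, or nil."
  (when (string-match regexp string)
    (mapcar (lambda (n) (match-string n string)) groups)))

(let ((ms (demo-match-groups "\\(?1:a*\\)\\(?2:b*\\)" "aabbb" '(1 2))))
  ;; A later test that runs its own regexp search no longer matters,
  ;; because the groups were already extracted into MS.
  (string-match "x" "xyz")
  (pcase ms
    (`(,as ,bs) (list as bs))))
```

The price is the small allocation mentioned above: one list per
successful match.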

> My guess is that a vector may be faster than a list if there are more than N elements, for some N.

I'll let you benchmark it to determine the N.
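
A starting point for such a benchmark could look like the sketch below.
The function names, the element count (3) and the repetition count are
all arbitrary assumptions, and `benchmark-run` times interpreted code
here, so byte-compiled results may rank differently.

```elisp
(require 'benchmark)

;; Both functions return the elapsed time in seconds, which is the
;; first element of the list returned by `benchmark-run'.
(defun demo-bench-list ()
  "Time building a 3-element list and reading every element back."
  (car (benchmark-run 100000
         (let ((l (list "a" "b" "c")))
           (ignore (car l) (nth 1 l) (nth 2 l))))))

(defun demo-bench-vector ()
  "Time building a 3-element vector and reading every element back."
  (car (benchmark-run 100000
         (let ((v (vector "a" "b" "c")))
           (ignore (aref v 0) (aref v 1) (aref v 2))))))

(message "list: %.4fs  vector: %.4fs"
         (demo-bench-list) (demo-bench-vector))
```

Finding the N would mean repeating this for growing element counts.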

> Should we use string-match-p when there are no variables bound in the rx clause?

We could.  Tho, IIRC currently `string-match-p` is ever so slightly
slower than `string-match` and since we clobber the match data in other
cases, we might as well clobber the match data in this case as well: any
code which presumes the match data isn't affected by some other code
which uses regular expressions is quite confused.
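
For reference, the difference in match-data behavior between the two
functions can be observed with a small sketch like this one
(`demo-preserves-match-data-p` is a hypothetical helper written for the
illustration):

```elisp
(defun demo-preserves-match-data-p (matcher)
  "Return non-nil if calling MATCHER leaves the match data unchanged."
  (string-match "b+" "aabbcc")          ; match data now describes "bb"
  (let ((before (match-data)))
    (funcall matcher "c+" "aabbcc")
    (equal before (match-data))))

(demo-preserves-match-data-p #'string-match-p) ; => t
(demo-preserves-match-data-p #'string-match)   ; => nil
```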

>>> Of course a sufficiently optimising compiler would eliminate the consing!
>> Indeed, and it's not a difficult optimization (at least if you can
>> presume that this data is immutable).
> Right, although we would need some more serious data-flow infrastructure
> first. It would be useful for pattern-matching two or more values at the
> same time.

I don't think it's much more complicated than your current constant
folding: when you see a let-binding of a variable to a *constructor*,
stash that expression in your context as a "partially known constant"
and then do the constant folding when you see a matching *destructor*.
The problems I see are:

- you need to detect side effects between the constructor and the
  destructor.  You could just consider any *read* of a variable holding
  such a partial constant as a (potential) side effect (except for those
  reads recognized as part of destructors, of course).  It should be
  good enough for the case under discussion.

- more importantly, in order not to duplicate code and its side effects
  (and suffer risks linked to moving code into a different scope), you
  need to convert your constructor so all its arguments are trivial and
  "scope safe" (e.g. gensym'd variables, integer constants, symbols,
  ...).
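
As a toy illustration of the idea (not the byte-compiler's optimizer,
and far simpler than what would really be needed), the sketch below
folds one fixed shape: a `let` that binds a variable to a `list`
constructor and whose whole body is a single `nth` destructor on that
variable.  Since nothing sits between constructor and destructor, there
are no intervening side effects to detect.  The names `demo-trivial-p`
and `demo-fold-nth` are hypothetical.

```elisp
(require 'cl-lib)

(defun demo-trivial-p (form)
  "Non-nil if FORM is trivial to evaluate and safe to move.
For this sketch that means a symbol or an integer constant."
  (or (symbolp form) (integerp form)))

(defun demo-fold-nth (form)
  "Fold (let ((VAR (list ARGS...))) (nth N VAR)) into the Nth of ARGS.
Return FORM unchanged when the shape does not match or when some
argument of the constructor is not trivial."
  (pcase form
    ((and `(let ((,var (list . ,args))) (nth ,n ,var2))
          (guard (and (symbolp var) (eq var var2)
                      (proper-list-p args)
                      (natnump n) (< n (length args))
                      (cl-every #'demo-trivial-p args))))
     ;; Every argument is trivial, so dropping the others loses no
     ;; side effects and the chosen one can be evaluated anywhere.
     (nth n args))
    (_ form)))

(demo-fold-nth '(let ((x (list a 2 b))) (nth 1 x)))   ; => 2
(demo-fold-nth '(let ((x (list (f) b))) (nth 0 x)))   ; => form unchanged
```

The second call is left alone because `(f)` may have side effects,
which is exactly why the constructor's arguments would first need to be
converted to trivial ones.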

>>>> It's linked to the special undocumented pcase pattern `pcase--dontcare`
>>>> (whose name is not well chosen, suggestions for better names are
>>>> welcome)
>>> 
>>> pcase--give-up
>> 
>> Hmm... probably not much more explanatory than "dontcare".
>
> Well, 'dontcare' suggests that anything would do and the value not being
> used, like '_', but that's quite misleading.

Right, that's part of the problem with this naming: it doesn't want to
mean that anything matches, like `_`, but that any behavior is
acceptable when pcase has to try and match against this pattern.
The other part is that it's not true: we wouldn't settle for "any"
behavior, it still has to be sane-ish.

>> I was thinking of `pcase--impossible` as well.
> Yes, that looks acceptable. In any case, it isn't really a user-facing
> symbol, is it? Otherwise we'd need crystal-clear semantics (and lose the
> double dashes).

I would settle for something a bit less than "crystal-clear", but yes,
to drop the "--" we'd need clear enough semantics.

Here's another idea for its name: `pcase--go-back` since it makes pcase
go back to the last option it tried and accept it even though it failed
to match.  It still sucks, but maybe it'll give someone else a better idea?


        Stefan



