bug#43100: 28.0.50; pcase not binding variables conditionally

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Pip Cet <pipcet@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Philipp Stephani <p.stephani2@gmail.com>, 43100@debbugs.gnu.org
Subject: bug#43100: 28.0.50; pcase not binding variables conditionally
Date: Wed, 2 Sep 2020 08:38:22 +0000	[thread overview]
Message-ID: <CAOqdjBcMHZOnpRscxAmPOn3PaMDOGhzaNykgnrOvBtCeWUQ0Tw@mail.gmail.com> (raw)
In-Reply-To: <jwva6yatfzo.fsf-monnier+emacs@gnu.org>

On Tue, Sep 1, 2020 at 3:12 AM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> > So, as I half-expected, the reaction to "pcase isn't powerful enough"
> > is "let's make it less powerful" :-)
> Your words, not mine.

I'm glad you said that!

> BTW, you can get (more or less) the behavior you want with:
>
>     (pcase V
>       ((or (and (pred symbolp) (let name name)) name)
>        (let ((foo 'bar)) name)))

That's a good alternative.

> [ The "more or less" is because when V is a symbol, the `name` within
>   the body of the branch will not refer to the presumed surrounding
>   binding of `name` but to a new binding also called `name` which
>   temporarily hides the outer binding but is initialized to the same
>   value.  The difference only kicks in if one of those bindings is
>   mutated.  ]
>
> > Seriously, I get the impression you strongly feel pcase shouldn't be
> > (more) powerful, it should instead make non-explicit but fairly strong
> > complexity promises.
>
> I just want the complexity to be predictable without having to guess
> which optimization will be applicable.

So it's better to have a predictably-bad `append' rather than a
sometimes-bad one? But how does that affect ,@? We could make that
predictably bad, too!

> >> More specifically, I'd rather not choose a semantics that imposes
> >> duplicating the branch body, since we have no control over its size and
> >> that can hence lead to potential code size explosion.
> > You're right, and it's a good thing that the duplication of the branch
> > body is a fixable implementation detail rather than something imposed
> > by the semantics.
>
> It's only fixable if you disregard the "more or less" above.
> I find it to be a pretty bad wrinkle in the semantics.

So setq the outer binding if it changed? Note that people also might expect

(pcase l (`(,a . ,b) (setq b a)))

to modify l...

> >> The design of `pcase` assumes you want to optimize away the tests that
> >> are common to the various patterns.  That can't be done with dynamic
> >> patterns.
> > Or it's a bit more difficult, at least...
> I think it's more than "a bit more difficult", because deciding how to
> optimize the tests will almost always take significantly more time than
> just doing the tests.  So in order to do it dynamically and be useful,
> you still need to have 2 stages, where you optimize at one stage and
> then use the result several times later (to recoup the cost of the
> optimization).

Wait, I think there was a misunderstanding here. I don't mean that the
pcase pattern should depend structurally on let-bound variables
appearing in it. That does sound impossible to me (except with dynamic
scoping).

> > The difficult part, in fact, is deciding that we want the arglist to
> > be part of the exposed function API: given an "arglist" function, the
> > rest of the implementation seems unproblematic,
>
> We have `help-function-arglist`, so it's not that hard.
> But it's not guaranteed to work.

Oh, thanks for pointing that out. It's not very good, though, is it?

> > though some workarounds for lexical binding are required (if nothing
> > else, this is an interesting exercise in how painful lexical binding
> > can be to work with).
>
> It's more of a philosophical issue.

Deciding whether to expose arglists for functions is, yes. That's what
I said above.

> Lexical scoping fundamentally means
> that variables don't have names: (lambda (x) x) and (lambda (y) y)
> should be indistinguishable (that's what α-renaming says, anyway).

But they're not. (equal (lambda (x) x) (lambda (y) y)) returns nil. I
don't see why pcase-call should be in the "funcall" category of being
unable to distinguish the two, rather than in the "equal" category of
being able to.

> >> So it's more like my option of returning nil, except it would return
> >> the value of a surrounding `name` variable?  That could be done, but I'm
> >> not convinced it'd be more often useful.
> >
> > I started out with a fairly explicit practical problem: parsing GCC
> > machine descriptions, which are (essentially) sexps but have made the
> > mistake of having "optional" non-final parts, and I think it would be
> > great to express that in a pcase pattern, both for the obvious reasons
> > of legibility and for some non-obvious reasons of my own.
>
> I'm sorry, I don't understand what those sexps (and their «optional"
> non-final parts») have to do with pcase's handling of `or` patterns
> where the different branches don't bind the same set of variables.

Well, maybe I'm just missing an obvious way of doing it.

> >> > disallowing the modification of name in X.
> >> That's rather hard to do (and I don't see what would be the benefit here).
> > I meant adding a cautionary note about it in the documentation, not
> > actively preventing it.
>
> So we're replacing the current "don't do that" with another "don't do
> that", neither of which is detected by the byte-compiler?

Yes. Emacs Lisp isn't free of "don't do that"s, and reducing the
"don't do that" space drastically is better than pointing out the tiny
fraction of it which remains as evidence that we shouldn't do the
reduction in the first place.

> I'd rather fix it "right" with something with a clean and
> simple semantics where the things you shouldn't do get properly flagged
> during compilation.

If you want clean and simple (lexical scoping) semantics and execute
user-provided code, you should probably call a user-provided lambda
rather than evaluating an expression in the first place?

> >> The current implementation amounts to "we should signal an error but we
> >> don't bother doing so and just warn against it in the manual".
> >> Patch welcome ;-)
> > You mean a patch that would make pcase less powerful by making what I
> > want to do impossible rather than merely difficult?
>
> The way I see it, it would make what you want to do possible because
> what you suggest would merely give meaning to programs previously invalid.
> In contrast, with the current behavior your proposal implies breaking
> backward compatibility.

I think what you mean is you'd rather break backward compatibility in
two steps rather than one, by first causing errors, then redefining
the new errors not to happen? Because if so, I totally agree.

> >> >> I don't know of a simple implementation.
> >> > Here's my better-than-nothing attempt.  I don't think that's complex;
> >> > if anything, it's too trivial.
> >> So you give it a search-based semantics.
> > I don't think the semantics are at all unclear,
>
> You misunderstood: I definitely didn't mean "unclear" when wrote "search-based".
> The semantics are definitely clear.

Saying I give it search-based semantics implies there are other
semantics I could have chosen. Semantically, all I've done is decide,
probably incorrectly, that append should be biased towards shy
matching of the fist arguments (and that it only work on proper
lists).

> >> The problem with it for me is that if we turn
> >>
> >>     `(,a ,@b)
> >>
> >> into
> >>
> >>     (append `(,a) b)
> >
> > List-final ,@ is too special, IMHO, to be turned into an (append)
> > pattern at all.
>
> So you want some ,@ to be turned into `append` and some not?

As an implementation detail, yes. Just like ` does, by the way: `(,@l)
doesn't call `append' (it doesn't even check l is a list!), so why
should it behave differently when used as a pcase pattern?

IOW, I'd like ` (as a pcase macro) to behave like ` already does:
constant-time `(,@l), linear-time `(,@l 3).

> That's exactly what I don't want, since it makes the performance
> unpredictable unless you know the details of the optimization.

I'm not opposed to the idea that there should be a form that means
"use pcase to evaluate this, but throw an error if I messed up and
complexity is worse than O(f(n))". I just don't think it should be
(pcase ...), leaving the f to be determined by a human who's allowed
to know the expected complexity of pcase patterns (currently not
mentioned in the documentation), but isn't allowed to "know the
details of the optimization".

And, again, performance of ` is unpredictable unless you know the
details of the optimization. Should we get rid of ` next?

> >> the pcase match will take a lot more time than the equivalent
> >>     `(,a . ,b)
> >> Of course, you can try and handle these "easy" cases more efficiently,
> >> but then your ,@ will sometimes be very cheap and sometimes very
> >> expensive (depending on when an optimization can be applied), which
> >> I think is a misfeature (it's for this same reason that I dislike CL
> >> keyword arguments for functions).
> > I think it's an implementation detail.
>
> I don't want implementation details to choose between O(N) and O(1).
> This is the kind of "detail" that should be chosen by the author.

All powerful matching patterns I'm aware of have horrible worst-time
complexity which is reduced to acceptable complexity for most cases,
in practice. Take `equal', for example.

(let (r s)
  (dotimes (i 100)
    (setq r (cons r r))
    (setq s (cons s s))
    (benchmark 1 `(equal ',r ',s))))

What you're asking sounds to me like the equivalent of "I want a
compression algorithm to be as good as possible, but the size of the
decompressed data has to be predictable, even if I perform recursive
decompression of the output".

> > Some reasoning about the minimum and maximum length of sequences
> > matched by pcase patterns could help ordinary pcases, too, though:
>
> Potentially, of course.  There are many options when it comes to the
> actual code generated by `pcase`.

Precisely, including ones that reduce complexity. Somewhat
unpredictably. Which is why predictable performance is a bad thing to
demand of pcase patterns.

> > (pcase '(a b c d)
> >   (`(,a ,b ,c ,d) (list a b c d)))
> >
> > could call (pcase--check-length EXPVAL 4 4) rather than calling consp
> > four times, potentially descending into expensive predicates that are
> > unnecessary.
>
> But that presumes a C implementation of `pcase--check-length` since
> (< (length X) 4) can be a very bad idea when X is a long list.

Even a Lisp implementation that isn't a wrapper around (length X)
would avoid unnecessary predicates. That is the reason I wrote
pcase--check-length rather than (<= 4 (length EXPVAL) 4)...

> > It's strange to read quotes that yo
> ...?

Well, it's strange to read quotes that you only half-deleted from your
email, Pip! (Sorry).

> > Again, I think that's a fundamental difference between us when it
> > comes to the philosophy behind pcase.  If I understand you correctly,
> > you deliberately want to limit pcase, moving away from the intuitive
> > definition of it that I gave above, because there might be a situation
> > in which people expect better performance than our limited
> > implementation can give them. Is that correct?
>
> [ The intuitive definition you gave doesn't work for most of the core
>   patterns in `pcase` (e.g. `pred`, `app`, `let`, `guard`), so while
>   it's fine as a guiding intuition, it can't be used to really define
>   the intended semantics.  ]

You mean because "pred" isn't defined as a function, and "let" is
defined differently?

Because if I'm allowed to defmacro pred, it works perfectly well:

(defmacro pred (p)
  ``(pred ,p))

(defun equalish (a b)
  (... (pcase a (`(pred ,p) (funcall p b)))

with extra hygiene being required, of course, but there's no problem here.

> No, the problem is not really one of performance, but rather one of
> defining which variables are in scope within a branch such that a given
> branch always has the same set of bindings within its scope, no matter
> how the pattern was matched, which I think gives it a cleaner and
> simpler semantics.

Yes: calling lambdas gives cleaner and simpler semantics than
evaluating forms, which gives terser code and implies precisely the
unclean (but useful) semantics I want. pcase does the latter, implying
semantics which we can implement (more easily and cleanly using
dynamic bindings, for some reason), but choose not to. I'd like to
understand why!

> [ It is also linked to a potential problem of code size explosion,
>   admittedly.  ]

Let's be clear, though, that we're not talking about an explosion in
the number of conses used to express the pcase. It's only when the
code is byte compiled that an explosion potentially happens (and it's
easily fixable with dynamic binding, but not with lexical binding, as
far as I can see).

(I think it's interesting that you can't evaluate "proper" pcase at
run time; at least, I don't see how you'd implement a pcase-dyn
function that evaluates its second argument etc. before interpreting
it. It's easy to do with dynamic binding. I think that's because our
implementation of lexical binding is too limited, but I'm not sure.)

> > I think that's a weak reason for a strong limitation, but of course
> > those are subjective questions.
>
> I don't see what strong limitation you're referring to.  Having `name`
> get the value of a surrounding `name` instead of nil seems like a very
> minor issue and definitely not a "strong limitation".

The strong limitation is "you can only add new pcase patterns if they
have predictable complexity (in code size, run time, or memory usage,
presumably)". Taken at face value, that means no ,@, nothing
equal-based, no efficient joining of two pcase cases into one... and
no conditional binding of variables.

(IMHO, (pcase X (A B) (C D)) should be equivalent to (pcase X ((or
(and (let which-alternative 0) A) (and (let which-alternative 1) C))
(case which-alternative (0 B) (1 D)))), assuming "which-alternative"
is free in A, B, C, D.)

You're right, though, that it'll usually be possible to get the right
behavior using the "bind everything to nil" idea. I hadn't understood
you to be seriously proposing we implement that, but if you do I'll
prepare a patch.

> > For example, I don't expect (pcase 9 ((* x x) x)) to work,
>
> With an appropriate (pcase-defmacro * ...) you could definitely make it
> work.

By solving polynomial roots? I think it's better to use
(pcase-defmacro * ...) for a Kleene star macro...which it turns out
isn't precisely trivial to implement with lexical scoping.

Pip

next prev parent reply	other threads:[~2020-09-02  8:38 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-29  9:41 bug#43100: 28.0.50; pcase not binding variables conditionally Pip Cet
2020-08-29 12:01 ` Philipp Stephani
2020-08-29 14:27   ` Pip Cet
2020-08-29 16:06     ` Stefan Monnier
2020-08-30 16:21       ` Pip Cet
2020-08-30 18:07         ` Stefan Monnier
2020-08-31 19:32           ` Pip Cet
2020-09-01  3:12             ` Stefan Monnier
2020-09-02  8:38               ` Pip Cet [this message]
2020-09-02 14:16                 ` Stefan Monnier
2020-09-05 14:36                   ` Pip Cet
2020-09-05 16:52                     ` Stefan Monnier
2021-03-02 13:38 ` Pip Cet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOqdjBcMHZOnpRscxAmPOn3PaMDOGhzaNykgnrOvBtCeWUQ0Tw@mail.gmail.com \
    --to=pipcet@gmail.com \
    --cc=43100@debbugs.gnu.org \
    --cc=monnier@iro.umontreal.ca \
    --cc=p.stephani2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.