On Tue, Nov 14, 2023 at 12:41 AM Dmitry Gutov <dmitry@gutov.dev> wrote:
On 13/11/2023 04:43, Dmitry Gutov wrote:
>>> The case with longer lists and other data structures should be
>>> possible to improve, though. As long as the type dispatch only happens
>>> once per sequence, and not for each element.
>>
>> Maybe it's possible.  But there are two things here: first, you need
>> non-destructive versions of things in seq.el, because consing is always
>> a killer.  Second, the generic function interface means the typical
>> shortcuts I applied in cl-lib.el are difficult.  Maybe even impossible
>> without breaking the current contract?  I don't know the contract well
>> enough to tell.  At any rate seems like non-trivial work, but I'm happy
>> if someone can give it a shot.
>
> I hope someone will. And agree about destructive versions.

Here's an experimental patch that makes seq-difference about as fast as
your new improved non-destructive cl-set-difference. And some notes.

This is all interesting, until one ponders what happens if an existing
seq.el user somewhere has:

(cl-defmethod seq-contains-p ((seq my-voodoo-seq)
                              (elt (eql :secret-voodoo)) &optional _tesfn)
  (invoke-voodoo-priests seq))

making use of seq.el's support for abstract polymorphic sequences.

With seq.el 2.24 a seq-difference operation would consider this user's
method, with seq.el 2.24.dmitry (i.e. your fast seq-difference-3) it 
simply won't.  This user's code is clearly broken.

But was the user allowed to do that in the first place?  If not,
why is seq-contains-p a public generic function?

Generic function protocols aren't just a bunch of generic functions 
thrown together, they're also precise documentation as to how/when
the generics are called by the framework and what the user may
add to them, desirably making common things easy, difficult things 
possible, and mistakes hard.  Normally the framework that calls 
into generics isn't exposed to the user customizations, i.e. 
it is made of regular defuns.  Of course, you seem to know this 
as xref.el has such a practice.  But in seq.el, almost everything 
is a generic function, even the entry points, somewhat bizarrely.

What I meant before is that these are non-trivial questions that
must be answered when embarking on these optimizations of seq.el.

So when I say it's non-trivial to optimize, it's not because of
figuring out if seq-filter or seq-reduce is better, or which shortcut
is faster, or if dispatch takes time, it's because if you have made 
most everything public and customizable, you have offered an 
extremely rich contract to users, so taking any shortcuts is
much more likely break it. 

cl-lib.el is easy to optimize because the contract it offers
is mostly clear (it's straight out of the CL standard).

The case in the CL world against generic functions' performance 
is often not that the dispatch is slow, but that 
employing them too liberally makes framework optimization 
hard, because you restrict yourself from optimization 
opportunities.

seq.el is 10 years old.  Are people making custom sequences?
To what degree?  In any case, I think a tight contract should 
clearly be written down as soon as possible, preferably with all 
optimization opportunities like the ones you're making 
planned out ahead first.  Maybe demote a bunch of these generic
functions to defuns, and make them internal.

Also as advice future set.el designers, do not just make
everything a generic fucntion, think in terms of protocols.

João