On Tue, Nov 14, 2023 at 12:41 AM Dmitry Gutov <dmitry@gutov.dev> wrote:

On 13/11/2023 04:43, Dmitry Gutov wrote:
>>> The case with longer lists and other data structures should be
>>> possible to improve, though. As long as the type dispatch only happens
>>> once per sequence, and not for each element.
>>
>> Maybe it's possible. But there are two things here: first, you need
>> non-destructive versions of things in seq.el, because consing is always
>> a killer. Second, the generic function interface means the typical
>> shortcuts I applied in cl-lib.el are difficult. Maybe even impossible
>> without breaking the current contract? I don't know the contract well
>> enough to tell. At any rate seems like non-trivial work, but I'm happy
>> if someone can give it a shot.
>
> I hope someone will. And agree about destructive versions.

Here's an experimental patch that makes seq-difference about as fast as
your new improved non-destructive cl-set-difference. And some notes.

This is all interesting, until one ponders what happens if an existing

seq.el user somewhere has:

(cl-defmethod seq-contains-p ((seq my-voodoo-seq)
(elt (eql :secret-voodoo)) &optional _tesfn)

(invoke-voodoo-priests seq))

making use of seq.el's support for abstract polymorphic sequences.

With seq.el 2.24 a seq-difference operation would consider this user's

method, with seq.el 2.24.dmitry (i.e. your fast seq-difference-3) it

simply won't. This user's code is clearly broken.

But was the user allowed to do that in the first place? If not,

why is seq-contains-p a public generic function?

Generic function protocols aren't just a bunch of generic functions

thrown together, they're also precise documentation as to how/when

the generics are called by the framework and what the user may

add to them, desirably making common things easy, difficult things

possible, and mistakes hard. Normally the framework that calls

into generics isn't exposed to the user customizations, i.e.

it is made of regular defuns. Of course, you seem to know this

as xref.el has such a practice. But in seq.el, almost everything

is a generic function, even the entry points, somewhat bizarrely.

What I meant before is that these are non-trivial questions that

must be answered when embarking on these optimizations of seq.el.

So when I say it's non-trivial to optimize, it's not because of

figuring out if seq-filter or seq-reduce is better, or which shortcut

is faster, or if dispatch takes time, it's because if you have made

most everything public and customizable, you have offered an

extremely rich contract to users, so taking any shortcuts is

much more likely break it.

cl-lib.el is easy to optimize because the contract it offers

is mostly clear (it's straight out of the CL standard).

The case in the CL world against generic functions' performance

is often not that the dispatch is slow, but that

employing them too liberally makes framework optimization

hard, because you restrict yourself from optimization

opportunities.

seq.el is 10 years old. Are people making custom sequences?

To what degree? In any case, I think a tight contract should

clearly be written down as soon as possible, preferably with all

optimization opportunities like the ones you're making

planned out ahead first. Maybe demote a bunch of these generic

functions to defuns, and make them internal.

Also as advice future set.el designers, do not just make

everything a generic fucntion, think in terms of protocols.

João