From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: =?UTF-8?B?Sm/Do28gVMOhdm9yYQ==?= Newsgroups: gmane.emacs.devel Subject: Re: What's missing in ELisp that makes people want to use cl-lib? Date: Tue, 14 Nov 2023 10:34:57 +0000 Message-ID: References: <8734xetjkk.fsf@yahoo.com> <87cywhsrcf.fsf@yahoo.com> <87cywgx1z0.fsf@web.de> <83wmuowwp3.fsf@gnu.org> <8334xcwank.fsf@gnu.org> <320999cc-6c83-2315-0044-cc0403400af3@gutov.dev> <9ab5d2bd-a648-cae0-a4a7-ae86be10af0f@gutov.dev> <87r0kuqxbf.fsf@gmail.com> <54e115a2-fc36-3056-a030-0dbf32416ddb@gutov.dev> <43f290b0-4119-597b-c89a-0fb4c7db1665@gutov.dev> <6ec0607f-3047-91d3-6a55-40e06fa002fa@gutov.dev> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="0000000000002fdf1a060a1a4c8e" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="8482"; mail-complaints-to="usenet@ciao.gmane.io" Cc: =?UTF-8?Q?Gerd_M=C3=B6llmann?= , Eli Zaretskii , michael_heerdegen@web.de, emacs-devel@gnu.org To: Dmitry Gutov Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Tue Nov 14 11:33:14 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1r2qjK-000228-FT for ged-emacs-devel@m.gmane-mx.org; Tue, 14 Nov 2023 11:33:14 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1r2qiJ-0000BM-8o; Tue, 14 Nov 2023 05:32:11 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1r2qiH-0000Aa-P7 for emacs-devel@gnu.org; Tue, 14 Nov 2023 05:32:09 -0500 Original-Received: from mail-lf1-x12f.google.com ([2a00:1450:4864:20::12f]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1r2qiE-0000i8-Vu; Tue, 14 Nov 2023 05:32:09 -0500 Original-Received: by mail-lf1-x12f.google.com with SMTP id 2adb3069b0e04-507a0907896so7597300e87.2; Tue, 14 Nov 2023 02:32:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699957924; x=1700562724; darn=gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=rgcFfrRByCgvRPXaT0jfc0Aqnu+tTMuobjV8Fr/OseY=; b=MtFdOM423XClU68mjv8XRT6wC5pGkgC0lEqP9jnkzq5LbTnSQlNgtwk09KKbkSGZI9 iXLVjw1NWVpuEiZfo4E/ofY3DdgBxgph+QHO2QYXKzC6jzM+ryF64d/ITU3g+kg6po+N UMMa3pFMSian23bHPwnn8NoKN4Gm6McvxTYrYaesua1rA7f25itA++otjJcfUgLiLiOp DoIu/SIH6lfHbRHjyVIuQ2AeAS274auevxxx8fBq2gh+U4JHmbk8TTXL/NRXXDaqVvQG ORLSDWWxXgPFWV1mwo5SGVKjzEHnBSOKUGybd2wO8lBCXZEUhF6HU8+6p+QFc0W314JZ cvxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699957924; x=1700562724; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=rgcFfrRByCgvRPXaT0jfc0Aqnu+tTMuobjV8Fr/OseY=; b=uJR219UbXfhorC3Nc/nQEXUf7oZixt0iIwsS5ntPtaki59kPNh2hS7cVS1XFo0hKCQ GbQChIqCTDNHAqPelw8VWCrzbA16R4+l8QnVPsFCt4JAu7unipo18NkMi/qJ9IMlfc/G YhXuXdEZ35SxIOeQj6SugdHaK/m7s/pS8MTOHNU78YtRzkq9viHiFR7swa/w4or15Sez QyQkAo/ENAxS+5DHOH91ss5mWc+VH8nvdUefI+Ozo/kmKklXO1rLT0c3c4yIo8qudn2M z3lV0veSpjvTCEoBdOIZGIhEizQ+euK5zeLS3j+QLWTFP14oZlLsHmxtpIQPexFmqkDb q32Q== X-Gm-Message-State: AOJu0YxJ0ThG7QcPE9cCZnC2ZGeK/6MqbDqIl9QQbLNV3KS6ebHhgqEQ grOccPYngNfLgj7X/Qf6Mc90SKhVduZxnOIk0lcb2Yr+32QhhQ== X-Google-Smtp-Source: AGHT+IG9fSQh3+7yV0w/reDAzQm1pvteVHayGET9X0yE1U2UTBuVNFFY64jyMckaLiBpdun2d7374m5V2gc+PAM1Hww= X-Received: by 2002:a19:520a:0:b0:509:44d5:18e5 with SMTP id m10-20020a19520a000000b0050944d518e5mr5869151lfb.63.1699957923895; Tue, 14 Nov 2023 02:32:03 -0800 (PST) In-Reply-To: <6ec0607f-3047-91d3-6a55-40e06fa002fa@gutov.dev> Received-SPF: pass client-ip=2a00:1450:4864:20::12f; envelope-from=joaotavora@gmail.com; helo=mail-lf1-x12f.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:312719 Archived-At: --0000000000002fdf1a060a1a4c8e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable [Replying to both of your mails here] On Tue, Nov 14, 2023 Dmitry Gutov wrote: > And if the list has just 1 element (or zero?), the gap would be even > larger. > > Only seeing a 1.6x difference here is a nice surprise, actually. These two statements together seem contradictory. Why would you be happy to see a particular factor for a given list length if you know the factor is unbounded in general for small lists? And it's only in the small lists case. If I pass my own predicate to both cl-set-difference and your best seq-difference-3 (defun myequal (a b) (equal a b)) (setq cc (make-list 12 "blabla")) (setq list2 (make-list 12 "shooveedoowaa")) (when nil ;; (0.106080265 4 0.03849865399999963) (benchmark-run 10000 (cl-set-difference cc list2 :test #'myequal)) ;; (0.5504704210000001 39 0.394207416) (benchmark-run 10000 (seq-difference-3 cc list2 #'myequal)) ) I get the 5.5x slowdown again (in one experiment, not shown, I got a whopping 200x slowdown, but I can't reproduce it right now from Emacs -Q, maybe I had some method on some crucial seq.el generic) > > This is all interesting, until one ponders what happens if an existing > > seq.el user somewhere has: > > > > (cl-defmethod seq-contains-p ((seq my-voodoo-seq) > > (elt (eql :secret-voodoo)) &optional _tesfn) > > (invoke-voodoo-priests seq)) > > > > making use of seq.el's support for abstract polymorphic sequences. > > > > With seq.el 2.24 a seq-difference operation would consider this user's > > method, with seq.el 2.24.dmitry (i.e. your fast seq-difference-3) it > > simply won't. This user's code is clearly broken. > > > > But was the user allowed to do that in the first place? If not, > > why is seq-contains-p a public generic function? > > It was allowed, and yes, indeed, it's a breaking change. So we should at > least examine the existing public code out there and see whether such > overrides exist. Not so easy, I'm afraid. seq-contains-p is not the only such generic, it's just one I happened to spot first. seq-do is a generic and in that group mentioned in the sparse seq.el documentation about mandatory customization, meaning presumably it is essential for custom sequences. See https://github.com/search?q=3D%22%28cl-defmethod+seq-do%22&type=3Dcode = for a list of custom sequences that use it. Your seq-difference-3 doesn't call seq-do like the original seq-difference does, so the set difference operations for those custom sequences might well be broken. What's even more problematic is that it skips seq-do entirely for certain predicates and not entirely for certain other predicates. And this is the type of headache that make-everything-a-gf designs like seq.el's brings to developers trying to optimize it. > > Generic function protocols aren't just a bunch of generic functions > > thrown together, they're also precise documentation as to how/when > > the generics are called by the framework and what the user may > > add to them, desirably making common things easy, difficult things > > possible, and mistakes hard. Normally the framework that calls > > into generics isn't exposed to the user customizations, i.e. > > it is made of regular defuns. Of course, you seem to know this > > as xref.el has such a practice. But in seq.el, almost everything > > is a generic function, even the entry points, somewhat bizarrely. > > I'm not sure that is a problem in itself, except in the cases like I've > described, where the type dispatch ends up happening N times instead of > just once. It's still bizarre. Here's how I think: if an application is customizing the entry point directly, why even call the entry point? OK, so what if a library is customizing the entry point? It's presumably because that library provides a data type for applications to instantiate and use. But if the library does that, all calling guarantees of other generic functions are lost for, say, user method for subtypes of that data type (or :around, or :after, etc). So it's also nonsensical. The only think where it _could_ make a shred of sense is in a kind of "private library" where the application controls both the data type and knows exactly the shortcuts taken. But then "private library" is an oxymoron. > > What I meant before is that these are non-trivial questions that > > must be answered when embarking on these optimizations of seq.el. > > > > So when I say it's non-trivial to optimize, it's not because of > > figuring out if seq-filter or seq-reduce is better, or which shortcut > > is faster, or if dispatch takes time, it's because if you have made > > most everything public and customizable, you have offered an > > extremely rich contract to users, so taking any shortcuts is > > much more likely break it. > > > > cl-lib.el is easy to optimize because the contract it offers > > is mostly clear (it's straight out of the CL standard). > > cl-lib is pretty complex and alien from somebody with zero experience in > it (and CL), and it's a lot more code, with different macros. So I'm not > sure I'd call cl-lib easier overall. Oh, don't get me wrong: cl-lib.el's implementation details are not pretty and not easy, likely typical library code (though i've seen much much worse). What's fundamentally easier about optimizing cl-lib is that the contract it offers is much, much more well specified. Possibly because it comes as a result of careful design in committees of very capable programmers in the 90's, before the CL winter, that were already on to this class of difficulties and wanted to make a common interface. The interface they made for sequences is not fully generic, but there _are_ a lot of different implementations for that interface, each CL compiler has one. Most of the tricks, like what I did to optimize cl-{n}set-difference are standard stuff in the CL world. There are even reference LOOP implementations (likely much more performant than ours, though possibly not compatible with some non-standard edge cases our cl-loop has of which I know a couple). > > The case in the CL world against generic functions' performance > > is often not that the dispatch is slow, but that > > employing them too liberally makes framework optimization > > hard, because you restrict yourself from optimization > > opportunities. > > I don't know if that's a major issue: just like cl-some has different > branches, seq-some could have three different definitions. The > approaches to optimization seem transferable, more or less. Seq-some also calls into a lot of user customizable code, so it'll suffer from the same class of problems. Say, are you going to skip the seq-do generic sometimes there as well? Maybe. Depends on what contract we want to uphold, and we have no idea. At least I don't. > I believe the actual value it provides is a list of implementations that > are quite compact and as such easy to fix/extend/modify. And if cl-lib > is handy for someone with CL experience, I'll guess that seq.el rings > more familiar to the Clojure users. Maybe in the superficial user interface, because of names etc. Or are you saying seq.el interface is extracted from Clojure's standard much like cl-lib is from CL standard? (Do Clojure generic functions work like ours?) Then by all means we should rush to study that contract closely, to know what we can and can't do as optimizers. Jo=C3=A3o --0000000000002fdf1a060a1a4c8e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
[Replying to both of your mails here]
<= br>
On Tue, Nov 14, 2023 Dmitry Gutov <dmitry@gutov.dev> wrote:

> And if the list ha= s just 1 element (or zero?), the gap would be even
> larger.
><= br>> Only seeing a 1.6x difference here is a nice surprise, actually.
These two statements together seem contradictory.=C2=A0 Why would
y= ou be happy to see a particular factor for a given
list length if you kn= ow the factor is unbounded in general
for small lists?

And it'= ;s only in the small lists case.=C2=A0 If I pass my own predicate to=C2=A0<= /div>
both cl-set-difference and your best seq-difference-3

(defun myequal (a b)
=C2=A0 (equal a b))

(setq cc (make-list 1= 2 "blabla"))
(setq list2 (make-list 12 "shooveedoowaa&quo= t;))

(when nil
=C2=A0 ;; (0.106080265 4 0.03849865399999963)
= =C2=A0 (benchmark-run 10000 (cl-set-difference cc list2 :test #'myequal= ))
=C2=A0 ;; (0.5504704210000001 39 0.394207416)
=C2=A0 (benchmark-ru= n 10000 (seq-difference-3 cc list2 #'myequal))

=C2=A0 =C2=A0)

I get the 5.5x slowdown again (in one experiment, no= t shown,
I got a whopping 200x slowdown, but I can't reproduc= e it right now
from Emacs -Q, maybe I had some method on some cru= cial seq.el generic)

> > This is all int= eresting, until one ponders what happens if an existing
> >= seq.el user somewhere has:
> >
> > (cl-defmethod seq-con= tains-p ((seq my-voodoo-seq)
> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0(elt (eql :secret-voodoo)) &optional _tesfn)
> > =C2=A0 = =C2=A0(invoke-voodoo-priests seq))
> >
> > making use of = seq.el's support for abstract polymorphic sequences.
> >
&g= t; > With seq.el 2.24 a seq-difference operation would consider this use= r's
> > method, with seq.el 2.24.dmitry (i.e. your fast seq-di= fference-3) it
> > simply won't.=C2=A0 This user's code is= clearly broken.
> >
> > But was the user allowed to do t= hat in the first place?=C2=A0 If not,
> > why is seq-contains-p a = public generic function?
>
> It was allowed, and yes, indeed, i= t's a breaking change. So we should at
> least examine the existi= ng public code out there and see whether such
> overrides exist.

Not so easy, I'm afraid.=C2=A0 seq-contains-p is n= ot the only such generic,
it's just one I happened to spot fi= rst.=C2=A0 seq-do is a generic and in that
group mentioned in the= sparse seq.el documentation about mandatory
customization, meani= ng presumably it is essential for custom sequences.

See https://github.com/search?q=3D%22%28cl-defmethod+seq-do%2= 2&type=3Dcode for
a list of custom sequences that use it.=

Your seq-difference-3 doesn't call seq-do= like the original seq-difference
does, so the set difference ope= rations for those custom sequences might=C2=A0
well be broken.=C2= =A0 What's even more problematic is that it skips seq-do
enti= rely for certain predicates and not entirely for certain other=C2=A0
<= div>predicates.

And this is the type of headache t= hat make-everything-a-gf designs=C2=A0
like seq.el's brings t= o developers trying to optimize it.

> > Generic fu= nction protocols aren't just a bunch of generic functions
> > = thrown together, they're also precise documentation as to how/when
&= gt; > the generics are called by the framework and what the user may
= > > add to them, desirably making common things easy, difficult thing= s
> > possible, and mistakes hard.=C2=A0 Normally the framework th= at calls
> > into generics isn't exposed to the user customiza= tions, i.e.
> > it is made of regular defuns.=C2=A0 Of course, you= seem to know this
> > as xref.el has such a practice.=C2=A0 But i= n seq.el, almost everything
> > is a generic function, even the en= try points, somewhat bizarrely.
>
> I'm not sure that is a = problem in itself, except in the cases like I've
> described, whe= re the type dispatch ends up happening N times instead of
> just once= .

It's still bizarre.=C2=A0 Here's how I t= hink: if an application is customizing
the entry point directly, = why even call the entry point?=C2=A0 OK, so what if=C2=A0
a libra= ry is customizing the entry point? =C2=A0 It's presumably because that<= /div>
library provides a data type for applications to instantiate and = use.=C2=A0 But
if the library does that, all calling guarantees o= f other generic=C2=A0
functions are lost for, say, user method fo= r subtypes of that data type=C2=A0
(or :around, or :after, etc).= =C2=A0 So it's also nonsensical.=C2=A0 The only think=C2=A0
w= here it _could_ make a shred of sense is in a kind of "private library= " where
the application controls both the data type and know= s exactly the shortcuts
taken.=C2=A0 But then "private libra= ry" is an oxymoron. =C2=A0

> > Wha= t I meant before is that these are non-trivial questions that
>= ; > must be answered when embarking on these optimizations of seq.el.> >
> > So when I say it's non-trivial to optimize, it&= #39;s not because of
> > figuring out if seq-filter or seq-reduce = is better, or which shortcut
> > is faster, or if dispatch takes t= ime, it's because if you have made
> > most everything public = and customizable, you have offered an
> > extremely rich contract = to users, so taking any shortcuts is
> > much more likely break it= .
> >
> > cl-lib.el is easy to optimize because the contr= act it offers
> > is mostly clear (it's straight out of the CL= standard).
>
> cl-lib is pretty complex and alien from somebod= y with zero experience in
> it (and CL), and it's a lot more code= , with different macros. So I'm not
> sure I'd call cl-lib ea= sier overall.

Oh, don't get me wrong: cl-lib.e= l's implementation details are not pretty=C2=A0
and not easy,= likely typical library code (though i've seen much much=C2=A0
worse).=C2=A0 What's fundamentally easier about optimizing cl-lib is = that the
contract it offers is much, much more well specified.=C2= =A0

Possibly because it comes as a resul= t of careful design in committees of=C2=A0
very capable programme= rs in the 90's, before the CL winter, that were=C2=A0
already= on to this class of difficulties and wanted to make a common=C2=A0
interface.=C2=A0

The interface they = made for sequences is not fully generic, but there=C2=A0
_are_ a = lot of different implementations for that interface, each CL=C2=A0
compiler has=C2=A0 one.=C2=A0 Most of the tricks, like what I did to opti= mize=C2=A0
cl-{n}set-difference=C2=A0 are standard stuff in the C= L world.=C2=A0=C2=A0 There are even=C2=A0
reference LOOP implemen= tations (likely much more performant than ours,=C2=A0
though poss= ibly not compatible with some non-standard edge cases our=C2=A0
c= l-loop has of which I know a couple).

> >= ; The case in the CL world against generic functions' performance
&g= t; > is often not that the dispatch is slow, but that
> > emplo= ying them too liberally makes framework optimization
> > hard, bec= ause you restrict yourself from optimization
> > opportunities.>
> I don't know if that's a major issue: just like cl-so= me has different
> branches, seq-some could have three different defi= nitions. The
> approaches to optimization seem transferable, more or = less.

Seq-some also calls into a lot of user customizable cod= e, so it'll=C2=A0
suffer from the same class of problems.=C2= =A0 Say, are you going to skip
the seq-do generic sometimes = there as well? Maybe.=C2=A0 Depends on what=C2=A0
contract we wan= t to uphold, and we have no idea.=C2=A0 At least I don't.

> I believe the actual value it provides is a list of im= plementations that
> are quite compact and as such easy to fix/extend= /modify. And if cl-lib
> is handy for someone with CL experience, I&#= 39;ll guess that seq.el rings
> more familiar to the Clojure users.

Maybe in the superficial user interface, because of= names etc.=C2=A0 Or=C2=A0
are you saying seq.el interface is ext= racted from Clojure's standard much
like cl-lib is from CL st= andard?=C2=A0 (Do Clojure generic functions work like=C2=A0
ours?= )=C2=A0 Then by all means we should rush to study that contract closely,=C2= =A0
to know what we can and can't do as optimizers.
=

Jo=C3=A3o

--0000000000002fdf1a060a1a4c8e--