From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Dmitry Gutov <dmitry@gutov.dev>
Newsgroups: gmane.emacs.devel
Subject: Re: What's missing in ELisp that makes people want to use cl-lib?
Date: Tue, 14 Nov 2023 13:47:32 +0200
Message-ID: <2580712a-8411-18e7-4cd7-d3a17ed4c50e@gutov.dev>
References: <E1qyNS3-0007fw-AS@fencepost.gnu.org> <8734xetjkk.fsf@yahoo.com>
 <CALDnm52-xR9W0oWEr2aC7gGeGL970ZkOiozjkqf2AWF0nvxtgA@mail.gmail.com>
 <87cywhsrcf.fsf@yahoo.com> <87cywgx1z0.fsf@web.de>
 <83wmuowwp3.fsf@gnu.org>
 <CALDnm539MrZfnhpaKPUOyyvyowS4dO_vhm+LKobOYpYDEBDcoQ@mail.gmail.com>
 <m28r74tk7o.fsf@Pro.fritz.box> <8334xcwank.fsf@gnu.org>
 <m24jhstb7k.fsf@Pro.fritz.box>
 <f8d2bfc5-1043-57c4-6c5a-5520c765797f@gutov.dev>
 <CALDnm50UNjUHspASLt2fcN0gzyEKaJ2bCej0ogdHqhpjiZ4NxQ@mail.gmail.com>
 <320999cc-6c83-2315-0044-cc0403400af3@gutov.dev>
 <CALDnm53bJZKNmCufNTSnLXpm7VcM36GBdwZgPGZN6xqxKui++Q@mail.gmail.com>
 <9ab5d2bd-a648-cae0-a4a7-ae86be10af0f@gutov.dev>
 <87r0kuqxbf.fsf@gmail.com>
 <54e115a2-fc36-3056-a030-0dbf32416ddb@gutov.dev>
 <43f290b0-4119-597b-c89a-0fb4c7db1665@gutov.dev>
 <CALDnm51mHC3g6S2MOp+t4hTw1VcN-_0CJ1LVj8AYvvwsPCUthA@mail.gmail.com>
 <6ec0607f-3047-91d3-6a55-40e06fa002fa@gutov.dev>
 <CALDnm51v_-uQTY-nV71Qc7Mk-Up60xJ+3vmp9h9bWbCO8zn0iA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="39346"; mail-complaints-to="usenet@ciao.gmane.io"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.0
Cc: =?UTF-8?Q?Gerd_M=c3=b6llmann?= <gerd.moellmann@gmail.com>,
 Eli Zaretskii <eliz@gnu.org>, michael_heerdegen@web.de, emacs-devel@gnu.org
To: =?UTF-8?B?Sm/Do28gVMOhdm9yYQ==?= <joaotavora@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Tue Nov 14 12:48:54 2023
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1r2ruX-0009tw-Gd
	for ged-emacs-devel@m.gmane-mx.org; Tue, 14 Nov 2023 12:48:54 +0100
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces@gnu.org>)
	id 1r2rte-00078l-7u; Tue, 14 Nov 2023 06:47:58 -0500
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dmitry@gutov.dev>) id 1r2rtP-00078U-41
 for emacs-devel@gnu.org; Tue, 14 Nov 2023 06:47:44 -0500
Original-Received: from wout4-smtp.messagingengine.com ([64.147.123.20])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <dmitry@gutov.dev>)
 id 1r2rtM-0006oE-AV; Tue, 14 Nov 2023 06:47:42 -0500
Original-Received: from compute5.internal (compute5.nyi.internal [10.202.2.45])
 by mailout.west.internal (Postfix) with ESMTP id 22AC73202029;
 Tue, 14 Nov 2023 06:47:37 -0500 (EST)
Original-Received: from mailfrontend2 ([10.202.2.163])
 by compute5.internal (MEProxy); Tue, 14 Nov 2023 06:47:37 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc
 :cc:content-transfer-encoding:content-type:content-type:date
 :date:from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:sender:subject:subject:to:to; s=fm2; t=
 1699962456; x=1700048856; bh=/iNOXc7nxdb8bJmFHiaF82Lwm7IHLhkDHPn
 HEpU02kw=; b=pYLY9sNcQ1cMxM7FVKMSbcDvHQUssVhFCImx81MFzSTJ0pQ/s+d
 InoByCAyp4bM4DetifbW3f3IKTSzW7Dt36SRKCYJIOquYiS1Jy41D2L+yvV1fq29
 njlTfKsjf39sx7u8LEs8i/hD6XwguZ1Utr5PNzuytx2+L8nm/Ep6oBv0j8VimkCW
 B/ThdNuVA2DhCxvCiJFABv+yGUo/5hJCXiafreRyLfhFMloaBKjnzt9OOLlvgidb
 /ASHChp1f1GLFxcYw1e7Ah/+kzeK6w4HWVxC0IrGG4vVBwwikNB6QNCKtyggYO1l
 xmapM82/sgv+Pocj95YO9H1vedaI5wI15bA==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:cc:content-transfer-encoding
 :content-type:content-type:date:date:feedback-id:feedback-id
 :from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:sender:subject:subject:to:to:x-me-proxy
 :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=
 1699962456; x=1700048856; bh=/iNOXc7nxdb8bJmFHiaF82Lwm7IHLhkDHPn
 HEpU02kw=; b=3N+tAH4f/TPjm5Ri1aiToFoXuoQ5cD1LIQ55TczEOdpxQNCCCJY
 8qsw3WpPeJ/aPYcu5kQaaM/YWXUMmhLPabDyK/rH/FCBH9zefI6eVp5U/pFBJOyh
 0e85QQbFtW3nvlUpj0JSyYk+CAYNoHJHuil3zNwV9vZkfCbiJ3f5957I6o5liNzN
 CNmxo3JqFR5lV0WJgkVpMoEStR3vB8xCFGNVxDngULlnO8EnRb9kV3RlWIt8gq1y
 MWU8Hvm09m078qW2DMwZr/scDYCqxr15OkCU3nkA4GWh4R7wjO87Zo3cKT92dd2B
 NsgAwp4yhrcCx4SLvw83Xp0KpuBp9yAC4Qg==
X-ME-Sender: <xms:WF5TZVRmOOjGAA7X_dAP_yHZTH9GTeY_BUb4tmcNQ-lgw_2d3110Cg>
 <xme:WF5TZewlWQFtS-hEdbTZ9hPYJcFtVieQN9w3Wbxzz_7PTaRqJkB1m1ptQEpDfHeYq
 RD14GL_HicmsZPJJnI>
X-ME-Received: <xmr:WF5TZa2F9z8kLY26LZa5BnAs73GRHmGJotEIkYJhzvPDcEPzYEds3E4nxkDHlHQ>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrudefvddgfeduucetufdoteggodetrfdotf
 fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen
 uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne
 cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtkeertddtfeejnecuhfhrohhmpeffmhhi
 thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth
 htvghrnhepueffveeiffeugffgveejvdegteeuhfdugfehleelfeejtdelteethfdtieeg
 vddunecuffhomhgrihhnpehgihhthhhusgdrtghomhenucevlhhushhtvghrufhiiigvpe
 dtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegumhhithhrhiesghhuthhovhdruggvvh
X-ME-Proxy: <xmx:WF5TZdDE3CQz_eIJ0pYEIXrOLP-fE18ua7yfvRf8pHx6TV-sycwtfA>
 <xmx:WF5TZehw1haP3LLD-NJPTXxIw4X1g-SskF8Ig1lBeE-XZKvM6m2fhw>
 <xmx:WF5TZRofhSkqyhbVHBd5PHWDre2qiSRKWqJlQg3u41S-hTQW-q76oQ>
 <xmx:WF5TZfuN5JJ7QBQhZrD-pen_epErESkvVpOP8wDbDvCWEDxuz9jADw>
Feedback-ID: i0e71465a:Fastmail
Original-Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue,
 14 Nov 2023 06:47:34 -0500 (EST)
Content-Language: en-US
In-Reply-To: <CALDnm51v_-uQTY-nV71Qc7Mk-Up60xJ+3vmp9h9bWbCO8zn0iA@mail.gmail.com>
Received-SPF: pass client-ip=64.147.123.20; envelope-from=dmitry@gutov.dev;
 helo=wout4-smtp.messagingengine.com
X-Spam_score_int: -67
X-Spam_score: -6.8
X-Spam_bar: ------
X-Spam_report: (-6.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, NICE_REPLY_A=-3.971,
 RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.devel:312724
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/312724>

On 14/11/2023 12:34, João Távora wrote:
> [Replying to both of your mails here]
> 
> On Tue, Nov 14, 2023 Dmitry Gutov <dmitry@gutov.dev 
> <mailto:dmitry@gutov.dev>> wrote:
> 
>  > And if the list has just 1 element (or zero?), the gap would be even
>  > larger.
>  >
>  > Only seeing a 1.6x difference here is a nice surprise, actually.
> 
> These two statements together seem contradictory.  Why would
> you be happy to see a particular factor for a given
> list length if you know the factor is unbounded in general
> for small lists?

I don't think it's unbounded, just high. And 12-element list is a nice 
size for this particular comparison. Programs using tiny list would in 
their majority be tiny as well, so it wouldn't matter.

Though of course there are exceptions: compilers, for example.

> And it's only in the small lists case.  If I pass my own predicate to
> both cl-set-difference and your best seq-difference-3
> 
> (defun myequal (a b)
>    (equal a b))
> 
> (setq cc (make-list 12 "blabla"))
> (setq list2 (make-list 12 "shooveedoowaa"))
> 
> (when nil
>    ;; (0.106080265 4 0.03849865399999963)
>    (benchmark-run 10000 (cl-set-difference cc list2 :test #'myequal))
>    ;; (0.5504704210000001 39 0.394207416)
>    (benchmark-run 10000 (seq-difference-3 cc list2 #'myequal))
> 
>     )
> 
> I get the 5.5x slowdown again (in one experiment, not shown,

Right. This is still for the small lists case. Here we suffer the 
dynamic dispatch at least twice, along with funcall indirection. Again, 
would be great to get these ratios down, but the main scenarios where I 
*did* worry about the performance of a sequence primitive, have always 
been large lists.

> I got a whopping 200x slowdown, but I can't reproduce it right now
> from Emacs -Q, maybe I had some method on some crucial seq.el generic)

That would be helpful to know as well.

>  > > This is all interesting, until one ponders what happens if an existing
>  > > seq.el user somewhere has:
>  > >
>  > > (cl-defmethod seq-contains-p ((seq my-voodoo-seq)
>  > >                                (elt (eql :secret-voodoo)) &optional 
> _tesfn)
>  > >    (invoke-voodoo-priests seq))
>  > >
>  > > making use of seq.el's support for abstract polymorphic sequences.
>  > >
>  > > With seq.el 2.24 a seq-difference operation would consider this user's
>  > > method, with seq.el 2.24.dmitry (i.e. your fast seq-difference-3) it
>  > > simply won't.  This user's code is clearly broken.
>  > >
>  > > But was the user allowed to do that in the first place?  If not,
>  > > why is seq-contains-p a public generic function?
>  >
>  > It was allowed, and yes, indeed, it's a breaking change. So we should at
>  > least examine the existing public code out there and see whether such
>  > overrides exist.
> 
> Not so easy, I'm afraid.  seq-contains-p is not the only such generic,
> it's just one I happened to spot first.  seq-do is a generic and in that
> group mentioned in the sparse seq.el documentation about mandatory
> customization, meaning presumably it is essential for custom sequences.
> 
> See https://github.com/search?q=%22%28cl-defmethod+seq-do%22&type=code 
> <https://github.com/search?q=%22%28cl-defmethod+seq-do%22&type=code> for
> a list of custom sequences that use it.
> 
> Your seq-difference-3 doesn't call seq-do like the original seq-difference
> does, so the set difference operations for those custom sequences might
> well be broken.  What's even more problematic is that it skips seq-do
> entirely for certain predicates and not entirely for certain other
> predicates.

The way I understand this, is any new sequence type has to implement 
seq-do. As soon as that happens, a lot of (probably all) sequence 
functions in seq.el start working on that type.

But the author is also free to provide specialized implementations for 
individual functions (usually for better performance) -- and those 
implementations don't have to use seq-do. Indeed, in most cases the 
optimization would involve cutting out the overhead that seq-do brings 
in the general case.

> It's still bizarre.  Here's how I think: if an application is customizing
> the entry point directly, why even call the entry point?  OK, so what if
> a library is customizing the entry point?   It's presumably because that
> library provides a data type for applications to instantiate and use.  But
> if the library does that, all calling guarantees of other generic
> functions are lost for, say, user method for subtypes of that data type
> (or :around, or :after, etc).  So it's also nonsensical.  The only think
> where it _could_ make a shred of sense is in a kind of "private library" 
> where
> the application controls both the data type and knows exactly the shortcuts
> taken.  But then "private library" is an oxymoron.

An application shouldn't be customizing separate entry points. Separate 
libs (providing new data types) can do that. E.g. if we have a lib 
providing lazy sequences, it can provide its own set of implementations 
for seq.el. And when doing that, make sure the resulting behavior is 
consistent between functions.

I haven't though much about the interaction with :around, :after, etc. 
Maybe they'll be fine, or maybe it would be better to avoid it. Look 
fine on the superficial level, though.

> Oh, don't get me wrong: cl-lib.el's implementation details are not pretty
> and not easy, likely typical library code (though i've seen much much
> worse).  What's fundamentally easier about optimizing cl-lib is that the
> contract it offers is much, much more well specified.
> 
> Possibly because it comes as a result of careful design in committees of
> very capable programmers in the 90's, before the CL winter, that were
> already on to this class of difficulties and wanted to make a common
> interface.
> > The interface they made for sequences is not fully generic, but there
> _are_ a lot of different implementations for that interface, each CL
> compiler has  one.  Most of the tricks, like what I did to optimize
> cl-{n}set-difference  are standard stuff in the CL world.   There are even
> reference LOOP implementations (likely much more performant than ours,
> though possibly not compatible with some non-standard edge cases our
> cl-loop has of which I know a couple).

I do believe it's helpful to have it around.

>  > > The case in the CL world against generic functions' performance
>  > > is often not that the dispatch is slow, but that
>  > > employing them too liberally makes framework optimization
>  > > hard, because you restrict yourself from optimization
>  > > opportunities.
>  >
>  > I don't know if that's a major issue: just like cl-some has different
>  > branches, seq-some could have three different definitions. The
>  > approaches to optimization seem transferable, more or less.
> 
> Seq-some also calls into a lot of user customizable code, so it'll
> suffer from the same class of problems.  Say, are you going to skip
> the seq-do generic sometimes there as well?

I believe so.

>  > I believe the actual value it provides is a list of implementations that
>  > are quite compact and as such easy to fix/extend/modify. And if cl-lib
>  > is handy for someone with CL experience, I'll guess that seq.el rings
>  > more familiar to the Clojure users.
> 
> Maybe in the superficial user interface, because of names etc.  Or
> are you saying seq.el interface is extracted from Clojure's standard much
> like cl-lib is from CL standard?  (Do Clojure generic functions work like
> ours?)  Then by all means we should rush to study that contract closely,
> to know what we can and can't do as optimizers.

It looks very much inspired, both in naming and in the implementation 
approach. But it's not a carbon copy (much farther from it than in 
cl-lib's example), and our VM is also quite different from JVM in 
performance characteristics.

If someone were to write up a direct comparison, that would be great, of 
course.