From: dmcc2
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] add compiled regexp primitive lisp object
Date: Wed, 31 Jul 2024 22:33:54 +0000
References: <87ttg7qbgv.fsf@posteo.net>
In-Reply-To: <87ttg7qbgv.fsf@posteo.net>
To: Philip Kaludercic
Cc: Danny McClanahan, "emacs-devel@gnu.org"

> On Tuesday, July 30th, 2024 at 09:02, Philip Kaludercic wrote:
>
> No comments on the patch from me, I am just curious, did you notice any
> performance improvements? Or is this just cleaning up the codebase?
>
> --
> Philip Kaludercic on peregrine

I failed to provide context: very reasonable question!
^_^ This was spurred by a discussion from the day before on how to introduce a Lisp-level API for composing search patterns (https://lists.gnu.org/archive/html/emacs-devel/2024-07/msg01201.html), where I concluded that codifying compiled regexps into a Lisp object would be a useful first step towards understanding the tradeoffs of introducing other matching logic beyond regex-emacs.c. I received a reply (https://lists.gnu.org/archive/html/emacs-devel/2024-07/msg01203.html) indicating that patches would be the appropriate next step, and then got to work. I was incredibly pleased by how delightful and straightforward it was to create this first draft and wanted to share progress, but didn't think further than that before falling asleep ^_^!

(btw, the pdumper API is incredibly cool and much less complex than I expected.)

I think a useful prototype of this workstream would involve:

(1) add new Lisp_Regexp primitive object constructed via `make-regexp' (this patch; done),
(2) store match-data in the Lisp_Regexp instead of a thread-local (done locally) & extend match data accessors like `match-data' to extract from an optional Lisp_Regexp arg (the way `match-string' accepts an optional string arg),
(3) add new Lisp_Match primitive object (or maybe just use a list for now) for match functions to write results into instead of mutating the Lisp_Regexp match-data (I believe this will make regexp matching entirely reentrant/thread-safe) & extend match data accessors to accept Lisp_Match as well.

At that point, I am guessing it will be relatively easy to construct a benchmark that produces a very clear speedup (construct 100 random regexps and search them in a loop), and to show via profile output that it avoids recompiling. There are also likely to be benchmarks more representative of a typical emacs workload, which I would be delighted to receive suggestions for.
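
To make the shape of that benchmark concrete, here is a rough sketch in Python rather than Emacs Lisp or C, so it says nothing about actual Emacs timings. It assumes that searching by pattern string goes through a small fixed-size cache of compiled patterns, and compares that with holding on to precompiled objects. `CompileCache`, its capacity, and all the function names are made up for illustration, and Python's `re` module merely stands in for regex-emacs.c. It counts compilations instead of timing them, so the result is deterministic.

```python
import random
import re
from collections import OrderedDict


class CompileCache:
    """Bounded LRU cache of compiled patterns, keyed by pattern string.

    Hypothetical stand-in for an implicit, string-keyed regexp cache.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.compilations = 0

    def get(self, pattern):
        compiled = self.entries.get(pattern)
        if compiled is not None:
            self.entries.move_to_end(pattern)  # mark as most recently used
            return compiled
        self.compilations += 1  # cache miss: the pattern must be compiled
        compiled = re.compile(pattern)
        self.entries[pattern] = compiled
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return compiled


def random_patterns(count, seed=0):
    """Return `count` distinct patterns of the form <5 letters>[0-9]+."""
    rng = random.Random(seed)
    patterns = set()
    while len(patterns) < count:
        word = "".join(rng.choice("abcdefgh") for _ in range(5))
        patterns.add(word + "[0-9]+")
    return sorted(patterns)


def search_via_cache(patterns, text, rounds, capacity):
    """Search by pattern string; return (match count, compilations)."""
    cache = CompileCache(capacity)
    hits = 0
    for _ in range(rounds):
        for pattern in patterns:
            if cache.get(pattern).search(text):
                hits += 1
    return hits, cache.compilations


def search_precompiled(patterns, text, rounds):
    """Compile each pattern once up front; return (match count, compilations)."""
    compiled = [re.compile(p) for p in patterns]
    hits = 0
    for _ in range(rounds):
        for regexp in compiled:
            if regexp.search(text):
                hits += 1
    return hits, len(compiled)


if __name__ == "__main__":
    patterns = random_patterns(100)
    # Every third pattern has a matching token in the text.
    text = " ".join(p[:5] + "42" for p in patterns[::3])
    print(search_via_cache(patterns, text, rounds=50, capacity=20))
    print(search_precompiled(patterns, text, rounds=50))
```

With more patterns than cache slots, cycling through them in a fixed order evicts each entry before it is needed again, so the string-keyed path compiles on every search, while the precompiled path compiles once per pattern. Both paths report the same number of matches.
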
I think the next steps are clear enough, so I'm planning to ping this list again when I have a working prototype achieving such a benchmark. Since the inline diff seemed ok this time, I will also provide an inline diff for that unless the diff exceeds +1000 lines (not expected), in which case I will attach a patch file.

Thanks,
Danny