From: Andrea Corallo
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] add compiled regexp primitive lisp object
Date: Thu, 01 Aug 2024 04:30:14 -0400
To: Danny McClanahan
Cc: emacs-devel@gnu.org
In-Reply-To: (Danny McClanahan's message of "Tue, 30 Jul 2024 05:08:28 +0000")

Danny McClanahan writes:

> This is a first attempt at a lisp-level API for explicit regexp
> compilation. I have provided the entire diff inline in this email
> under the impression that this will make it easier to discuss the
> specifics--I do apologize if diffs above a certain size should
> always be attached as patch files in the future.
>
> The result of this change is that pre-compiled regexp objects constructed by
> `make-regexp' will have the lifetime of standard lisp objects, instead of
> being potentially invalidated and re-compiled upon every call to `string-match'.
>
> In particular, this involves the following changes:
> - add PVEC_REGEXP case to lisp.h and struct Lisp_Regexp pseudovector type
>   containing the fields currently stored in struct regexp_cache
> - add syms_of_regexp() lisp exports to regex-emacs.c,
>   with make-regexp and regexpp functions
> - modify all methods in search.c to accept a Lisp_Regexp as well as a string
> - add src/regex-emacs.h to dmpstruct_headers in Makefile.in
> - make Lisp_Regexp purecopyable and pdumpable
>
> Finally, it modifies a few variables in lisp/image.el to store
> compiled regexp objects instead of raw strings.
> Since image.el is loaded into the bootstrap image, I believe this
> demonstrates that the compiled regexp objects are successfully pdumpable.
>
> I have taken special care to avoid modifying the existing string-based
> implicitly-caching logic at all, so this should not break any C-level logic.
> Notably, if compiling with --enable-checking,
> (re--describe-compiled (make-regexp "asdf")) produces the same output as
> providing a string directly.
>
> However, precompiled regexp lisp objects are *not* automatically coerced to
> lisp strings, so any lisp code that expects to be able to e.g.
> (concat my-regexp-var "asdf") will now signal an error if my-regexp-var is
> converted into a precompiled regexp with the new `make-regexp' constructor.
> The regexp variables `image-type-header-regexps' and
> `image-type-file-name-regexps' from lisp/image.el are converted into precompiled
> regexp objects, and any user code expecting those to be strings will now error.
>
> I had to re-run autogen.sh to avoid segfaulting upon bootstrap after modifying
> lisp.h (re-running ./configure alone didn't work). I suspect everyone else is
> well aware of the ramifications of editing lisp.h enums, but wanted to make
> sure that was clear.
>
> I have tried to extend existing idioms where obvious, and split off helper
> methods to improve readability. I am very open to any style improvements
> as well as architectural changes.

Hi Danny,

IMO the idea is in principle interesting, but:

Can we prove there is some realistic use case where we see performance
improvements?

Even if we can, maybe we can just improve the caching mechanism so that it
works better?

Could you comment on the impact on existing Lisp code?  IIUC, given that all
methods in search.c would accept both a Lisp_Regexp and a string, the impact
should be limited, but what about other functions returning regexps, like
'regexp-quote'?  Should they now return strings or regexps?

Thanks

  Andrea