From: Gerd Möllmann
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] add compiled regexp primitive lisp object
Date: Thu, 01 Aug 2024 12:06:29 +0200
To: Andrea Corallo
Cc: Danny McClanahan, emacs-devel@gnu.org
In-Reply-To: Andrea Corallo's message of Thu, 01 Aug 2024 04:30:14 -0400

Andrea Corallo writes:

> Danny McClanahan writes:
>
>> This is a first attempt at a lisp-level API for explicit regexp
>> compilation. I have provided the entire diff inline in this email
>> under the impression that this will make it easier to discuss the
>> specifics--I do apologize if diffs above a certain size should
>> always be attached as patch files in the future.
>>
>> The result of this change is that pre-compiled regexp objects constructed by
>> `make-regexp' will have the lifetime of standard lisp objects, instead of
>> being potentially invalidated and re-compiled upon every call to `string-match'.
>>
>> In particular, this involves the following changes:
>> - add PVEC_REGEXP case to lisp.h and struct Lisp_Regexp pseudovector type
>>   containing the fields currently stored in struct regexp_cache
>> - add syms_of_regexp() lisp exports to regex-emacs.c,
>>   with make-regexp and regexpp functions
>> - modify all methods in search.c to accept a Lisp_Regexp as well as a string
>> - add src/regex-emacs.h to dmpstruct_headers in Makefile.in
>> - make Lisp_Regexp purecopyable and pdumpable
>>
>> Finally, it modifies a few variables in lisp/image.el to store
>> compiled regexp objects instead of raw strings. Since image.el is loaded into
>> the bootstrap image, I believe this demonstrates that the compiled regexp
>> objects are successfully pdumpable.
>>
>> I have taken special care to avoid modifying the existing string-based
>> implicitly-caching logic at all, so this should not break any C-level logic.
>> Notably, if compiling with --enable-checking,
>> (re--describe-compiled (make-regexp "asdf")) produces the same output as
>> providing a string directly.
>>
>> However, precompiled regexp lisp objects are *not* automatically coerced to
>> lisp strings, so any lisp code that expects to be able to e.g.
>> (concat my-regexp-var "asdf") will now signal an error if my-regexp-var is
>> converted into a precompiled regexp with the new `make-regexp' constructor.
>> The regexp variables `image-type-header-regexps' and
>> `image-type-file-name-regexps' from lisp/image.el are converted into precompiled
>> regexp objects, and any user code expecting those to be strings will now error.
>>
>> I had to re-run autogen.sh to avoid segfaulting upon bootstrap after modifying
>> lisp.h (re-running ./configure alone didn't work). I suspect everyone else is
>> well aware of the ramifications of editing lisp.h enums, but wanted to make
>> sure that was clear.
>>
>> I have tried to extend existing idioms where obvious, and split off helper
>> methods to improve readability. I am very open to any style improvements
>> as well as architectural changes.
>
> Hi Danny,
>
> IMO the idea is in principle interesting but:
>
> Can we prove there is some realistic use case where we see performance
> improvements? Even if we can, maybe we can just improve the caching
> mechanism so that it works better?
>
> Could you comment on the impact on existing Lisp code?
>
> IIUC, given that all methods in search.c would accept both Lisp_Regexp
> and strings, the impact should be limited, but what about other functions
> returning regexps like 'regexp-quote'? Should they now return strings or
> regexps?
>
> Thanks
>
> Andrea

Just wanted to add, as something to consider, that the current regexp
caching is not without problems, to say the least. See bug#56108, as an
example, and I'm sure there are others. Anything fixing things like that
would be a win, IMHO.
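Andrea asks whether to use better caching or explicit compiled objects. The following is a minimal C sketch of both approaches. It uses the POSIX <regex.h> API (regcomp, regexec, regfree), not Emacs's regex-emacs.c. The cache in it is hypothetical: its size, its round-robin eviction, and its keying on pattern text alone are illustrative assumptions. It does not describe the regexp_cache in search.c. The names cache_lookup, match_string_pattern and match_compiled are made up for this example.

```c
/* Sketch only: contrasts an implicit, bounded regexp cache with an
   explicitly compiled pattern whose lifetime the caller controls.
   Uses POSIX <regex.h>, NOT Emacs's regex-emacs.c; the cache design
   below (size, eviction, key) is a hypothetical simplification. */
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_SIZE 2    /* hypothetical: deliberately tiny to force eviction */
#define MAX_PATTERN 64  /* hypothetical limit on cached pattern length */

struct cache_entry {
  char pattern[MAX_PATTERN];
  regex_t re;
  bool valid;
};

static struct cache_entry cache[CACHE_SIZE];
static int next_victim;     /* round-robin eviction index */
static int cache_compiles;  /* how many times the cache called regcomp */

/* Return a compiled pattern from the cache, compiling it (and possibly
   evicting another entry) on a miss.  Returns NULL on error. */
static regex_t *cache_lookup(const char *pattern)
{
  if (strlen(pattern) >= MAX_PATTERN)
    return NULL;
  for (int i = 0; i < CACHE_SIZE; i++)
    if (cache[i].valid && strcmp(cache[i].pattern, pattern) == 0)
      return &cache[i].re;  /* hit: no recompilation */

  struct cache_entry *e = &cache[next_victim];
  next_victim = (next_victim + 1) % CACHE_SIZE;
  if (e->valid) {
    regfree(&e->re);  /* evicted: a later use must recompile it */
    e->valid = false;
  }
  if (regcomp(&e->re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return NULL;
  cache_compiles++;
  strcpy(e->pattern, pattern);  /* length checked above */
  e->valid = true;
  return &e->re;
}

/* Implicit style: the caller passes pattern text, comparable to
   calling string-match with a string. */
static bool match_string_pattern(const char *pattern, const char *subject)
{
  regex_t *re = cache_lookup(pattern);
  return re != NULL && regexec(re, subject, 0, NULL, 0) == 0;
}

/* Explicit style: the caller compiles once and owns the object until
   it calls regfree, so no cache eviction can force a recompile. */
static bool match_compiled(const regex_t *re, const char *subject)
{
  return regexec(re, subject, 0, NULL, 0) == 0;
}
```

Suppose three patterns are used in rotation with only two cache slots. Then every call to match_string_pattern misses and recompiles, giving nine regcomp calls for nine matches. The explicitly compiled regex_t objects are compiled once each and stay valid until the caller frees them. This mirrors the lifetime difference the patch describes. Whether it matters for real Emacs workloads is the measurement Andrea asks for.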