From: Mattias Engdegård
Newsgroups: gmane.emacs.devel
Subject: Re: Pattern matching on match-string groups #elisp #question
Date: Sat, 27 Feb 2021 19:10:59 +0100
References: <87v9agxkld.fsf@tcd.ie> <80CE2366-76F4-4548-B956-F16DFCE23E4C@acm.org> <258C930A-B183-4211-9917-0AD96C17A638@acm.org> <288FFC66-E3BE-4E5F-AAD5-309A632F8058@acm.org>
To: Stefan Monnier
Cc: Basil L. Contovounesios, Ag Ibragimov, Emacs developers

On 27 Feb 2021, at 15:39, Stefan Monnier wrote:

> Nevertheless, I went ahead with this change (after remembering that
> wrapping the code in `ignore` should eliminate the extra warnings).

So where does that leave us with the rx pattern? There's still the interleaved match data problem, which I've tried to address below.

> It's clearly The Right Thing™.

Perhaps it is; a proposed diff is attached below which treats zero and one variable specially and uses a list followed by immediate destructuring for >1 variables. (By the way, using a backquote form to generate backquote forms is annoying.)

>> My guess is that a vector may be faster than a list if there are more than N elements, for some N.
>
> I'll let you benchmark it to determine the N.

I now have, and am sad to say that a list is always faster for any practical value of N (I didn't bother trying more than 30), although the difference narrows as N grows. This is despite the destructuring code becoming considerably bigger for lists (as we get a long chain of tests and branches) than for vectors. It all boils down to vector construction being more expensive than list construction.

Maybe we should pack N>1 variables into N-1 cons cells by using the last cdr (improper list), but list* is not a primitive so it may be a loss for N>M, for some M>2.
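
The list, vector and improper-list packings compared above can be sketched roughly as follows. This is only an illustration, not the benchmark that was actually run: the `my/` function names, the three-element payload and the repetition count are all hypothetical, and the timings mean little unless the functions are byte-compiled first.

```elisp
(require 'benchmark)

;; Hypothetical helpers: each packs three values, then immediately
;; destructures them again with `pcase-let'.

(defun my/via-list (a b c)
  "Round-trip A, B and C through a proper list (3 cons cells)."
  (pcase-let ((`(,x ,y ,z) (list a b c)))
    (+ x y z)))

(defun my/via-vector (a b c)
  "Round-trip A, B and C through a vector."
  (pcase-let ((`[,x ,y ,z] (vector a b c)))
    (+ x y z)))

(defun my/via-improper-list (a b c)
  "Round-trip A, B and C through 2 cons cells, using the last cdr."
  (pcase-let ((`(,x ,y . ,z) (cons a (cons b c))))
    (+ x y z)))

(defun my/time-packing ()
  "Return elapsed seconds for the list, vector and improper-list variants.
The repetition count is a literal because `benchmark-run' is a macro."
  (list (car (benchmark-run 10000 (my/via-list 1 2 3)))
        (car (benchmark-run 10000 (my/via-vector 1 2 3)))
        (car (benchmark-run 10000 (my/via-improper-list 1 2 3)))))
```

Calling `(my/time-packing)` returns three elapsed times in the order list, vector, improper list; the absolute numbers depend on the machine and on whether the code is compiled.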

> currently `string-match-p` is ever so slightly
> slower than `string-match` and since we clobber the match data in other
> cases, we might as well clobber the match data in this case as well: any
> code which presumes the match data isn't affected by some other code
> which uses regular expressions is quite confused.

Right; I'm sticking to string-match for the time being.

> I don't think it's much more complicated than your current constant
> folding: when you see a let-binding of a variable to a *constructor*,
> stash that expression in your context as a "partially known constant"
> and then do the constant folding when you see a matching *destructor*.

Doable, but definitely not low-hanging fruit. Since pcase has made a dog's breakfast of the destructuring code it's not straightforward to recognise it as such in the optimiser. Efforts needed elsewhere first!

> go back to the last option it tried and accept it even though it failed
> to match. It still sucks, but maybe it'll give someone else a better idea?

Sounds like pcase--dead-end would fit then, at least as an internal name. Or pcase--Sherlock-Holmes.

Attachment: rx-pcase-list-destructure.diff

diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index ffc21951b6..736758d01f 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -1436,17 +1436,31 @@ rx
                  introduced by a previous (let REF ...) construct."
   (let* ((rx--pcase-vars nil)
-         (regexp (rx--to-expr (rx--pcase-transform (cons 'seq regexps)))))
+         (regexp (rx--to-expr (rx--pcase-transform (cons 'seq regexps))))
+         (nvars (length rx--pcase-vars)))
     `(and (pred stringp)
-          ;; `pcase-let' takes a match for granted and discards all unnecessary
-          ;; conditions, which means that a `pred' clause cannot be used for
-          ;; the match condition. The following construct seems to survive.
-          (app (lambda (s) (string-match ,regexp s)) (pred identity))
-          ,@(let ((i 0))
-              (mapcar (lambda (name)
-                        (setq i (1+ i))
-                        `(app (match-string ,i) ,name))
-                      (reverse rx--pcase-vars))))))
+          ,(cond ((= nvars 0)
+                  ;; No variables bound: a single predicate suffices.
+                  `(pred (string-match ,regexp)))
+                 ((= nvars 1)
+                  ;; Single variable: bind it to the result of the
+                  ;; lambda function below.
+                  `(app (lambda (s)
+                          (and (string-match ,regexp s)
+                               (match-string 1 s)))
+                        ,(car rx--pcase-vars)))
+                 (t
+                  ;; Multiple variables: pack the submatches into a list
+                  ;; which is then immediately destructured into individual
+                  ;; variables again. This is of course slightly inefficient.
+                  `(app (lambda (s)
+                          (and (string-match ,regexp s)
+                               (list
+                                ,@(mapcar (lambda (i) `(match-string ,i s))
+                                          (number-sequence 1 nvars)))))
+                        ,(list '\`
+                               (mapcar (lambda (name) (list '\, name))
+                                       (reverse rx--pcase-vars)))))))))
 
 ;; Obsolete internal symbol, used in old versions of the `flycheck' package.
 (define-obsolete-function-alias 'rx-submatch-n 'rx-to-string "27.1")
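
For readers unfamiliar with the construct being patched, here is a small sketch of how the `rx` pcase pattern with `let` bindings is used from calling code (available since Emacs 27). The function name `my/parse-assignment` and the "name=digits" input format are invented for illustration. With two `let` variables, this call would go through the multi-variable branch of the diff above.

```elisp
(require 'rx)

;; Hypothetical example function, not part of rx.el.
(defun my/parse-assignment (s)
  "Return (NAME . NUMBER) if S looks like \"name=123\", else nil."
  (pcase s
    ;; Each (let VAR RX...) form binds VAR to the text of one submatch.
    ((rx bos (let name (+ alpha)) "=" (let value (+ digit)) eos)
     (cons name (string-to-number value)))
    ;; Non-strings and non-matching strings fall through to here.
    (_ nil)))

(my/parse-assignment "width=80")   ; => ("width" . 80)
(my/parse-assignment "width=")     ; => nil
```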