From: Danny McClanahan
Newsgroups: gmane.emacs.devel
Subject: rosie/libpexl library for regex pattern composition
Date: Sat, 27 Jul 2024 13:04:28 +0000
To: "emacs-devel@gnu.org"

Hello emacs-devel,

I have recently become familiar with the Rosie Pattern Language (https://rosie-lang.org/) by Prof. Jamie Jennings at NCSU. I pinged this list a few months ago about improving the performance and worst-case behavior of regex-emacs.c, and while I'm still working on a prototype for that, others on this list also responded that a method to compose patterns in lisp code might be quite useful as well.

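To make "composing patterns" a bit more concrete, here is a minimal sketch of the general idea for `grep -n`-style output (FILE:LINE:TEXT). It is plain Python using the standard `re` module, chosen only for illustration: it is not Rosie/PEXL syntax and not an existing or proposed Emacs API. The names FILE, LINE, TEXT, seq, and parse_grep_line are all hypothetical.

```python
import re

# Small named building blocks. Each one is an ordinary regex fragment
# that can be read and tested on its own. (Hypothetical names.)
FILE = r"(?P<file>[^:\n]+)"   # a path with no colon in it
LINE = r"(?P<line>\d+)"       # a decimal line number
TEXT = r"(?P<text>.*)"        # the rest of the line


def seq(*parts, sep=""):
    """Concatenate fragments, wrapping each in a non-capturing group
    so that operators inside one fragment cannot leak into the next."""
    return sep.join(f"(?:{p})" for p in parts)


# The composite pattern is assembled from the named pieces above.
GREP_LINE = re.compile(seq(FILE, LINE, TEXT, sep=":"))


def parse_grep_line(s):
    """Return a dict of the named fields, or None if the line does not match."""
    m = GREP_LINE.fullmatch(s)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_grep_line("src/regex-emacs.c:42:static int foo;"))
```

The point is only that each piece has a name and can be checked independently before being combined, which is the kind of maintainability a pattern library would aim for.
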
While I understand that tree-sitter tends to be the more accepted way to parse program source, there remain many use cases for which regex or something like it is still applicable, especially parsing the output of external processes (like the built-in M-x grep, or my extension https://github.com/cosmicexplorer/helm-rg). I became especially interested after reading https://rosie-lang.org/about/ regarding its focus on enabling maintainable/testable libraries of patterns, which seemed to correspond to my vision of what pattern composition might look like for Emacs extension developers.

While I believe Rosie has a build-time (and possibly run-time) dependency on Lua, PEXL (https://gitlab.com/pexlang/libpexl) is the author's new implementation and is written in very portable C99. It also has several new features and implementation techniques over Rosie. I'm still getting familiar with the project, so I can't speak to any standout features yet, but on its face it seems like a potential substrate we could build lisp-level composable pattern abstractions on top of.

Rosie/PEXL's goals are explicitly focused more on maintainability than sheer performance, so I'm thinking it might make sense to introduce Rosie as a separate interface to the regex engine, while we can keep the regex engine narrowly focused on patterns that we can more easily optimize.
For example, I was glad to hear in my previous communications with emacs-devel that there was some receptiveness to deprecating features like runtime lookup of mode-specific word boundaries from the regex engine if it would ease optimization (I'm not sure if that's necessary yet). But one way we could avoid removing more complex functionality like backrefs that extension devs depend on is to direct them to a lisp interface wrapping Rosie, which supports backrefs (it actually supports a strictly more powerful formalization of backrefs than regex engines do; see the author's post on it at https://jamiejennings.com/posts/2023-10-01-dont-look-back-3/).

Like I said, I'm still becoming familiar with Rosie/PEXL, so I don't quite have enough info yet to make a more thorough proposal. But I'd love to know if others are familiar with this project and whether it might correspond to the use cases for lisp-level pattern composition brought up in response to my previous communications about improving the regex engine.

Thanks,
Danny