From: Danny McClanahan <dmcc2@hypnicjerk.ai>
To: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
Subject: rosie/libpexl library for regex pattern composition
Date: Sat, 27 Jul 2024 13:04:28 +0000
Message-ID: <disFmiXATNq6Fywm_Y7OS4MzSw81H4xm0k1UQ_aAvvC2GsXEAYUXa94WCtJI5TMMhjXnyI_89vovpGAHB-G-kg1WA2-wssPGokj8LWpdAgo=@hypnicjerk.ai>
Hello emacs-devel,
I have recently become familiar with the Rosie Pattern Language (https://rosie-lang.org/) by Prof. Jamie Jennings at NCSU. I pinged this list a few months ago about improving the performance and worst-case behavior of regex-emacs.c, and while I'm still working on a prototype for that, others on this list also responded that a method to compose patterns in lisp code might be quite useful as well.
While I understand that tree-sitter tends to be the more accepted way to parse program source, there are still many use cases where regex or something like it is applicable, especially parsing the output of external processes (like the built-in M-x grep, or my extension https://github.com/cosmicexplorer/helm-rg). I became especially interested after reading https://rosie-lang.org/about/ regarding its focus on enabling maintainable/testable libraries of patterns, which seemed to correspond to my vision of what pattern composition might look like for Emacs extension developers.
While I believe Rosie has a build-time (and possibly run-time) dependency on Lua, PEXL (https://gitlab.com/pexlang/libpexl) is the author's new implementation and is written in very portable C99. It also has several new features and implementation techniques over Rosie. I'm still getting familiar with the project, so I can't speak to any standout features yet, but on its face it seems like a potential substrate we could build lisp-level composable pattern abstractions on top of.
Rosie/PEXL's goals are explicitly focused more on maintainability than sheer performance, so I'm thinking it might make sense to introduce Rosie as an interface separate from the regex engine, while we keep the regex engine narrowly focused on patterns that we can more easily optimize. For example, I was glad to hear in my previous communications with emacs-devel that there was some receptiveness to deprecating features like runtime lookup of mode-specific word boundaries from the regex engine if it would ease optimization (I'm not sure if that's necessary yet). One way we could avoid taking away more complex functionality that extension devs depend on, like backrefs, is to direct them to a lisp interface wrapping Rosie, which supports backrefs (it actually supports a strictly more powerful formalization of backrefs than regex engines do; see the author's post on it at https://jamiejennings.com/posts/2023-10-01-dont-look-back-3/).
Like I said, I'm still becoming familiar with Rosie/PEXL, so I don't quite have enough info yet to make a more thorough proposal. But I'd love to know if others are familiar with this project and whether it might correspond to the use cases for lisp-level pattern composition brought up in response to my previous communications about improving the regex engine.
Thanks,
Danny