unofficial mirror of emacs-devel@gnu.org 
From: Danny McClanahan <dmcc2@hypnicjerk.ai>
To: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
Subject: Re: prior work on non-backtracking regex engine?
Date: Tue, 12 Mar 2024 23:45:38 +0000
Message-ID: <Stg6_g1_PHfTYAWkfl-oqZNpV4u5Q2NAQsT-162rcb8PGYLpUqphMqoa_YpRA6e7WGtjAPYByt18IC1gVbkyAl2FfiHYe6Wy6kKyqbzQEUk=@hypnicjerk.ai>
In-Reply-To: <3a9IKoS2YLqJYosdfpFVdq8ashG0LPPJdB-ugdUgJEqM6-O3RWFeCu01FUPYBsp87xchkX-z1PRlNqJQm8ge_h3v0ziCWcME2fx-6PW-UP4=@hypnicjerk.ai>

I came up with some more practical, specific questions about implementation. I haven't written any code for this or demonstrated any performance improvement yet, so all of this is still subject to change. Feel free to ignore it for now; I'll post again when I have a convincing demo!

(4) What is the best way to package third-party code for Emacs?

I was thinking of architecting this regex engine as a third-party codebase exposing a C ABI shared library, which the Emacs build system could detect as an optional dependency (like libjansson). I was hoping to implement the engine in Rust, but I know that a cargo package alone isn't enough for non-Rust code to depend on, so I was also planning to maintain package recipes for it for multiple Linux distros, so that Emacs could easily add it to the build system (like librsvg). Are there any further constraints I should know about for optional dependencies in the Emacs build system?

`cbindgen' (https://github.com/mozilla/cbindgen) is a mature tool for generating C headers from Rust code, and cargo/rustc already support building C ABI shared libraries via the `cdylib' crate type (https://doc.rust-lang.org/reference/linkage.html). I'm also hoping to use Rust's `no_std' attribute (https://docs.rust-embedded.org/book/intro/no-std.html) to produce very small binaries without any dependency on the Rust stdlib or even libc. Is binary size a concern at all for shared libraries Emacs depends on?
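
To make the C ABI idea concrete, here is a minimal sketch of what one exported function could look like. Everything in it is hypothetical: the name `rx_find_literal', its signature, and its return codes are placeholders rather than a proposed API, and it only does a naive literal search instead of real regex matching. It assumes the crate is built with `crate-type = ["cdylib"]' in Cargo.toml, after which cbindgen could generate a matching C prototype. It also uses std for brevity, so it is not yet a `no_std' build.

```rust
use std::slice;

/// Hypothetical export: byte offset of the first occurrence of `needle` in
/// `haystack`, -1 if there is none, or -2 if either pointer is null.
///
/// # Safety
/// Each pointer must be valid for reads of the corresponding length.
#[no_mangle]
pub unsafe extern "C" fn rx_find_literal(
    haystack: *const u8,
    haystack_len: usize,
    needle: *const u8,
    needle_len: usize,
) -> isize {
    if haystack.is_null() || needle.is_null() {
        return -2;
    }
    // SAFETY: the caller promises both pointers are valid for the given lengths.
    let (hay, pat) = unsafe {
        (
            slice::from_raw_parts(haystack, haystack_len),
            slice::from_raw_parts(needle, needle_len),
        )
    };
    if pat.is_empty() {
        return 0; // the empty pattern matches at offset 0
    }
    // `windows` panics on a zero size, which is ruled out above; avoiding
    // panics matters because unwinding must not cross the C boundary.
    match hay.windows(pat.len()).position(|w| w == pat) {
        Some(i) => i as isize,
        None => -1,
    }
}
```

Passing explicit pointer/length pairs instead of NUL-terminated strings is a deliberate choice in this sketch, since buffer text can contain NUL bytes.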

Note: my goal is to make Emacs faster for all users, which is why I think this non-backtracking engine makes the most sense as a direct replacement for the code in `regex-emacs.c'. However, I understand it would be nicer if this performance improvement didn't require an external dependency at all. If this code ends up being simple enough, I would absolutely consider translating it to C and contributing it to the Emacs codebase directly. Depending on the performance results, we could also consider contributing a "simple" version of this non-backtracking engine in C to Emacs itself, with an optional dependency enabling the use of the more performant Rust engine. We'll know more about this after we have a working prototype.

(6) How should we handle allocation?

Along with `no_std', I'm also hoping to use the Allocator API (https://doc.rust-lang.org/std/alloc/trait.Allocator.html) with Rust collections in the implementation of this regex engine. This should make it feasible to present a C ABI which delegates any dynamic allocations to the caller, so that Emacs can track regex engine state using its own GC. To be clear, I suspect this just means Emacs would provide an `alloc()' function pointer to some methods when calling into the regex engine, along with a `free()' function pointer to other methods. Does that sound like a reasonable approach, or is there more complexity around correctly allocating GC-able objects that I should be aware of? Are there examples of this kind of allocation delegation in the source code that I should model the regex engine API after?
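
As a rough illustration of the allocation delegation described above, the sketch below passes a table of `alloc'/`free' function pointers across the C ABI and keeps all engine state in caller-provided memory. All names (`RxAllocator', `RxPattern', `rx_compile', `rx_free', `host_alloc', `host_free') are made up for this example, the "compiled pattern" is just a copy of the source bytes, and the host callbacks are a stand-in built on Rust's global allocator rather than anything Emacs provides. It uses raw pointers instead of the Allocator trait, and it does not address how the Emacs GC would trace or finalize such objects.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::os::raw::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical callback table supplied by the host application.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RxAllocator {
    pub alloc: unsafe extern "C" fn(size: usize, align: usize) -> *mut c_void,
    pub free: unsafe extern "C" fn(ptr: *mut c_void, size: usize, align: usize),
}

/// Hypothetical engine object; every byte of it lives in host memory.
#[repr(C)]
pub struct RxPattern {
    allocator: RxAllocator,
    len: usize,
    bytes: *mut u8,
}

/// Copies `len` bytes of pattern source into host-allocated storage.
/// Returns null on bad input or allocation failure.
#[no_mangle]
pub unsafe extern "C" fn rx_compile(
    a: *const RxAllocator,
    src: *const u8,
    len: usize,
) -> *mut RxPattern {
    if a.is_null() || src.is_null() {
        return ptr::null_mut();
    }
    unsafe {
        let a = *a;
        let layout = Layout::new::<RxPattern>();
        let pat = (a.alloc)(layout.size(), layout.align()) as *mut RxPattern;
        if pat.is_null() {
            return ptr::null_mut();
        }
        let bytes = if len == 0 { ptr::null_mut() } else { (a.alloc)(len, 1) as *mut u8 };
        if len != 0 && bytes.is_null() {
            // Second allocation failed: release the first before bailing out.
            (a.free)(pat as *mut c_void, layout.size(), layout.align());
            return ptr::null_mut();
        }
        if len != 0 {
            ptr::copy_nonoverlapping(src, bytes, len);
        }
        ptr::write(pat, RxPattern { allocator: a, len, bytes });
        pat
    }
}

/// Releases a pattern through the same callbacks that allocated it.
#[no_mangle]
pub unsafe extern "C" fn rx_free(pat: *mut RxPattern) {
    if pat.is_null() {
        return;
    }
    unsafe {
        let RxPattern { allocator, len, bytes } = ptr::read(pat);
        if !bytes.is_null() {
            (allocator.free)(bytes as *mut c_void, len, 1);
        }
        let layout = Layout::new::<RxPattern>();
        (allocator.free)(pat as *mut c_void, layout.size(), layout.align());
    }
}

/// Stand-in host side: counts live bytes so leaks are observable.
pub static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

pub unsafe extern "C" fn host_alloc(size: usize, align: usize) -> *mut c_void {
    match Layout::from_size_align(size, align) {
        Ok(layout) if layout.size() > 0 => {
            let p = unsafe { alloc(layout) };
            if !p.is_null() {
                LIVE_BYTES.fetch_add(size, Ordering::SeqCst);
            }
            p as *mut c_void
        }
        _ => ptr::null_mut(),
    }
}

pub unsafe extern "C" fn host_free(p: *mut c_void, size: usize, align: usize) {
    if p.is_null() {
        return;
    }
    if let Ok(layout) = Layout::from_size_align(size, align) {
        unsafe { dealloc(p as *mut u8, layout) };
        LIVE_BYTES.fetch_sub(size, Ordering::SeqCst);
    }
}
```

Storing the callback table inside the object means `rx_free' needs no extra argument, at the cost of two pointers per object; passing the table to every call would be the obvious alternative.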

Thanks,
Danny


