From: Danny McClanahan <dmcc2@hypnicjerk.ai>
To: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
Subject: [PATCH] add compiled regexp primitive lisp object
Date: Tue, 30 Jul 2024 05:08:28 +0000 [thread overview]
Message-ID: <DX05HWbmw37nQEdshSH46KqWNPDwwK5xaKBcjZr7XBx293dQtroHGmYssDsOVTcMgLLqHbbrvkBpJcWuBgXsDHkvhAHLsTPyQ2j5wZpeL8M=@hypnicjerk.ai> (raw)
This is a first attempt at a lisp-level API for explicit regexp
compilation. I have provided the entire diff inline in this email
under the impression that this will make it easier to discuss the
specifics--I do apologize if diffs above a certain size should
always be attached as patch files in the future.
The result of this change is that pre-compiled regexp objects constructed by
`make-regexp' will have the lifetime of standard lisp objects, instead of
being potentially invalidated and re-compiled upon every call to `string-match'.
In particular, this involves the following changes:
- add PVEC_REGEXP case to lisp.h and struct Lisp_Regexp pseudovector type
containing the fields currently stored in struct regexp_cache
- add syms_of_regexp() lisp exports to regex-emacs.c,
with make-regexp and regexpp functions
- modify all methods in search.c to accept a Lisp_Regexp as well as a string
- add src/regex-emacs.h to dmpstruct_headers in Makefile.in
- make Lisp_Regexp purecopyable and pdumpable
Finally, it modifies a few variables in lisp/image.el to store
compiled regexp objects instead of raw strings. Since image.el is loaded into
the bootstrap image, I believe this demonstrates that the compiled regexp
objects are successfully pdumpable.
I have taken special care to avoid modifying the existing string-based
implicitly-caching logic at all, so this should not break any C-level logic.
Notably, if compiling with --enable-checking,
(re--describe-compiled (make-regexp "asdf")) produces the same output as
providing a string directly.
However, precompiled regexp lisp objects are *not* automatically coerced to
lisp strings, so any lisp code that expects to be able to e.g.
(concat my-regexp-var "asdf") will now signal an error if my-regexp-var is
converted into a precompiled regexp with the new `make-regexp' constructor.
The regexp variables `image-type-header-regexps' and
`image-type-file-name-regexps' from lisp/image.el are converted into precompiled
regexp objects, and any user code expecting those to be strings will now error.
I had to re-run autogen.sh to avoid segfaulting upon bootstrap after modifying
lisp.h (re-running ./configure alone didn't work). I suspect everyone else is
well aware of the ramifications of editing lisp.h enums, but wanted to make
sure that was clear.
I have tried to extend existing idioms where obvious, and split off helper
methods to improve readability. I am very open to any style improvements
as well as architectural changes.
---
etc/emacs_lldb.py | 1 +
lisp/image.el | 59 +++++++-------
src/Makefile.in | 3 +-
src/alloc.c | 55 +++++++++++++
src/data.c | 12 +--
src/emacs.c | 1 +
src/lisp.h | 33 ++++++++
src/pdumper.c | 123 +++++++++++++++++++++++++---
src/print.c | 25 ++++++
src/regex-emacs.c | 71 +++++++++++++++++
src/regex-emacs.h | 3 +
src/search.c | 198 ++++++++++++++++++++++++++++++++--------------
12 files changed, 477 insertions(+), 107 deletions(-)
diff --git a/etc/emacs_lldb.py b/etc/emacs_lldb.py
index ba80d3431f3..2a1fd238b17 100644
--- a/etc/emacs_lldb.py
+++ b/etc/emacs_lldb.py
@@ -69,6 +69,7 @@ class Lisp_Object:
"PVEC_MODULE_FUNCTION": "struct Lisp_Module_Function",
"PVEC_NATIVE_COMP_UNIT": "struct Lisp_Native_Comp_Unit",
"PVEC_SQLITE": "struct Lisp_Sqlite",
+ "PVEC_REGEXP": "struct Lisp_Regexp",
"PVEC_COMPILED": "struct Lisp_Vector",
"PVEC_CHAR_TABLE": "struct Lisp_Vector",
"PVEC_SUB_CHAR_TABLE": "void",
diff --git a/lisp/image.el b/lisp/image.el
index 3d60b485c6b..5c72181b94c 100644
--- a/lisp/image.el
+++ b/lisp/image.el
@@ -36,30 +36,31 @@ image
(&optional filter animation-cache))
(defconst image-type-header-regexps
- `(("\\`/[\t\n\r ]*\\*.*XPM.\\*/" . xpm)
- ("\\`P[1-6]\\(?:\
+ `((,(make-regexp "\\`/[\t\n\r ]*\\*.*XPM.\\*/") . xpm)
+ (,(make-regexp "\\`P[1-6]\\(?:\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
\\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
-\\)\\{2\\}" . pbm)
- ("\\`GIF8[79]a" . gif)
- ("\\`\x89PNG\r\n\x1a\n" . png)
- ("\\`[\t\n\r ]*#define \\([a-z0-9_]+\\)_width [0-9]+\n\
+\\)\\{2\\}") . pbm)
+ (,(make-regexp "\\`GIF8[79]a") . gif)
+ (,(make-regexp "\\`\x89PNG\r\n\x1a\n") . png)
+ (,(make-regexp "\\`[\t\n\r ]*#define \\([a-z0-9_]+\\)_width [0-9]+\n\
#define \\1_height [0-9]+\n\\(\
#define \\1_x_hot [0-9]+\n\
#define \\1_y_hot [0-9]+\n\\)?\
-static \\(unsigned \\)?char \\1_bits" . xbm)
- ("\\`\\(?:MM\0\\*\\|II\\*\0\\)" . tiff)
- ("\\`[\t\n\r ]*%!PS" . postscript)
- ("\\`\xff\xd8" . jpeg) ; used to be (image-jpeg-p . jpeg)
- ("\\`RIFF[^z-a][^z-a][^z-a][^z-a]WEBPVP8" . webp)
+static \\(unsigned \\)?char \\1_bits") . xbm)
+ (,(make-regexp "\\`\\(?:MM\0\\*\\|II\\*\0\\)") . tiff)
+ (,(make-regexp "\\`[\t\n\r ]*%!PS") . postscript)
+ (,(make-regexp "\\`\xff\xd8") . jpeg) ; used to be (image-jpeg-p . jpeg)
+ (,(make-regexp "\\`RIFF[^z-a][^z-a][^z-a][^z-a]WEBPVP8") . webp)
(,(let* ((incomment-re "\\(?:[^-]\\|-[^-]\\)")
- (comment-re (concat "\\(?:!--" incomment-re "*-->[ \t\r\n]*<\\)")))
- (concat "\\(?:<\\?xml[ \t\r\n]+[^>]*>\\)?[ \t\r\n]*<"
- comment-re "*"
- "\\(?:!DOCTYPE[ \t\r\n]+[^>]*>[ \t\r\n]*<[ \t\r\n]*" comment-re "*\\)?"
- "[Ss][Vv][Gg]"))
+ (comment-re (concat "\\(?:!--" incomment-re "*-->[ \t\r\n]*<\\)")))
+ (make-regexp
+ (concat "\\(?:<\\?xml[ \t\r\n]+[^>]*>\\)?[ \t\r\n]*<"
+ comment-re "*"
+ "\\(?:!DOCTYPE[ \t\r\n]+[^>]*>[ \t\r\n]*<[ \t\r\n]*" comment-re "*\\)?"
+ "[Ss][Vv][Gg]")))
. svg)
- ("\\`....ftyp\\(heic\\|heix\\|hevc\\|heim\\|heis\\|hevm\\|hevs\\|mif1\\|msf1\\)" . heic))
+ (,(make-regexp "\\`....ftyp\\(heic\\|heix\\|hevc\\|heim\\|heis\\|hevm\\|hevs\\|mif1\\|msf1\\)") . heic))
"Alist of (REGEXP . IMAGE-TYPE) pairs used to auto-detect image types.
When the first bytes of an image file match REGEXP, it is assumed to
be of image type IMAGE-TYPE if IMAGE-TYPE is a symbol. If not a symbol,
@@ -68,18 +69,18 @@ image-type-header-regexps
a non-nil value, TYPE is the image's type.")
(defvar image-type-file-name-regexps
- '(("\\.png\\'" . png)
- ("\\.gif\\'" . gif)
- ("\\.jpe?g\\'" . jpeg)
- ("\\.webp\\'" . webp)
- ("\\.bmp\\'" . bmp)
- ("\\.xpm\\'" . xpm)
- ("\\.pbm\\'" . pbm)
- ("\\.xbm\\'" . xbm)
- ("\\.ps\\'" . postscript)
- ("\\.tiff?\\'" . tiff)
- ("\\.svgz?\\'" . svg)
- ("\\.hei[cf]s?\\'" . heic))
+ `((,(make-regexp "\\.png\\'") . png)
+ (,(make-regexp "\\.gif\\'") . gif)
+ (,(make-regexp "\\.jpe?g\\'") . jpeg)
+ (,(make-regexp "\\.webp\\'") . webp)
+ (,(make-regexp "\\.bmp\\'") . bmp)
+ (,(make-regexp "\\.xpm\\'") . xpm)
+ (,(make-regexp "\\.pbm\\'") . pbm)
+ (,(make-regexp "\\.xbm\\'") . xbm)
+ (,(make-regexp "\\.ps\\'") . postscript)
+ (,(make-regexp "\\.tiff?\\'") . tiff)
+ (,(make-regexp "\\.svgz?\\'") . svg)
+ (,(make-regexp "\\.hei[cf]s?\\'") . heic))
"Alist of (REGEXP . IMAGE-TYPE) pairs used to identify image files.
When the name of an image file match REGEXP, it is assumed to
be of image type IMAGE-TYPE.")
diff --git a/src/Makefile.in b/src/Makefile.in
index c278924ef94..e6ce159c051 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -537,7 +537,8 @@ all:
.PHONY: all
dmpstruct_headers=$(srcdir)/lisp.h $(srcdir)/buffer.h $(srcdir)/itree.h \
- $(srcdir)/intervals.h $(srcdir)/charset.h $(srcdir)/bignum.h
+ $(srcdir)/intervals.h $(srcdir)/charset.h $(srcdir)/bignum.h \
+ $(srcdir)/regex-emacs.h
ifeq ($(CHECK_STRUCTS),true)
pdumper.o: dmpstruct.h
endif
diff --git a/src/alloc.c b/src/alloc.c
index 48b170b866f..856393b54df 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3467,6 +3467,16 @@ cleanup_vector (struct Lisp_Vector *vector)
}
}
break;
+ case PVEC_REGEXP:
+ {
+ struct Lisp_Regexp *r = PSEUDOVEC_STRUCT (vector, Lisp_Regexp);
+ eassert (r->buffer->allocated > 0);
+ eassert (r->buffer->used <= r->buffer->allocated);
+ xfree (r->buffer->buffer);
+ xfree (r->buffer->fastmap);
+ xfree (r->buffer);
+ }
+ break;
case PVEC_OBARRAY:
{
struct Lisp_Obarray *o = PSEUDOVEC_STRUCT (vector, Lisp_Obarray);
@@ -5881,6 +5891,36 @@ make_pure_c_string (const char *data, ptrdiff_t nchars)
static Lisp_Object purecopy (Lisp_Object obj);
+static struct re_pattern_buffer *
+make_pure_re_pattern_buffer (const struct re_pattern_buffer *bufp)
+{
+ struct re_pattern_buffer *pure = pure_alloc (sizeof *pure, -1);
+ *pure = *bufp;
+
+ pure->buffer = pure_alloc (bufp->used, -1);
+ memcpy (pure->buffer, bufp->buffer, bufp->used);
+ pure->allocated = bufp->used;
+ pure->fastmap = pure_alloc (FASTMAP_SIZE, -1);
+ memcpy (pure->fastmap, bufp->fastmap, FASTMAP_SIZE);
+ pure->translate = purecopy (bufp->translate);
+
+ return pure;
+}
+
+static Lisp_Object
+make_pure_regexp (const struct Lisp_Regexp *r)
+{
+ struct Lisp_Regexp *pure = pure_alloc (sizeof *pure, Lisp_Vectorlike);
+ *pure = *r;
+
+ pure->pattern = purecopy (r->pattern);
+ pure->whitespace_regexp = purecopy (r->whitespace_regexp);
+ pure->syntax_table = purecopy (r->syntax_table);
+ pure->buffer = make_pure_re_pattern_buffer (r->buffer);
+
+ return make_lisp_ptr (pure, Lisp_Vectorlike);
+}
+
/* Return a cons allocated from pure space. Give it pure copies
of CAR as car and CDR as cdr. */
@@ -6038,6 +6078,8 @@ purecopy (Lisp_Object obj)
obj = make_pure_string (SSDATA (obj), SCHARS (obj),
SBYTES (obj),
STRING_MULTIBYTE (obj));
+ else if (REGEXP_P (obj))
+ obj = make_pure_regexp(XREGEXP (obj));
else if (HASH_TABLE_P (obj))
{
struct Lisp_Hash_Table *table = XHASH_TABLE (obj);
@@ -7347,6 +7389,19 @@ #define CHECK_ALLOCATED_AND_LIVE_SYMBOL() ((void) 0)
break;
}
+ case PVEC_REGEXP:
+ {
+ ptrdiff_t size = ptr->header.size;
+ if (size & PSEUDOVECTOR_FLAG)
+ size &= PSEUDOVECTOR_SIZE_MASK;
+ set_vector_marked (ptr);
+ mark_stack_push_values (ptr->contents, size);
+
+ struct Lisp_Regexp *r = (struct Lisp_Regexp *)ptr;
+ mark_stack_push_value (r->buffer->translate);
+ break;
+ }
+
case PVEC_CHAR_TABLE:
case PVEC_SUB_CHAR_TABLE:
mark_char_table (ptr, (enum pvec_type) pvectype);
diff --git a/src/data.c b/src/data.c
index d947d200870..d5fff6af6b9 100644
--- a/src/data.c
+++ b/src/data.c
@@ -286,11 +286,13 @@ DEFUN ("cl-type-of", Fcl_type_of, Scl_type_of, 1, 1, 0,
return Qtreesit_node;
case PVEC_TS_COMPILED_QUERY:
return Qtreesit_compiled_query;
- case PVEC_SQLITE:
- return Qsqlite;
- case PVEC_SUB_CHAR_TABLE:
- return Qsub_char_table;
- /* "Impossible" cases. */
+ case PVEC_SQLITE:
+ return Qsqlite;
+ case PVEC_SUB_CHAR_TABLE:
+ return Qsub_char_table;
+ case PVEC_REGEXP:
+ return Qregexp;
+ /* "Impossible" cases. */
case PVEC_MISC_PTR:
case PVEC_OTHER:
case PVEC_FREE: ;
diff --git a/src/emacs.c b/src/emacs.c
index 37c8b28fc2c..560843f9c56 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -2349,6 +2349,7 @@ main (int argc, char **argv)
syms_of_window ();
syms_of_xdisp ();
syms_of_sqlite ();
+ syms_of_regexp ();
syms_of_font ();
#ifdef HAVE_WINDOW_SYSTEM
syms_of_fringe ();
diff --git a/src/lisp.h b/src/lisp.h
index 8ac65ca429c..c277c58d8b6 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1048,6 +1048,7 @@ DEFINE_GDB_SYMBOL_END (PSEUDOVECTOR_FLAG)
PVEC_TS_NODE,
PVEC_TS_COMPILED_QUERY,
PVEC_SQLITE,
+ PVEC_REGEXP,
/* These should be last, for internal_equal and sxhash_obj. */
PVEC_CLOSURE,
@@ -2706,6 +2707,25 @@ XHASH_TABLE (Lisp_Object a)
return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Hash_Table);
}
+INLINE bool
+REGEXP_P (Lisp_Object a)
+{
+ return PSEUDOVECTORP (a, PVEC_REGEXP);
+}
+
+INLINE struct Lisp_Regexp *
+XREGEXP (Lisp_Object a)
+{
+ eassert (REGEXP_P (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Regexp);
+}
+
+INLINE void
+CHECK_REGEXP (Lisp_Object x)
+{
+ CHECK_TYPE (REGEXP_P (x), Qregexpp, x);
+}
+
INLINE Lisp_Object
make_lisp_hash_table (struct Lisp_Hash_Table *h)
{
@@ -2955,6 +2975,16 @@ xmint_pointer (Lisp_Object a)
bool is_statement;
} GCALIGNED_STRUCT;
+struct Lisp_Regexp
+{
+ union vectorlike_header header;
+ Lisp_Object pattern;
+ Lisp_Object whitespace_regexp;
+ Lisp_Object syntax_table;
+ bool posix;
+ struct re_pattern_buffer *buffer;
+} GCALIGNED_STRUCT;
+
struct Lisp_User_Ptr
{
union vectorlike_header header;
@@ -4470,6 +4500,9 @@ verify (FLT_RADIX == 2 || FLT_RADIX == 16);
/* Defined in sqlite.c. */
extern void syms_of_sqlite (void);
+/* Defined in regex-emacs.c. */
+extern void syms_of_regexp (void);
+
/* Defined in xsettings.c. */
extern void syms_of_xsettings (void);
diff --git a/src/pdumper.c b/src/pdumper.c
index 53bddf91f04..ae7fafc7d1c 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -1528,6 +1528,36 @@ dump_emacs_reloc_copy_from_dump (struct dump_context *ctx, dump_off dump_offset,
dump_off_to_lisp (size)));
}
+static void
+dump_remember_fixup_ptr_raw (struct dump_context *ctx,
+ dump_off dump_offset,
+ dump_off new_dump_offset);
+
+/* Add a byte range relocation that copies arbitrary bytes from the dump.
+
+ When the dump is loaded, Emacs copies SIZE bytes into DUMP_OFFSET from
+ dump. Dump bytes are loaded from SOURCE. */
+static void
+dump_cold_bytes (struct dump_context *ctx, dump_off dump_offset,
+ void* source, dump_off size)
+{
+ eassert (size >= 0);
+ eassert (size < (1 << EMACS_RELOC_LENGTH_BITS));
+
+ if (!ctx->flags.dump_object_contents)
+ return;
+
+ if (size == 0)
+ {
+ eassert (source == NULL);
+ return;
+ }
+ eassert (source != NULL);
+
+ dump_remember_fixup_ptr_raw (ctx, dump_offset, ctx->offset);
+ dump_write (ctx, source, size);
+}
+
/* Add an Emacs relocation that sets values to arbitrary bytes.
When the dump is loaded, Emacs copies SIZE bytes from the
@@ -2125,6 +2155,74 @@ dump_marker (struct dump_context *ctx, const struct Lisp_Marker *marker)
return finish_dump_pvec (ctx, &out->header);
}
+static dump_off
+dump_re_pattern_buffer (struct dump_context *ctx, struct re_pattern_buffer *bufp)
+{
+#if CHECK_STRUCTS && !defined (HASH_re_pattern_buffer_36714DF24A)
+# error "re_pattern_buffer changed. See CHECK_STRUCTS comment in config.h."
+#endif
+ struct re_pattern_buffer out;
+ dump_object_start (ctx, &out, sizeof (out));
+ if (bufp->buffer)
+ dump_field_fixup_later (ctx, &out, bufp, &bufp->buffer);
+ DUMP_FIELD_COPY (&out, bufp, allocated);
+ DUMP_FIELD_COPY (&out, bufp, used);
+ DUMP_FIELD_COPY (&out, bufp, charset_unibyte);
+ if (bufp->fastmap)
+ dump_field_fixup_later (ctx, &out, bufp, &bufp->fastmap);
+ dump_field_lv (ctx, &out, bufp, &bufp->translate, WEIGHT_NORMAL);
+ DUMP_FIELD_COPY (&out, bufp, re_nsub);
+ DUMP_FIELD_COPY (&out, bufp, can_be_null);
+ DUMP_FIELD_COPY (&out, bufp, regs_allocated);
+ DUMP_FIELD_COPY (&out, bufp, fastmap_accurate);
+ DUMP_FIELD_COPY (&out, bufp, used_syntax);
+ DUMP_FIELD_COPY (&out, bufp, multibyte);
+ DUMP_FIELD_COPY (&out, bufp, target_multibyte);
+ dump_off offset = dump_object_finish (ctx, &out, sizeof (out));
+ if (bufp->buffer)
+ {
+ eassert (bufp->allocated > 0);
+ if (bufp->allocated > DUMP_OFF_MAX - 1)
+ error ("regex pattern buffer too large");
+ dump_off total_size = ptrdiff_t_to_dump_off (bufp->allocated);
+ eassert (total_size > 0);
+ dump_cold_bytes
+ (ctx,
+ offset + dump_offsetof (struct re_pattern_buffer, buffer),
+ bufp->buffer,
+ total_size);
+ }
+ if (bufp->fastmap)
+ {
+ eassert (FASTMAP_SIZE <= DUMP_OFF_MAX - 1);
+ dump_off fastmap_size = FASTMAP_SIZE;
+ dump_cold_bytes
+ (ctx,
+ offset + dump_offsetof (struct re_pattern_buffer, fastmap),
+ bufp->fastmap,
+ fastmap_size);
+ }
+ return offset;
+}
+
+static dump_off
+dump_regexp (struct dump_context *ctx, const struct Lisp_Regexp *regexp)
+{
+#if CHECK_STRUCTS && !defined (HASH_Lisp_Regexp_A3381A7F05)
+# error "Lisp_Regexp changed. See CHECK_STRUCTS comment in config.h."
+#endif
+ START_DUMP_PVEC (ctx, ®exp->header, struct Lisp_Regexp, out);
+ dump_pseudovector_lisp_fields (ctx, &out->header, ®exp->header);
+ DUMP_FIELD_COPY (out, regexp, posix);
+ dump_field_fixup_later (ctx, &out, regexp, ®exp->buffer);
+ dump_off offset = finish_dump_pvec (ctx, &out->header);
+ dump_remember_fixup_ptr_raw
+ (ctx,
+ offset + dump_offsetof (struct Lisp_Regexp, buffer),
+ dump_re_pattern_buffer (ctx, regexp->buffer));
+ return offset;
+}
+
static dump_off
dump_interval_node (struct dump_context *ctx, struct itree_node *node)
{
@@ -2135,6 +2233,7 @@ dump_interval_node (struct dump_context *ctx, struct itree_node *node)
dump_object_start (ctx, &out, sizeof (out));
if (node->parent)
dump_field_fixup_later (ctx, &out, node, &node->parent);
+ /* FIXME: should these both be &node->{left,right} instead of &node->parent? */
if (node->left)
dump_field_fixup_later (ctx, &out, node, &node->parent);
if (node->right)
@@ -3065,7 +3164,7 @@ dump_vectorlike (struct dump_context *ctx,
Lisp_Object lv,
dump_off offset)
{
-#if CHECK_STRUCTS && !defined HASH_pvec_type_99104541E2
+#if CHECK_STRUCTS && !defined HASH_pvec_type_E2BA8CE9B4
# error "pvec_type changed. See CHECK_STRUCTS comment in config.h."
#endif
const struct Lisp_Vector *v = XVECTOR (lv);
@@ -3127,6 +3226,8 @@ dump_vectorlike (struct dump_context *ctx,
#ifdef HAVE_TREE_SITTER
return dump_treesit_compiled_query (ctx, XTS_COMPILED_QUERY (lv));
#endif
+ case PVEC_REGEXP:
+ return dump_regexp (ctx, XREGEXP (lv));
case PVEC_WINDOW_CONFIGURATION:
case PVEC_OTHER:
case PVEC_XWIDGET:
@@ -3462,11 +3563,11 @@ dump_cold_string (struct dump_context *ctx, Lisp_Object string)
error ("string too large");
dump_off total_size = ptrdiff_t_to_dump_off (SBYTES (string) + 1);
eassert (total_size > 0);
- dump_remember_fixup_ptr_raw
+ dump_cold_bytes
(ctx,
string_offset + dump_offsetof (struct Lisp_String, u.s.data),
- ctx->offset);
- dump_write (ctx, XSTRING (string)->u.s.data, total_size);
+ XSTRING (string)->u.s.data,
+ total_size);
}
static void
@@ -3475,12 +3576,12 @@ dump_cold_charset (struct dump_context *ctx, Lisp_Object data)
/* Dump charset lookup tables. */
int cs_i = XFIXNUM (XCAR (data));
dump_off cs_dump_offset = dump_off_from_lisp (XCDR (data));
- dump_remember_fixup_ptr_raw
+ struct charset *cs = charset_table + cs_i;
+ dump_cold_bytes
(ctx,
cs_dump_offset + dump_offsetof (struct charset, code_space_mask),
- ctx->offset);
- struct charset *cs = charset_table + cs_i;
- dump_write (ctx, cs->code_space_mask, 256);
+ cs->code_space_mask,
+ 256);
}
static void
@@ -3501,11 +3602,11 @@ dump_cold_buffer (struct dump_context *ctx, Lisp_Object data)
+ 1;
if (nbytes > DUMP_OFF_MAX)
error ("buffer too large");
- dump_remember_fixup_ptr_raw
+ dump_cold_bytes
(ctx,
buffer_offset + dump_offsetof (struct buffer, own_text.beg),
- ctx->offset);
- dump_write (ctx, b->own_text.beg, ptrdiff_t_to_dump_off (nbytes));
+ b->own_text.beg,
+ ptrdiff_t_to_dump_off (nbytes));
}
static void
diff --git a/src/print.c b/src/print.c
index 8f28b14e8b6..011b02d316f 100644
--- a/src/print.c
+++ b/src/print.c
@@ -2084,6 +2084,31 @@ print_vectorlike_unreadable (Lisp_Object obj, Lisp_Object printcharfun,
}
return;
+ case PVEC_REGEXP:
+ {
+ struct Lisp_Regexp *r = XREGEXP (obj);
+ print_c_string ("#<regexp pattern=", printcharfun);
+ print_object (r->pattern, printcharfun, escapeflag);
+ int i = sprintf (buf, " nsub=%ld", r->buffer->re_nsub);
+ strout (buf, i, i, printcharfun);
+ print_c_string (" translate=", printcharfun);
+ print_object (r->buffer->translate, printcharfun, escapeflag);
+ print_c_string (" whitespace=", printcharfun);
+ print_object (r->whitespace_regexp, printcharfun, escapeflag);
+ print_c_string (" syntax_table=", printcharfun);
+ print_object (r->syntax_table, printcharfun, escapeflag);
+ if (r->posix)
+ {
+ print_c_string (" posix=true", printcharfun);
+ }
+ else
+ {
+ print_c_string (" posix=false", printcharfun);
+ }
+ print_c_string (">", printcharfun);
+ }
+ return;
+
case PVEC_OBARRAY:
{
struct Lisp_Obarray *o = XOBARRAY (obj);
diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 92dbdbecbf1..113a5dd9ed1 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -32,6 +32,7 @@
#include "character.h"
#include "buffer.h"
#include "syntax.h"
+#include "charset.h"
#include "category.h"
#include "dispextern.h"
@@ -468,6 +469,8 @@ print_fastmap (FILE *dest, char *fastmap)
bool was_a_range = false;
int i = 0;
+ /* FIXME: unify "1 << BYTEWIDTH" and "FASTMAP_SIZE" defines (both are
+ the same value = 256). */
while (i < (1 << BYTEWIDTH))
{
if (fastmap[i++])
@@ -5353,3 +5356,71 @@ re_compile_pattern (const char *pattern, ptrdiff_t length,
return NULL;
return re_error_msgid[ret];
}
+
+DEFUN ("make-regexp", Fmake_regexp, Smake_regexp, 1, 3, 0,
+ doc: /* Compile a regexp object from string PATTERN.
+
+POSIX is non-nil if we want full backtracking (POSIX style) for
+this pattern.
+TRANSLATE is a translation table for ignoring case, or t to look up from
+the current buffer if `case-fold-search' is on, or nil for none. */)
+ (Lisp_Object pattern, Lisp_Object posix, Lisp_Object translate)
+{
+ const char *whitespace_regexp = STRINGP (Vsearch_spaces_regexp) ?
+ SSDATA (Vsearch_spaces_regexp) : NULL;
+ char *val;
+ bool is_posix = !NILP (posix);
+ struct Lisp_Regexp *p;
+ struct re_pattern_buffer *bufp = NULL;
+
+ if (EQ(translate, Qt))
+ translate = (!NILP (Vcase_fold_search)
+ ? BVAR (current_buffer, case_canon_table)
+ : Qnil);
+
+ CHECK_STRING (pattern);
+
+ bufp = xzalloc (sizeof (*bufp));
+
+ bufp->fastmap = xzalloc (FASTMAP_SIZE);
+ bufp->translate = translate;
+ bufp->multibyte = STRING_MULTIBYTE (pattern);
+ bufp->charset_unibyte = charset_unibyte;
+
+ val = (char *) re_compile_pattern (SSDATA (pattern), SBYTES (pattern),
+ is_posix, whitespace_regexp, bufp);
+
+ if (val)
+ {
+ xfree (bufp->buffer);
+ xfree (bufp->fastmap);
+ xfree (bufp);
+ xsignal1 (Qinvalid_regexp, build_string (val));
+ }
+
+ p = ALLOCATE_PSEUDOVECTOR (struct Lisp_Regexp, syntax_table,
+ PVEC_REGEXP);
+ p->pattern = pattern;
+ p->whitespace_regexp = Vsearch_spaces_regexp;
+ /* If the compiled pattern hard codes some of the contents of the
+ syntax-table, it can only be reused with *this* syntax table. */
+ p->syntax_table = bufp->used_syntax ? BVAR (current_buffer, syntax_table) : Qt;
+ p->posix = is_posix;
+ p->buffer = bufp;
+ return make_lisp_ptr (p, Lisp_Vectorlike);
+}
+
+DEFUN ("regexpp", Fregexpp, Sregexpp, 1, 1, 0,
+ doc: /* Say whether OBJECT is a compiled regexp object. */)
+ (Lisp_Object object)
+{
+ return REGEXP_P (object)? Qt: Qnil;
+}
+
+void syms_of_regexp (void)
+{
+ defsubr (&Smake_regexp);
+ defsubr (&Sregexpp);
+ DEFSYM (Qregexp, "regexp");
+ DEFSYM (Qregexpp, "regexpp");
+}
diff --git a/src/regex-emacs.h b/src/regex-emacs.h
index 2402e539e64..4bcbf1f29f9 100644
--- a/src/regex-emacs.h
+++ b/src/regex-emacs.h
@@ -52,6 +52,9 @@ #define EMACS_REGEX_H 1
/* Roughly the maximum number of failure points on the stack. */
extern ptrdiff_t emacs_re_max_failures;
+/* The size of an allocation for a fastmap. */
+#define FASTMAP_SIZE 0400
+
/* Amount of memory that we can safely stack allocate. */
extern ptrdiff_t emacs_re_safe_alloca;
\f
diff --git a/src/search.c b/src/search.c
index 2ff8b0599c4..5710ff30005 100644
--- a/src/search.c
+++ b/src/search.c
@@ -50,7 +50,7 @@ #define REGEXP_CACHE_SIZE 20
for any syntax-table. */
Lisp_Object syntax_table;
struct re_pattern_buffer buf;
- char fastmap[0400];
+ char fastmap[FASTMAP_SIZE];
/* True means regexp was compiled to do full POSIX backtracking. */
bool posix;
/* True means we're inside a buffer match. */
@@ -181,6 +181,8 @@ freeze_pattern (struct regexp_cache *searchbuf)
{
eassert (!searchbuf->busy);
record_unwind_protect_ptr (unfreeze_pattern, searchbuf);
+ eassert (searchbuf != NULL);
+ assume (searchbuf);
searchbuf->busy = true;
}
@@ -261,11 +263,13 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
\f
static Lisp_Object
-looking_at_1 (Lisp_Object string, bool posix, bool modify_data)
+looking_at_1 (Lisp_Object regexp, bool posix, bool modify_data)
{
Lisp_Object val;
unsigned char *p1, *p2;
ptrdiff_t s1, s2;
+ struct re_pattern_buffer *bufp = NULL;
+ struct regexp_cache *cache_entry = NULL;
register ptrdiff_t i;
if (running_asynch_code)
@@ -276,18 +280,22 @@ looking_at_1 (Lisp_Object string, bool posix, bool modify_data)
set_char_table_extras (BVAR (current_buffer, case_canon_table), 2,
BVAR (current_buffer, case_eqv_table));
- CHECK_STRING (string);
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ CHECK_STRING (regexp);
/* Snapshot in case Lisp changes the value. */
bool modify_match_data = NILP (Vinhibit_changing_match_data) && modify_data;
- struct regexp_cache *cache_entry = compile_pattern (
- string,
- modify_match_data ? &search_regs : NULL,
- (!NILP (Vcase_fold_search)
- ? BVAR (current_buffer, case_canon_table) : Qnil),
- posix,
- !NILP (BVAR (current_buffer, enable_multibyte_characters)));
+ if (!bufp)
+ cache_entry = compile_pattern (
+ regexp,
+ modify_match_data ? &search_regs : NULL,
+ (!NILP (Vcase_fold_search)
+ ? BVAR (current_buffer, case_canon_table) : Qnil),
+ posix,
+ !NILP (BVAR (current_buffer, enable_multibyte_characters)));
/* Do a pending quit right away, to avoid paradoxical behavior */
maybe_quit ();
@@ -313,9 +321,16 @@ looking_at_1 (Lisp_Object string, bool posix, bool modify_data)
specpdl_ref count = SPECPDL_INDEX ();
freeze_buffer_relocation ();
- freeze_pattern (cache_entry);
+
+ if (!bufp)
+ {
+ eassert (cache_entry != NULL);
+ freeze_pattern (cache_entry);
+ bufp = &cache_entry->buf;
+ }
+
re_match_object = Qnil;
- i = re_match_2 (&cache_entry->buf, (char *) p1, s1, (char *) p2, s2,
+ i = re_match_2 (bufp, (char *) p1, s1, (char *) p2, s2,
PT_BYTE - BEGV_BYTE,
modify_match_data ? &search_regs : NULL,
ZV_BYTE - BEGV_BYTE);
@@ -374,11 +389,16 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
EMACS_INT pos;
ptrdiff_t pos_byte, i;
bool modify_match_data = NILP (Vinhibit_changing_match_data) && modify_data;
+ struct re_pattern_buffer* bufp = NULL;
if (running_asynch_code)
save_search_regs ();
- CHECK_STRING (regexp);
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ CHECK_STRING (regexp);
+
CHECK_STRING (string);
if (NILP (start))
@@ -402,17 +422,23 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
BVAR (current_buffer, case_eqv_table));
specpdl_ref count = SPECPDL_INDEX ();
- struct regexp_cache *cache_entry
- = compile_pattern (regexp,
- modify_match_data ? &search_regs : NULL,
- (!NILP (Vcase_fold_search)
- ? BVAR (current_buffer, case_canon_table)
- : Qnil),
- posix,
- STRING_MULTIBYTE (string));
- freeze_pattern (cache_entry);
+
+ if (!bufp)
+ {
+ struct regexp_cache *cache_entry
+ = compile_pattern (regexp,
+ modify_match_data ? &search_regs : NULL,
+ (!NILP (Vcase_fold_search)
+ ? BVAR (current_buffer, case_canon_table)
+ : Qnil),
+ posix,
+ STRING_MULTIBYTE (string));
+ freeze_pattern (cache_entry);
+ bufp = &cache_entry->buf;
+ }
+
re_match_object = string;
- val = re_search (&cache_entry->buf, SSDATA (string),
+ val = re_search (bufp, SSDATA (string),
SBYTES (string), pos_byte,
SBYTES (string) - pos_byte,
(modify_match_data ? &search_regs : NULL));
@@ -485,12 +511,21 @@ DEFUN ("posix-string-match", Fposix_string_match, Sposix_string_match, 2, 4, 0,
fast_string_match_internal (Lisp_Object regexp, Lisp_Object string,
Lisp_Object table)
{
+ struct re_pattern_buffer* bufp;
re_match_object = string;
specpdl_ref count = SPECPDL_INDEX ();
- struct regexp_cache *cache_entry
- = compile_pattern (regexp, 0, table, 0, STRING_MULTIBYTE (string));
- freeze_pattern (cache_entry);
- ptrdiff_t val = re_search (&cache_entry->buf, SSDATA (string),
+
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ {
+ struct regexp_cache *cache_entry
+ = compile_pattern (regexp, 0, table, 0, STRING_MULTIBYTE (string));
+ freeze_pattern (cache_entry);
+ bufp = &cache_entry->buf;
+ }
+
+ ptrdiff_t val = re_search (bufp, SSDATA (string),
SBYTES (string), 0,
SBYTES (string), 0);
unbind_to (count, Qnil);
@@ -509,18 +544,27 @@ fast_c_string_match_internal (Lisp_Object regexp,
const char *string, ptrdiff_t len,
Lisp_Object table)
{
- /* FIXME: This is expensive and not obviously correct when it makes
- a difference. I.e., no longer "fast", and may hide bugs.
- Something should be done about this. */
- regexp = string_make_unibyte (regexp);
+ struct re_pattern_buffer *bufp;
/* Record specpdl index because freeze_pattern pushes an
unwind-protect on the specpdl. */
specpdl_ref count = SPECPDL_INDEX ();
- struct regexp_cache *cache_entry
- = compile_pattern (regexp, 0, table, 0, 0);
- freeze_pattern (cache_entry);
+
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ {
+ /* FIXME: This is expensive and not obviously correct when it makes
+ a difference. I.e., no longer "fast", and may hide bugs.
+ Something should be done about this. */
+ regexp = string_make_unibyte (regexp);
+ struct regexp_cache *cache_entry
+ = compile_pattern (regexp, 0, table, 0, 0);
+ freeze_pattern (cache_entry);
+ bufp = &cache_entry->buf;
+ }
+
re_match_object = Qt;
- ptrdiff_t val = re_search (&cache_entry->buf, string, len, 0, len, 0);
+ ptrdiff_t val = re_search (bufp, string, len, 0, len, 0);
unbind_to (count, Qnil);
return val;
}
@@ -539,6 +583,8 @@ fast_looking_at (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
unsigned char *p1, *p2;
ptrdiff_t s1, s2;
ptrdiff_t len;
+ struct re_pattern_buffer *bufp = NULL;
+ struct regexp_cache *cache_entry = NULL;
if (STRINGP (string))
{
@@ -578,13 +624,24 @@ fast_looking_at (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
}
- struct regexp_cache *cache_entry =
- compile_pattern (regexp, 0, Qnil, 0, multibyte);
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ cache_entry =
+ compile_pattern (regexp, 0, Qnil, 0, multibyte);
+
specpdl_ref count = SPECPDL_INDEX ();
freeze_buffer_relocation ();
- freeze_pattern (cache_entry);
+
+ if (!bufp)
+ {
+ eassert (cache_entry != NULL);
+ freeze_pattern (cache_entry);
+ bufp = &cache_entry->buf;
+ }
+
re_match_object = STRINGP (string) ? string : Qnil;
- len = re_match_2 (&cache_entry->buf, (char *) p1, s1, (char *) p2, s2,
+ len = re_match_2 (bufp, (char *) p1, s1, (char *) p2, s2,
pos_byte, NULL, limit_byte);
unbind_to (count, Qnil);
@@ -1157,9 +1214,9 @@ while (0)
static struct re_registers search_regs_1;
static EMACS_INT
-search_buffer_re (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
- ptrdiff_t lim, ptrdiff_t lim_byte, EMACS_INT n,
- Lisp_Object trt, Lisp_Object inverse_trt, bool posix)
+search_buffer_re (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
+ ptrdiff_t lim, ptrdiff_t lim_byte, EMACS_INT n,
+ Lisp_Object trt, Lisp_Object inverse_trt, bool posix)
{
unsigned char *p1, *p2;
ptrdiff_t s1, s2;
@@ -1167,12 +1224,17 @@ search_buffer_re (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
/* Snapshot in case Lisp changes the value. */
bool preserve_match_data = NILP (Vinhibit_changing_match_data);
- struct regexp_cache *cache_entry =
- compile_pattern (string,
- preserve_match_data ? &search_regs : &search_regs_1,
- trt, posix,
- !NILP (BVAR (current_buffer, enable_multibyte_characters)));
- struct re_pattern_buffer *bufp = &cache_entry->buf;
+ struct re_pattern_buffer *bufp = NULL;
+ struct regexp_cache *cache_entry = NULL;
+
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ cache_entry =
+ compile_pattern (regexp,
+ preserve_match_data ? &search_regs : &search_regs_1,
+ trt, posix,
+ !NILP (BVAR (current_buffer, enable_multibyte_characters)));
maybe_quit (); /* Do a pending quit right away,
to avoid paradoxical behavior */
@@ -1197,7 +1259,13 @@ search_buffer_re (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
specpdl_ref count = SPECPDL_INDEX ();
freeze_buffer_relocation ();
- freeze_pattern (cache_entry);
+
+ if (!bufp)
+ {
+ eassert (cache_entry != NULL);
+ freeze_pattern (cache_entry);
+ bufp = &cache_entry->buf;
+ }
while (n < 0)
{
@@ -3405,16 +3473,24 @@ DEFUN ("re--describe-compiled", Fre__describe_compiled, Sre__describe_compiled,
If RAW is non-nil, just return the actual bytecode. */)
(Lisp_Object regexp, Lisp_Object raw)
{
- struct regexp_cache *cache_entry
- = compile_pattern (regexp, NULL,
- (!NILP (Vcase_fold_search)
- ? BVAR (current_buffer, case_canon_table) : Qnil),
- false,
- !NILP (BVAR (current_buffer,
- enable_multibyte_characters)));
+ struct re_pattern_buffer *bufp;
+
+ if (REGEXP_P (regexp))
+ bufp = XREGEXP (regexp)->buffer;
+ else
+ {
+ struct regexp_cache *cache_entry
+ = compile_pattern (regexp, NULL,
+ (!NILP (Vcase_fold_search)
+ ? BVAR (current_buffer, case_canon_table) : Qnil),
+ false,
+ !NILP (BVAR (current_buffer,
+ enable_multibyte_characters)));
+ bufp = &cache_entry->buf;
+ }
+
if (!NILP (raw))
- return make_unibyte_string ((char *) cache_entry->buf.buffer,
- cache_entry->buf.used);
+ return make_unibyte_string ((char *) bufp->buffer, bufp->used);
else
{ /* FIXME: Why ENABLE_CHECKING? */
#if !defined ENABLE_CHECKING
@@ -3424,8 +3500,8 @@ DEFUN ("re--describe-compiled", Fre__describe_compiled, Sre__describe_compiled,
size_t size = 0;
FILE* f = open_memstream (&buffer, &size);
if (!f)
- report_file_error ("open_memstream failed", regexp);
- print_compiled_pattern (f, &cache_entry->buf);
+ report_file_error ("open_memstream failed", regexp);
+ print_compiled_pattern (f, bufp);
fclose (f);
if (!buffer)
return Qnil;
@@ -3433,7 +3509,7 @@ DEFUN ("re--describe-compiled", Fre__describe_compiled, Sre__describe_compiled,
free (buffer);
return description;
#else /* ENABLE_CHECKING && !HAVE_OPEN_MEMSTREAM */
- print_compiled_pattern (stderr, &cache_entry->buf);
+ print_compiled_pattern (stderr, bufp);
return build_string ("Description was sent to standard error");
#endif /* !ENABLE_CHECKING */
}
next reply other threads:[~2024-07-30 5:08 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-30 5:08 Danny McClanahan [this message]
2024-07-30 13:02 ` [PATCH] add compiled regexp primitive lisp object Philip Kaludercic
2024-07-31 22:33 ` dmcc2
2024-08-01 1:04 ` Pip Cet
2024-08-04 23:38 ` Danny McClanahan
2024-08-05 3:47 ` dmcc2
2024-08-05 4:39 ` Danny McClanahan
2024-08-05 7:15 ` Danny McClanahan
2024-08-05 17:55 ` Pip Cet
2024-08-06 15:15 ` Danny McClanahan
2024-08-06 15:57 ` Eli Zaretskii
2024-08-07 4:28 ` Danny McClanahan
2024-08-06 18:18 ` Pip Cet
2024-08-06 18:38 ` Eli Zaretskii
2024-08-07 4:23 ` Danny McClanahan
2024-08-07 12:00 ` Eli Zaretskii
2024-08-07 12:43 ` Helmut Eller
2024-08-07 13:40 ` Augusto Stoffel
2024-08-07 15:23 ` Danny McClanahan
2024-08-14 1:32 ` Stefan Monnier
2024-08-07 15:02 ` Danny McClanahan
2024-08-07 15:23 ` Eli Zaretskii
2024-08-14 1:25 ` Stefan Monnier
2024-08-07 7:59 ` Danny McClanahan
2024-08-06 12:08 ` Eli Zaretskii
2024-08-01 8:30 ` Andrea Corallo
2024-08-01 10:06 ` Gerd Möllmann
2024-08-06 13:47 ` Danny McClanahan
2024-08-06 13:57 ` Danny McClanahan
2024-08-07 7:21 ` Andrea Corallo
2024-08-07 8:27 ` Danny McClanahan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='DX05HWbmw37nQEdshSH46KqWNPDwwK5xaKBcjZr7XBx293dQtroHGmYssDsOVTcMgLLqHbbrvkBpJcWuBgXsDHkvhAHLsTPyQ2j5wZpeL8M=@hypnicjerk.ai' \
--to=dmcc2@hypnicjerk.ai \
--cc=emacs-devel@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.