unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] add compiled regexp primitive lisp object
@ 2024-07-30  5:08 Danny McClanahan
  2024-07-30 13:02 ` Philip Kaludercic
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Danny McClanahan @ 2024-07-30  5:08 UTC (permalink / raw)
  To: emacs-devel@gnu.org

This is a first attempt at a lisp-level API for explicit regexp
compilation. I have provided the entire diff inline in this email
under the impression that this will make it easier to discuss the
specifics--I do apologize if diffs above a certain size should
always be attached as patch files in the future.

The result of this change is that pre-compiled regexp objects constructed by
`make-regexp' will have the lifetime of standard lisp objects, instead of
being potentially invalidated and re-compiled upon every call to `string-match'.

In particular, this involves the following changes:
- add PVEC_REGEXP case to lisp.h and struct Lisp_Regexp pseudovector type
  containing the fields currently stored in struct regexp_cache
- add syms_of_regexp() lisp exports to regex-emacs.c,
  with make-regexp and regexpp functions
- modify all methods in search.c to accept a Lisp_Regexp as well as a string
- add src/regex-emacs.h to dmpstruct_headers in Makefile.in
- make Lisp_Regexp purecopyable and pdumpable

Finally, it modifies a few variables in lisp/image.el to store
compiled regexp objects instead of raw strings. Since image.el is loaded into
the bootstrap image, I believe this demonstrates that the compiled regexp
objects are successfully pdumpable.

I have taken special care to avoid modifying the existing string-based
implicitly-caching logic at all, so this should not break any C-level logic.
Notably, if compiling with --enable-checking,
(re--describe-compiled (make-regexp "asdf")) produces the same output as
providing a string directly.

However, precompiled regexp lisp objects are *not* automatically coerced to
lisp strings, so any lisp code that expects to be able to e.g.
(concat my-regexp-var "asdf") will now signal an error if my-regexp-var is
converted into a precompiled regexp with the new `make-regexp' constructor.
The regexp variables `image-type-header-regexps' and
`image-type-file-name-regexps' from lisp/image.el are converted into precompiled
regexp objects, and any user code expecting those to be strings will now error.

I had to re-run autogen.sh to avoid segfaulting upon bootstrap after modifying
lisp.h (re-running ./configure alone didn't work). I suspect everyone else is
well aware of the ramifications of editing lisp.h enums, but wanted to make
sure that was clear.

I have tried to extend existing idioms where obvious, and split off helper
methods to improve readability. I am very open to any style improvements
as well as architectural changes.
---
 etc/emacs_lldb.py |   1 +
 lisp/image.el     |  59 +++++++-------
 src/Makefile.in   |   3 +-
 src/alloc.c       |  55 +++++++++++++
 src/data.c        |  12 +--
 src/emacs.c       |   1 +
 src/lisp.h        |  33 ++++++++
 src/pdumper.c     | 123 +++++++++++++++++++++++++---
 src/print.c       |  25 ++++++
 src/regex-emacs.c |  71 +++++++++++++++++
 src/regex-emacs.h |   3 +
 src/search.c      | 198 ++++++++++++++++++++++++++++++++--------------
 12 files changed, 477 insertions(+), 107 deletions(-)

diff --git a/etc/emacs_lldb.py b/etc/emacs_lldb.py
index ba80d3431f3..2a1fd238b17 100644
--- a/etc/emacs_lldb.py
+++ b/etc/emacs_lldb.py
@@ -69,6 +69,7 @@ class Lisp_Object:
         "PVEC_MODULE_FUNCTION": "struct Lisp_Module_Function",
         "PVEC_NATIVE_COMP_UNIT": "struct Lisp_Native_Comp_Unit",
         "PVEC_SQLITE": "struct Lisp_Sqlite",
+        "PVEC_REGEXP": "struct Lisp_Regexp",
         "PVEC_COMPILED": "struct Lisp_Vector",
         "PVEC_CHAR_TABLE": "struct Lisp_Vector",
         "PVEC_SUB_CHAR_TABLE": "void",
diff --git a/lisp/image.el b/lisp/image.el
index 3d60b485c6b..5c72181b94c 100644
--- a/lisp/image.el
+++ b/lisp/image.el
@@ -36,30 +36,31 @@ image
                   (&optional filter animation-cache))
 
 (defconst image-type-header-regexps
-  `(("\\`/[\t\n\r ]*\\*.*XPM.\\*/" . xpm)
-    ("\\`P[1-6]\\(?:\
+  `((,(make-regexp "\\`/[\t\n\r ]*\\*.*XPM.\\*/") . xpm)
+    (,(make-regexp "\\`P[1-6]\\(?:\
 \\(?:\\(?:#[^\r\n]*[\r\n]\\)*[ \t\r\n]\\)+\
 \\(?:\\(?:#[^\r\n]*[\r\n]\\)*[0-9]\\)+\
-\\)\\{2\\}" . pbm)
-    ("\\`GIF8[79]a" . gif)
-    ("\\`\x89PNG\r\n\x1a\n" . png)
-    ("\\`[\t\n\r ]*#define \\([a-z0-9_]+\\)_width [0-9]+\n\
+\\)\\{2\\}") . pbm)
+    (,(make-regexp "\\`GIF8[79]a") . gif)
+    (,(make-regexp "\\`\x89PNG\r\n\x1a\n") . png)
+    (,(make-regexp "\\`[\t\n\r ]*#define \\([a-z0-9_]+\\)_width [0-9]+\n\
 #define \\1_height [0-9]+\n\\(\
 #define \\1_x_hot [0-9]+\n\
 #define \\1_y_hot [0-9]+\n\\)?\
-static \\(unsigned \\)?char \\1_bits" . xbm)
-    ("\\`\\(?:MM\0\\*\\|II\\*\0\\)" . tiff)
-    ("\\`[\t\n\r ]*%!PS" . postscript)
-    ("\\`\xff\xd8" . jpeg)    ; used to be (image-jpeg-p . jpeg)
-    ("\\`RIFF[^z-a][^z-a][^z-a][^z-a]WEBPVP8" . webp)
+static \\(unsigned \\)?char \\1_bits") . xbm)
+    (,(make-regexp "\\`\\(?:MM\0\\*\\|II\\*\0\\)") . tiff)
+    (,(make-regexp "\\`[\t\n\r ]*%!PS") . postscript)
+    (,(make-regexp "\\`\xff\xd8") . jpeg)    ; used to be (image-jpeg-p . jpeg)
+    (,(make-regexp "\\`RIFF[^z-a][^z-a][^z-a][^z-a]WEBPVP8") . webp)
     (,(let* ((incomment-re "\\(?:[^-]\\|-[^-]\\)")
-	     (comment-re (concat "\\(?:!--" incomment-re "*-->[ \t\r\n]*<\\)")))
-	(concat "\\(?:<\\?xml[ \t\r\n]+[^>]*>\\)?[ \t\r\n]*<"
-		comment-re "*"
-		"\\(?:!DOCTYPE[ \t\r\n]+[^>]*>[ \t\r\n]*<[ \t\r\n]*" comment-re "*\\)?"
-		"[Ss][Vv][Gg]"))
+             (comment-re (concat "\\(?:!--" incomment-re "*-->[ \t\r\n]*<\\)")))
+        (make-regexp
+         (concat "\\(?:<\\?xml[ \t\r\n]+[^>]*>\\)?[ \t\r\n]*<"
+                 comment-re "*"
+                 "\\(?:!DOCTYPE[ \t\r\n]+[^>]*>[ \t\r\n]*<[ \t\r\n]*" comment-re "*\\)?"
+                 "[Ss][Vv][Gg]")))
      . svg)
-    ("\\`....ftyp\\(heic\\|heix\\|hevc\\|heim\\|heis\\|hevm\\|hevs\\|mif1\\|msf1\\)" . heic))
+    (,(make-regexp "\\`....ftyp\\(heic\\|heix\\|hevc\\|heim\\|heis\\|hevm\\|hevs\\|mif1\\|msf1\\)") . heic))
   "Alist of (REGEXP . IMAGE-TYPE) pairs used to auto-detect image types.
 When the first bytes of an image file match REGEXP, it is assumed to
 be of image type IMAGE-TYPE if IMAGE-TYPE is a symbol.  If not a symbol,
@@ -68,18 +69,18 @@ image-type-header-regexps
 a non-nil value, TYPE is the image's type.")
 
 (defvar image-type-file-name-regexps
-  '(("\\.png\\'" . png)
-    ("\\.gif\\'" . gif)
-    ("\\.jpe?g\\'" . jpeg)
-    ("\\.webp\\'" . webp)
-    ("\\.bmp\\'" . bmp)
-    ("\\.xpm\\'" . xpm)
-    ("\\.pbm\\'" . pbm)
-    ("\\.xbm\\'" . xbm)
-    ("\\.ps\\'" . postscript)
-    ("\\.tiff?\\'" . tiff)
-    ("\\.svgz?\\'" . svg)
-    ("\\.hei[cf]s?\\'" . heic))
+  `((,(make-regexp "\\.png\\'") . png)
+    (,(make-regexp "\\.gif\\'") . gif)
+    (,(make-regexp "\\.jpe?g\\'") . jpeg)
+    (,(make-regexp "\\.webp\\'") . webp)
+    (,(make-regexp "\\.bmp\\'") . bmp)
+    (,(make-regexp "\\.xpm\\'") . xpm)
+    (,(make-regexp "\\.pbm\\'") . pbm)
+    (,(make-regexp "\\.xbm\\'") . xbm)
+    (,(make-regexp "\\.ps\\'") . postscript)
+    (,(make-regexp "\\.tiff?\\'") . tiff)
+    (,(make-regexp "\\.svgz?\\'") . svg)
+    (,(make-regexp "\\.hei[cf]s?\\'") . heic))
   "Alist of (REGEXP . IMAGE-TYPE) pairs used to identify image files.
 When the name of an image file match REGEXP, it is assumed to
 be of image type IMAGE-TYPE.")
diff --git a/src/Makefile.in b/src/Makefile.in
index c278924ef94..e6ce159c051 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -537,7 +537,8 @@ all:
 .PHONY: all
 
 dmpstruct_headers=$(srcdir)/lisp.h $(srcdir)/buffer.h $(srcdir)/itree.h \
-	$(srcdir)/intervals.h $(srcdir)/charset.h $(srcdir)/bignum.h
+	$(srcdir)/intervals.h $(srcdir)/charset.h $(srcdir)/bignum.h \
+	$(srcdir)/regex-emacs.h
 ifeq ($(CHECK_STRUCTS),true)
 pdumper.o: dmpstruct.h
 endif
diff --git a/src/alloc.c b/src/alloc.c
index 48b170b866f..856393b54df 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3467,6 +3467,16 @@ cleanup_vector (struct Lisp_Vector *vector)
 	  }
       }
       break;
+    case PVEC_REGEXP:
+      {
+	struct Lisp_Regexp *r = PSEUDOVEC_STRUCT (vector, Lisp_Regexp);
+	eassert (r->buffer->allocated > 0);
+	eassert (r->buffer->used <= r->buffer->allocated);
+	xfree (r->buffer->buffer);
+	xfree (r->buffer->fastmap);
+	xfree (r->buffer);
+      }
+      break;
     case PVEC_OBARRAY:
       {
 	struct Lisp_Obarray *o = PSEUDOVEC_STRUCT (vector, Lisp_Obarray);
@@ -5881,6 +5891,36 @@ make_pure_c_string (const char *data, ptrdiff_t nchars)
 
 static Lisp_Object purecopy (Lisp_Object obj);
 
+static struct re_pattern_buffer *
+make_pure_re_pattern_buffer (const struct re_pattern_buffer *bufp)
+{
+  struct re_pattern_buffer *pure = pure_alloc (sizeof *pure, -1);
+  *pure = *bufp;
+
+  pure->buffer = pure_alloc (bufp->used, -1);
+  memcpy (pure->buffer, bufp->buffer, bufp->used);
+  pure->allocated = bufp->used;
+  pure->fastmap = pure_alloc (FASTMAP_SIZE, -1);
+  memcpy (pure->fastmap, bufp->fastmap, FASTMAP_SIZE);
+  pure->translate = purecopy (bufp->translate);
+
+  return pure;
+}
+
+static Lisp_Object
+make_pure_regexp (const struct Lisp_Regexp *r)
+{
+  struct Lisp_Regexp *pure = pure_alloc (sizeof *pure, Lisp_Vectorlike);
+  *pure = *r;
+
+  pure->pattern = purecopy (r->pattern);
+  pure->whitespace_regexp = purecopy (r->whitespace_regexp);
+  pure->syntax_table = purecopy (r->syntax_table);
+  pure->buffer = make_pure_re_pattern_buffer (r->buffer);
+
+  return make_lisp_ptr (pure, Lisp_Vectorlike);
+}
+
 /* Return a cons allocated from pure space.  Give it pure copies
    of CAR as car and CDR as cdr.  */
 
@@ -6038,6 +6078,8 @@ purecopy (Lisp_Object obj)
     obj = make_pure_string (SSDATA (obj), SCHARS (obj),
 			    SBYTES (obj),
 			    STRING_MULTIBYTE (obj));
+  else if (REGEXP_P (obj))
+    obj = make_pure_regexp(XREGEXP (obj));
   else if (HASH_TABLE_P (obj))
     {
       struct Lisp_Hash_Table *table = XHASH_TABLE (obj);
@@ -7347,6 +7389,19 @@ #define CHECK_ALLOCATED_AND_LIVE_SYMBOL()		((void) 0)
 		  break;
 		}
 
+	      case PVEC_REGEXP:
+		{
+		  ptrdiff_t size = ptr->header.size;
+		  if (size & PSEUDOVECTOR_FLAG)
+		    size &= PSEUDOVECTOR_SIZE_MASK;
+		  set_vector_marked (ptr);
+		  mark_stack_push_values (ptr->contents, size);
+
+		  struct Lisp_Regexp *r = (struct Lisp_Regexp *)ptr;
+		  mark_stack_push_value (r->buffer->translate);
+		  break;
+		}
+
 	      case PVEC_CHAR_TABLE:
 	      case PVEC_SUB_CHAR_TABLE:
 		mark_char_table (ptr, (enum pvec_type) pvectype);
diff --git a/src/data.c b/src/data.c
index d947d200870..d5fff6af6b9 100644
--- a/src/data.c
+++ b/src/data.c
@@ -286,11 +286,13 @@ DEFUN ("cl-type-of", Fcl_type_of, Scl_type_of, 1, 1, 0,
 	  return Qtreesit_node;
 	case PVEC_TS_COMPILED_QUERY:
 	  return Qtreesit_compiled_query;
-        case PVEC_SQLITE:
-          return Qsqlite;
-        case PVEC_SUB_CHAR_TABLE:
-          return Qsub_char_table;
-        /* "Impossible" cases.  */
+	case PVEC_SQLITE:
+	  return Qsqlite;
+	case PVEC_SUB_CHAR_TABLE:
+	  return Qsub_char_table;
+	case PVEC_REGEXP:
+	  return Qregexp;
+	/* "Impossible" cases.  */
 	case PVEC_MISC_PTR:
         case PVEC_OTHER:
         case PVEC_FREE: ;
diff --git a/src/emacs.c b/src/emacs.c
index 37c8b28fc2c..560843f9c56 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -2349,6 +2349,7 @@ main (int argc, char **argv)
       syms_of_window ();
       syms_of_xdisp ();
       syms_of_sqlite ();
+      syms_of_regexp ();
       syms_of_font ();
 #ifdef HAVE_WINDOW_SYSTEM
       syms_of_fringe ();
diff --git a/src/lisp.h b/src/lisp.h
index 8ac65ca429c..c277c58d8b6 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1048,6 +1048,7 @@ DEFINE_GDB_SYMBOL_END (PSEUDOVECTOR_FLAG)
   PVEC_TS_NODE,
   PVEC_TS_COMPILED_QUERY,
   PVEC_SQLITE,
+  PVEC_REGEXP,
 
   /* These should be last, for internal_equal and sxhash_obj.  */
   PVEC_CLOSURE,
@@ -2706,6 +2707,25 @@ XHASH_TABLE (Lisp_Object a)
   return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Hash_Table);
 }
 
+INLINE bool
+REGEXP_P (Lisp_Object a)
+{
+  return PSEUDOVECTORP (a, PVEC_REGEXP);
+}
+
+INLINE struct Lisp_Regexp *
+XREGEXP (Lisp_Object a)
+{
+  eassert (REGEXP_P (a));
+  return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Regexp);
+}
+
+INLINE void
+CHECK_REGEXP (Lisp_Object x)
+{
+  CHECK_TYPE (REGEXP_P (x), Qregexpp, x);
+}
+
 INLINE Lisp_Object
 make_lisp_hash_table (struct Lisp_Hash_Table *h)
 {
@@ -2955,6 +2975,16 @@ xmint_pointer (Lisp_Object a)
   bool is_statement;
 } GCALIGNED_STRUCT;
 
+struct Lisp_Regexp
+{
+  union vectorlike_header header;
+  Lisp_Object pattern;
+  Lisp_Object whitespace_regexp;
+  Lisp_Object syntax_table;
+  bool posix;
+  struct re_pattern_buffer *buffer;
+} GCALIGNED_STRUCT;
+
 struct Lisp_User_Ptr
 {
   union vectorlike_header header;
@@ -4470,6 +4500,9 @@ verify (FLT_RADIX == 2 || FLT_RADIX == 16);
 /* Defined in sqlite.c.  */
 extern void syms_of_sqlite (void);
 
+/* Defined in regex-emacs.c. */
+extern void syms_of_regexp (void);
+
 /* Defined in xsettings.c.  */
 extern void syms_of_xsettings (void);
 
diff --git a/src/pdumper.c b/src/pdumper.c
index 53bddf91f04..ae7fafc7d1c 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -1528,6 +1528,36 @@ dump_emacs_reloc_copy_from_dump (struct dump_context *ctx, dump_off dump_offset,
                     dump_off_to_lisp (size)));
 }
 
+static void
+dump_remember_fixup_ptr_raw (struct dump_context *ctx,
+			     dump_off dump_offset,
+			     dump_off new_dump_offset);
+
+/* Add a byte range relocation that copies arbitrary bytes from the dump.
+
+   When the dump is loaded, Emacs copies SIZE bytes into DUMP_OFFSET from
+   dump. Dump bytes are loaded from SOURCE. */
+static void
+dump_cold_bytes (struct dump_context *ctx, dump_off dump_offset,
+		 void* source, dump_off size)
+{
+  eassert (size >= 0);
+  eassert (size < (1 << EMACS_RELOC_LENGTH_BITS));
+
+  if (!ctx->flags.dump_object_contents)
+    return;
+
+  if (size == 0)
+    {
+      eassert (source == NULL);
+      return;
+    }
+  eassert (source != NULL);
+
+  dump_remember_fixup_ptr_raw (ctx, dump_offset, ctx->offset);
+  dump_write (ctx, source, size);
+}
+
 /* Add an Emacs relocation that sets values to arbitrary bytes.
 
    When the dump is loaded, Emacs copies SIZE bytes from the
@@ -2125,6 +2155,74 @@ dump_marker (struct dump_context *ctx, const struct Lisp_Marker *marker)
   return finish_dump_pvec (ctx, &out->header);
 }
 
+static dump_off
+dump_re_pattern_buffer (struct dump_context *ctx, struct re_pattern_buffer *bufp)
+{
+#if CHECK_STRUCTS && !defined (HASH_re_pattern_buffer_36714DF24A)
+# error "re_pattern_buffer changed. See CHECK_STRUCTS comment in config.h."
+#endif
+  struct re_pattern_buffer out;
+  dump_object_start (ctx, &out, sizeof (out));
+  if (bufp->buffer)
+    dump_field_fixup_later (ctx, &out, bufp, &bufp->buffer);
+  DUMP_FIELD_COPY (&out, bufp, allocated);
+  DUMP_FIELD_COPY (&out, bufp, used);
+  DUMP_FIELD_COPY (&out, bufp, charset_unibyte);
+  if (bufp->fastmap)
+    dump_field_fixup_later (ctx, &out, bufp, &bufp->fastmap);
+  dump_field_lv (ctx, &out, bufp, &bufp->translate, WEIGHT_NORMAL);
+  DUMP_FIELD_COPY (&out, bufp, re_nsub);
+  DUMP_FIELD_COPY (&out, bufp, can_be_null);
+  DUMP_FIELD_COPY (&out, bufp, regs_allocated);
+  DUMP_FIELD_COPY (&out, bufp, fastmap_accurate);
+  DUMP_FIELD_COPY (&out, bufp, used_syntax);
+  DUMP_FIELD_COPY (&out, bufp, multibyte);
+  DUMP_FIELD_COPY (&out, bufp, target_multibyte);
+  dump_off offset = dump_object_finish (ctx, &out, sizeof (out));
+  if (bufp->buffer)
+    {
+      eassert (bufp->allocated > 0);
+      if (bufp->allocated > DUMP_OFF_MAX - 1)
+	error ("regex pattern buffer too large");
+      dump_off total_size = ptrdiff_t_to_dump_off (bufp->allocated);
+      eassert (total_size > 0);
+      dump_cold_bytes
+	(ctx,
+	 offset + dump_offsetof (struct re_pattern_buffer, buffer),
+	 bufp->buffer,
+	 total_size);
+    }
+  if (bufp->fastmap)
+    {
+      eassert (FASTMAP_SIZE <= DUMP_OFF_MAX - 1);
+      dump_off fastmap_size = FASTMAP_SIZE;
+      dump_cold_bytes
+	(ctx,
+	 offset + dump_offsetof (struct re_pattern_buffer, fastmap),
+	 bufp->fastmap,
+	 fastmap_size);
+    }
+  return offset;
+}
+
+static dump_off
+dump_regexp (struct dump_context *ctx, const struct Lisp_Regexp *regexp)
+{
+#if CHECK_STRUCTS && !defined (HASH_Lisp_Regexp_A3381A7F05)
+# error "Lisp_Regexp changed. See CHECK_STRUCTS comment in config.h."
+#endif
+  START_DUMP_PVEC (ctx, &regexp->header, struct Lisp_Regexp, out);
+  dump_pseudovector_lisp_fields (ctx, &out->header, &regexp->header);
+  DUMP_FIELD_COPY (out, regexp, posix);
+  dump_field_fixup_later (ctx, &out, regexp, &regexp->buffer);
+  dump_off offset = finish_dump_pvec (ctx, &out->header);
+  dump_remember_fixup_ptr_raw
+    (ctx,
+     offset + dump_offsetof (struct Lisp_Regexp, buffer),
+     dump_re_pattern_buffer (ctx, regexp->buffer));
+  return offset;
+}
+
 static dump_off
 dump_interval_node (struct dump_context *ctx, struct itree_node *node)
 {
@@ -2135,6 +2233,7 @@ dump_interval_node (struct dump_context *ctx, struct itree_node *node)
   dump_object_start (ctx, &out, sizeof (out));
   if (node->parent)
     dump_field_fixup_later (ctx, &out, node, &node->parent);
+  /* FIXME: should these both be &node->{left,right} instead of &node->parent? */
   if (node->left)
     dump_field_fixup_later (ctx, &out, node, &node->parent);
   if (node->right)
@@ -3065,7 +3164,7 @@ dump_vectorlike (struct dump_context *ctx,
                  Lisp_Object lv,
                  dump_off offset)
 {
-#if CHECK_STRUCTS && !defined HASH_pvec_type_99104541E2
+#if CHECK_STRUCTS && !defined HASH_pvec_type_E2BA8CE9B4
 # error "pvec_type changed. See CHECK_STRUCTS comment in config.h."
 #endif
   const struct Lisp_Vector *v = XVECTOR (lv);
@@ -3127,6 +3226,8 @@ dump_vectorlike (struct dump_context *ctx,
 #ifdef HAVE_TREE_SITTER
       return dump_treesit_compiled_query (ctx, XTS_COMPILED_QUERY (lv));
 #endif
+    case PVEC_REGEXP:
+      return dump_regexp (ctx, XREGEXP (lv));
     case PVEC_WINDOW_CONFIGURATION:
     case PVEC_OTHER:
     case PVEC_XWIDGET:
@@ -3462,11 +3563,11 @@ dump_cold_string (struct dump_context *ctx, Lisp_Object string)
     error ("string too large");
   dump_off total_size = ptrdiff_t_to_dump_off (SBYTES (string) + 1);
   eassert (total_size > 0);
-  dump_remember_fixup_ptr_raw
+  dump_cold_bytes
     (ctx,
      string_offset + dump_offsetof (struct Lisp_String, u.s.data),
-     ctx->offset);
-  dump_write (ctx, XSTRING (string)->u.s.data, total_size);
+     XSTRING (string)->u.s.data,
+     total_size);
 }
 
 static void
@@ -3475,12 +3576,12 @@ dump_cold_charset (struct dump_context *ctx, Lisp_Object data)
   /* Dump charset lookup tables.  */
   int cs_i = XFIXNUM (XCAR (data));
   dump_off cs_dump_offset = dump_off_from_lisp (XCDR (data));
-  dump_remember_fixup_ptr_raw
+  struct charset *cs = charset_table + cs_i;
+  dump_cold_bytes
     (ctx,
      cs_dump_offset + dump_offsetof (struct charset, code_space_mask),
-     ctx->offset);
-  struct charset *cs = charset_table + cs_i;
-  dump_write (ctx, cs->code_space_mask, 256);
+     cs->code_space_mask,
+     256);
 }
 
 static void
@@ -3501,11 +3602,11 @@ dump_cold_buffer (struct dump_context *ctx, Lisp_Object data)
     + 1;
   if (nbytes > DUMP_OFF_MAX)
     error ("buffer too large");
-  dump_remember_fixup_ptr_raw
+  dump_cold_bytes
     (ctx,
      buffer_offset + dump_offsetof (struct buffer, own_text.beg),
-     ctx->offset);
-  dump_write (ctx, b->own_text.beg, ptrdiff_t_to_dump_off (nbytes));
+     b->own_text.beg,
+     ptrdiff_t_to_dump_off (nbytes));
 }
 
 static void
diff --git a/src/print.c b/src/print.c
index 8f28b14e8b6..011b02d316f 100644
--- a/src/print.c
+++ b/src/print.c
@@ -2084,6 +2084,31 @@ print_vectorlike_unreadable (Lisp_Object obj, Lisp_Object printcharfun,
       }
       return;
 
+    case PVEC_REGEXP:
+      {
+	struct Lisp_Regexp *r = XREGEXP (obj);
+	print_c_string ("#<regexp pattern=", printcharfun);
+	print_object (r->pattern, printcharfun, escapeflag);
+	int i = sprintf (buf, " nsub=%ld", r->buffer->re_nsub);
+	strout (buf, i, i, printcharfun);
+	print_c_string (" translate=", printcharfun);
+	print_object (r->buffer->translate, printcharfun, escapeflag);
+	print_c_string (" whitespace=", printcharfun);
+	print_object (r->whitespace_regexp, printcharfun, escapeflag);
+	print_c_string (" syntax_table=", printcharfun);
+	print_object (r->syntax_table, printcharfun, escapeflag);
+	if (r->posix)
+	  {
+	    print_c_string (" posix=true", printcharfun);
+	  }
+	else
+	  {
+	    print_c_string (" posix=false", printcharfun);
+	  }
+	print_c_string (">", printcharfun);
+      }
+      return;
+
     case PVEC_OBARRAY:
       {
 	struct Lisp_Obarray *o = XOBARRAY (obj);
diff --git a/src/regex-emacs.c b/src/regex-emacs.c
index 92dbdbecbf1..113a5dd9ed1 100644
--- a/src/regex-emacs.c
+++ b/src/regex-emacs.c
@@ -32,6 +32,7 @@
 #include "character.h"
 #include "buffer.h"
 #include "syntax.h"
+#include "charset.h"
 #include "category.h"
 #include "dispextern.h"
 
@@ -468,6 +469,8 @@ print_fastmap (FILE *dest, char *fastmap)
   bool was_a_range = false;
   int i = 0;
 
+  /* FIXME: unify "1 << BYTEWIDTH" and "FASTMAP_SIZE" defines (both are
+     the same value = 256). */
   while (i < (1 << BYTEWIDTH))
     {
       if (fastmap[i++])
@@ -5353,3 +5356,71 @@ re_compile_pattern (const char *pattern, ptrdiff_t length,
     return NULL;
   return re_error_msgid[ret];
 }
+
+DEFUN ("make-regexp", Fmake_regexp, Smake_regexp, 1, 3, 0,
+       doc: /* Compile a regexp object from string PATTERN.
+
+POSIX is non-nil if we want full backtracking (POSIX style) for
+this pattern.
+TRANSLATE is a translation table for ignoring case, or t to look up from
+the current buffer if `case-fold-search' is on, or nil for none. */)
+  (Lisp_Object pattern, Lisp_Object posix, Lisp_Object translate)
+{
+  const char *whitespace_regexp = STRINGP (Vsearch_spaces_regexp) ?
+    SSDATA (Vsearch_spaces_regexp) : NULL;
+  char *val;
+  bool is_posix = !NILP (posix);
+  struct Lisp_Regexp *p;
+  struct re_pattern_buffer *bufp = NULL;
+
+  if (EQ(translate, Qt))
+    translate = (!NILP (Vcase_fold_search)
+		 ? BVAR (current_buffer, case_canon_table)
+		 : Qnil);
+
+  CHECK_STRING (pattern);
+
+  bufp = xzalloc (sizeof (*bufp));
+
+  bufp->fastmap = xzalloc (FASTMAP_SIZE);
+  bufp->translate = translate;
+  bufp->multibyte = STRING_MULTIBYTE (pattern);
+  bufp->charset_unibyte = charset_unibyte;
+
+  val = (char *) re_compile_pattern (SSDATA (pattern), SBYTES (pattern),
+				     is_posix, whitespace_regexp, bufp);
+
+  if (val)
+    {
+      xfree (bufp->buffer);
+      xfree (bufp->fastmap);
+      xfree (bufp);
+      xsignal1 (Qinvalid_regexp, build_string (val));
+    }
+
+  p = ALLOCATE_PSEUDOVECTOR (struct Lisp_Regexp, syntax_table,
+			     PVEC_REGEXP);
+  p->pattern = pattern;
+  p->whitespace_regexp = Vsearch_spaces_regexp;
+  /* If the compiled pattern hard codes some of the contents of the
+     syntax-table, it can only be reused with *this* syntax table.  */
+  p->syntax_table = bufp->used_syntax ? BVAR (current_buffer, syntax_table) : Qt;
+  p->posix = is_posix;
+  p->buffer = bufp;
+  return make_lisp_ptr (p, Lisp_Vectorlike);
+}
+
+DEFUN ("regexpp", Fregexpp, Sregexpp, 1, 1, 0,
+       doc: /* Say whether OBJECT is a compiled regexp object. */)
+  (Lisp_Object object)
+{
+  return REGEXP_P (object)? Qt: Qnil;
+}
+
+void syms_of_regexp (void)
+{
+  defsubr (&Smake_regexp);
+  defsubr (&Sregexpp);
+  DEFSYM (Qregexp, "regexp");
+  DEFSYM (Qregexpp, "regexpp");
+}
diff --git a/src/regex-emacs.h b/src/regex-emacs.h
index 2402e539e64..4bcbf1f29f9 100644
--- a/src/regex-emacs.h
+++ b/src/regex-emacs.h
@@ -52,6 +52,9 @@ #define EMACS_REGEX_H 1
 /* Roughly the maximum number of failure points on the stack.  */
 extern ptrdiff_t emacs_re_max_failures;
 
+/* The size of an allocation for a fastmap. */
+#define FASTMAP_SIZE 0400
+
 /* Amount of memory that we can safely stack allocate.  */
 extern ptrdiff_t emacs_re_safe_alloca;
 \f
diff --git a/src/search.c b/src/search.c
index 2ff8b0599c4..5710ff30005 100644
--- a/src/search.c
+++ b/src/search.c
@@ -50,7 +50,7 @@ #define REGEXP_CACHE_SIZE 20
      for any syntax-table.  */
   Lisp_Object syntax_table;
   struct re_pattern_buffer buf;
-  char fastmap[0400];
+  char fastmap[FASTMAP_SIZE];
   /* True means regexp was compiled to do full POSIX backtracking.  */
   bool posix;
   /* True means we're inside a buffer match.  */
@@ -181,6 +181,8 @@ freeze_pattern (struct regexp_cache *searchbuf)
 {
   eassert (!searchbuf->busy);
   record_unwind_protect_ptr (unfreeze_pattern, searchbuf);
+  eassert (searchbuf != NULL);
+  assume (searchbuf);
   searchbuf->busy = true;
 }
 
@@ -261,11 +263,13 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
 
 \f
 static Lisp_Object
-looking_at_1 (Lisp_Object string, bool posix, bool modify_data)
+looking_at_1 (Lisp_Object regexp, bool posix, bool modify_data)
 {
   Lisp_Object val;
   unsigned char *p1, *p2;
   ptrdiff_t s1, s2;
+  struct re_pattern_buffer *bufp = NULL;
+  struct regexp_cache *cache_entry = NULL;
   register ptrdiff_t i;
 
   if (running_asynch_code)
@@ -276,18 +280,22 @@ looking_at_1 (Lisp_Object string, bool posix, bool modify_data)
   set_char_table_extras (BVAR (current_buffer, case_canon_table), 2,
 			 BVAR (current_buffer, case_eqv_table));
 
-  CHECK_STRING (string);
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    CHECK_STRING (regexp);
 
   /* Snapshot in case Lisp changes the value.  */
   bool modify_match_data = NILP (Vinhibit_changing_match_data) && modify_data;
 
-  struct regexp_cache *cache_entry = compile_pattern (
-    string,
-    modify_match_data ? &search_regs : NULL,
-    (!NILP (Vcase_fold_search)
-     ? BVAR (current_buffer, case_canon_table) : Qnil),
-    posix,
-    !NILP (BVAR (current_buffer, enable_multibyte_characters)));
+  if (!bufp)
+      cache_entry = compile_pattern (
+	regexp,
+	modify_match_data ? &search_regs : NULL,
+	(!NILP (Vcase_fold_search)
+	 ? BVAR (current_buffer, case_canon_table) : Qnil),
+	posix,
+	!NILP (BVAR (current_buffer, enable_multibyte_characters)));
 
   /* Do a pending quit right away, to avoid paradoxical behavior */
   maybe_quit ();
@@ -313,9 +321,16 @@ looking_at_1 (Lisp_Object string, bool posix, bool modify_data)
 
   specpdl_ref count = SPECPDL_INDEX ();
   freeze_buffer_relocation ();
-  freeze_pattern (cache_entry);
+
+  if (!bufp)
+    {
+      eassert (cache_entry != NULL);
+      freeze_pattern (cache_entry);
+      bufp = &cache_entry->buf;
+    }
+
   re_match_object = Qnil;
-  i = re_match_2 (&cache_entry->buf, (char *) p1, s1, (char *) p2, s2,
+  i = re_match_2 (bufp, (char *) p1, s1, (char *) p2, s2,
 		  PT_BYTE - BEGV_BYTE,
 		  modify_match_data ? &search_regs : NULL,
 		  ZV_BYTE - BEGV_BYTE);
@@ -374,11 +389,16 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
   EMACS_INT pos;
   ptrdiff_t pos_byte, i;
   bool modify_match_data = NILP (Vinhibit_changing_match_data) && modify_data;
+  struct re_pattern_buffer* bufp = NULL;
 
   if (running_asynch_code)
     save_search_regs ();
 
-  CHECK_STRING (regexp);
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    CHECK_STRING (regexp);
+
   CHECK_STRING (string);
 
   if (NILP (start))
@@ -402,17 +422,23 @@ string_match_1 (Lisp_Object regexp, Lisp_Object string, Lisp_Object start,
 			 BVAR (current_buffer, case_eqv_table));
 
   specpdl_ref count = SPECPDL_INDEX ();
-  struct regexp_cache *cache_entry
-    = compile_pattern (regexp,
-		       modify_match_data ? &search_regs : NULL,
-		       (!NILP (Vcase_fold_search)
-			? BVAR (current_buffer, case_canon_table)
-			: Qnil),
-		       posix,
-		       STRING_MULTIBYTE (string));
-  freeze_pattern (cache_entry);
+
+  if (!bufp)
+    {
+      struct regexp_cache *cache_entry
+	= compile_pattern (regexp,
+			   modify_match_data ? &search_regs : NULL,
+			   (!NILP (Vcase_fold_search)
+			    ? BVAR (current_buffer, case_canon_table)
+			    : Qnil),
+			   posix,
+			   STRING_MULTIBYTE (string));
+      freeze_pattern (cache_entry);
+      bufp = &cache_entry->buf;
+    }
+
   re_match_object = string;
-  val = re_search (&cache_entry->buf, SSDATA (string),
+  val = re_search (bufp, SSDATA (string),
 		   SBYTES (string), pos_byte,
 		   SBYTES (string) - pos_byte,
 		   (modify_match_data ? &search_regs : NULL));
@@ -485,12 +511,21 @@ DEFUN ("posix-string-match", Fposix_string_match, Sposix_string_match, 2, 4, 0,
 fast_string_match_internal (Lisp_Object regexp, Lisp_Object string,
 			    Lisp_Object table)
 {
+  struct re_pattern_buffer* bufp;
   re_match_object = string;
   specpdl_ref count = SPECPDL_INDEX ();
-  struct regexp_cache *cache_entry
-    = compile_pattern (regexp, 0, table, 0, STRING_MULTIBYTE (string));
-  freeze_pattern (cache_entry);
-  ptrdiff_t val = re_search (&cache_entry->buf, SSDATA (string),
+
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    {
+      struct regexp_cache *cache_entry
+	= compile_pattern (regexp, 0, table, 0, STRING_MULTIBYTE (string));
+      freeze_pattern (cache_entry);
+      bufp = &cache_entry->buf;
+    }
+
+  ptrdiff_t val = re_search (bufp, SSDATA (string),
 			     SBYTES (string), 0,
 			     SBYTES (string), 0);
   unbind_to (count, Qnil);
@@ -509,18 +544,27 @@ fast_c_string_match_internal (Lisp_Object regexp,
 			      const char *string, ptrdiff_t len,
 			      Lisp_Object table)
 {
-  /* FIXME: This is expensive and not obviously correct when it makes
-     a difference. I.e., no longer "fast", and may hide bugs.
-     Something should be done about this.  */
-  regexp = string_make_unibyte (regexp);
+  struct re_pattern_buffer *bufp;
   /* Record specpdl index because freeze_pattern pushes an
      unwind-protect on the specpdl.  */
   specpdl_ref count = SPECPDL_INDEX ();
-  struct regexp_cache *cache_entry
-    = compile_pattern (regexp, 0, table, 0, 0);
-  freeze_pattern (cache_entry);
+
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    {
+      /* FIXME: This is expensive and not obviously correct when it makes
+	 a difference. I.e., no longer "fast", and may hide bugs.
+	 Something should be done about this.  */
+      regexp = string_make_unibyte (regexp);
+      struct regexp_cache *cache_entry
+	= compile_pattern (regexp, 0, table, 0, 0);
+      freeze_pattern (cache_entry);
+      bufp = &cache_entry->buf;
+    }
+
   re_match_object = Qt;
-  ptrdiff_t val = re_search (&cache_entry->buf, string, len, 0, len, 0);
+  ptrdiff_t val = re_search (bufp, string, len, 0, len, 0);
   unbind_to (count, Qnil);
   return val;
 }
@@ -539,6 +583,8 @@ fast_looking_at (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
   unsigned char *p1, *p2;
   ptrdiff_t s1, s2;
   ptrdiff_t len;
+  struct re_pattern_buffer *bufp = NULL;
+  struct regexp_cache *cache_entry = NULL;
 
   if (STRINGP (string))
     {
@@ -578,13 +624,24 @@ fast_looking_at (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
       multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
     }
 
-  struct regexp_cache *cache_entry =
-    compile_pattern (regexp, 0, Qnil, 0, multibyte);
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    cache_entry =
+      compile_pattern (regexp, 0, Qnil, 0, multibyte);
+
   specpdl_ref count = SPECPDL_INDEX ();
   freeze_buffer_relocation ();
-  freeze_pattern (cache_entry);
+
+  if (!bufp)
+    {
+      eassert (cache_entry != NULL);
+      freeze_pattern (cache_entry);
+      bufp = &cache_entry->buf;
+    }
+
   re_match_object = STRINGP (string) ? string : Qnil;
-  len = re_match_2 (&cache_entry->buf, (char *) p1, s1, (char *) p2, s2,
+  len = re_match_2 (bufp, (char *) p1, s1, (char *) p2, s2,
 		    pos_byte, NULL, limit_byte);
 
   unbind_to (count, Qnil);
@@ -1157,9 +1214,9 @@ while (0)
 static struct re_registers search_regs_1;
 
 static EMACS_INT
-search_buffer_re (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
-                  ptrdiff_t lim, ptrdiff_t lim_byte, EMACS_INT n,
-                  Lisp_Object trt, Lisp_Object inverse_trt, bool posix)
+search_buffer_re (Lisp_Object regexp, ptrdiff_t pos, ptrdiff_t pos_byte,
+		  ptrdiff_t lim, ptrdiff_t lim_byte, EMACS_INT n,
+		  Lisp_Object trt, Lisp_Object inverse_trt, bool posix)
 {
   unsigned char *p1, *p2;
   ptrdiff_t s1, s2;
@@ -1167,12 +1224,17 @@ search_buffer_re (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
   /* Snapshot in case Lisp changes the value.  */
   bool preserve_match_data = NILP (Vinhibit_changing_match_data);
 
-  struct regexp_cache *cache_entry =
-    compile_pattern (string,
-                     preserve_match_data ? &search_regs : &search_regs_1,
-                     trt, posix,
-                     !NILP (BVAR (current_buffer, enable_multibyte_characters)));
-  struct re_pattern_buffer *bufp = &cache_entry->buf;
+  struct re_pattern_buffer *bufp = NULL;
+  struct regexp_cache *cache_entry = NULL;
+
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    cache_entry =
+      compile_pattern (regexp,
+		       preserve_match_data ? &search_regs : &search_regs_1,
+		       trt, posix,
+		       !NILP (BVAR (current_buffer, enable_multibyte_characters)));
 
   maybe_quit ();		/* Do a pending quit right away,
 				   to avoid paradoxical behavior */
@@ -1197,7 +1259,13 @@ search_buffer_re (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
 
   specpdl_ref count = SPECPDL_INDEX ();
   freeze_buffer_relocation ();
-  freeze_pattern (cache_entry);
+
+  if (!bufp)
+    {
+      eassert (cache_entry != NULL);
+      freeze_pattern (cache_entry);
+      bufp = &cache_entry->buf;
+    }
 
   while (n < 0)
     {
@@ -3405,16 +3473,24 @@ DEFUN ("re--describe-compiled", Fre__describe_compiled, Sre__describe_compiled,
 If RAW is non-nil, just return the actual bytecode.  */)
   (Lisp_Object regexp, Lisp_Object raw)
 {
-  struct regexp_cache *cache_entry
-    = compile_pattern (regexp, NULL,
-                       (!NILP (Vcase_fold_search)
-                        ? BVAR (current_buffer, case_canon_table) : Qnil),
-                       false,
-                       !NILP (BVAR (current_buffer,
-                                    enable_multibyte_characters)));
+  struct re_pattern_buffer *bufp;
+
+  if (REGEXP_P (regexp))
+    bufp = XREGEXP (regexp)->buffer;
+  else
+    {
+      struct regexp_cache *cache_entry
+	= compile_pattern (regexp, NULL,
+			   (!NILP (Vcase_fold_search)
+			    ? BVAR (current_buffer, case_canon_table) : Qnil),
+			   false,
+			   !NILP (BVAR (current_buffer,
+					enable_multibyte_characters)));
+      bufp = &cache_entry->buf;
+    }
+
   if (!NILP (raw))
-    return make_unibyte_string ((char *) cache_entry->buf.buffer,
-                                cache_entry->buf.used);
+    return make_unibyte_string ((char *) bufp->buffer, bufp->used);
   else
     {                           /* FIXME: Why ENABLE_CHECKING?  */
 #if !defined ENABLE_CHECKING
@@ -3424,8 +3500,8 @@ DEFUN ("re--describe-compiled", Fre__describe_compiled, Sre__describe_compiled,
       size_t size = 0;
       FILE* f = open_memstream (&buffer, &size);
       if (!f)
-        report_file_error ("open_memstream failed", regexp);
-      print_compiled_pattern (f, &cache_entry->buf);
+	report_file_error ("open_memstream failed", regexp);
+      print_compiled_pattern (f, bufp);
       fclose (f);
       if (!buffer)
         return Qnil;
@@ -3433,7 +3509,7 @@ DEFUN ("re--describe-compiled", Fre__describe_compiled, Sre__describe_compiled,
       free (buffer);
       return description;
 #else /* ENABLE_CHECKING && !HAVE_OPEN_MEMSTREAM */
-      print_compiled_pattern (stderr, &cache_entry->buf);
+      print_compiled_pattern (stderr, bufp);
       return build_string ("Description was sent to standard error");
 #endif /* !ENABLE_CHECKING */
     }




^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-08-14  1:32 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30  5:08 [PATCH] add compiled regexp primitive lisp object Danny McClanahan
2024-07-30 13:02 ` Philip Kaludercic
2024-07-31 22:33   ` dmcc2
2024-08-01  1:04 ` Pip Cet
2024-08-04 23:38   ` Danny McClanahan
2024-08-05  3:47     ` dmcc2
2024-08-05  4:39   ` Danny McClanahan
2024-08-05  7:15     ` Danny McClanahan
2024-08-05 17:55     ` Pip Cet
2024-08-06 15:15       ` Danny McClanahan
2024-08-06 15:57         ` Eli Zaretskii
2024-08-07  4:28           ` Danny McClanahan
2024-08-06 18:18         ` Pip Cet
2024-08-06 18:38           ` Eli Zaretskii
2024-08-07  4:23             ` Danny McClanahan
2024-08-07 12:00               ` Eli Zaretskii
2024-08-07 12:43                 ` Helmut Eller
2024-08-07 13:40                   ` Augusto Stoffel
2024-08-07 15:23                     ` Danny McClanahan
2024-08-14  1:32                     ` Stefan Monnier
2024-08-07 15:02                 ` Danny McClanahan
2024-08-07 15:23                   ` Eli Zaretskii
2024-08-14  1:25               ` Stefan Monnier
2024-08-07  7:59           ` Danny McClanahan
2024-08-06 12:08     ` Eli Zaretskii
2024-08-01  8:30 ` Andrea Corallo
2024-08-01 10:06   ` Gerd Möllmann
2024-08-06 13:47   ` Danny McClanahan
2024-08-06 13:57     ` Danny McClanahan
2024-08-07  7:21     ` Andrea Corallo
2024-08-07  8:27       ` Danny McClanahan

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).