unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Mattias Engdegård" <mattias.engdegard@gmail.com>
To: Ihor Radchenko <yantar92@posteo.net>
Cc: 63225@debbugs.gnu.org
Subject: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)
Date: Sun, 7 May 2023 14:45:36 +0200
Message-ID: <BFB9B4B3-0B55-4D1F-897B-7C7B66173B8D@gmail.com>
In-Reply-To: <EFFDF31B-2B58-44E8-9B05-6039A98331D3@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 591 bytes --]

On 5 May 2023, at 18:26, Mattias Engdegård <mattias.engdegard@gmail.com> wrote:

> Stupid printf-debugging actually, nothing fancier than that.

Here is some of that stupidity I promised. You probably want to use it with

  (set-regexp-trace-file "re.log")
  (unwind-protect
      (do-something-interesting)
    (set-regexp-trace-file nil))

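so that you don't trace more than necessary. The file may become large, but the data is useful for offline analysis, whether scripted or just by reading it in an editor.
Each line records one regexp cache lookup: the first letter indicates a cache hit (H) or miss (M), and the pattern follows as a double-quoted string with double quotes, backslashes and control characters escaped.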
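As an example of scripted analysis, here is a minimal C sketch that reads such a log and counts hits, misses and malformed lines. It assumes exactly the line format written by the patch below (a letter, a space, then the pattern in double quotes using the \" \\ \n \t and lowercase \xHH escapes); parse_trace_line, tally_trace and struct trace_stats are hypothetical names, not part of Emacs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Value of the lowercase hex digit C (the tracer prints %02x), or -1.  */
static int
hex_value (char c)
{
  static const char digits[] = "0123456789abcdef";
  const char *p = c ? strchr (digits, c) : NULL;
  return p ? (int) (p - digits) : -1;
}

/* Parse a line  H "pattern"  or  M "pattern", undoing the tracer's escapes.
   Store the pattern NUL-terminated in OUT and its length in *LEN (it may
   contain \x00).  Return false if malformed or if OUT is too small.  */
static bool
parse_trace_line (const char *line, bool *hit, char *out, size_t outsize,
                  size_t *len)
{
  if (outsize == 0 || (line[0] != 'H' && line[0] != 'M')
      || line[1] != ' ' || line[2] != '"')
    return false;
  size_t n = 0;
  for (const char *p = line + 3; *p; p++)
    {
      char c = *p;
      if (c == '"')
        {
          /* Unescaped quote ends the pattern; only a newline may follow.  */
          if (p[1] != '\0' && strcmp (p + 1, "\n") != 0)
            return false;
          out[n] = '\0';
          *hit = line[0] == 'H';
          *len = n;
          return true;
        }
      if (c == '\\')
        {
          p++;
          if (*p == '"' || *p == '\\')
            c = *p;
          else if (*p == 'n')
            c = '\n';
          else if (*p == 't')
            c = '\t';
          else if (*p == 'x' && hex_value (p[1]) >= 0 && hex_value (p[2]) >= 0)
            {
              c = (char) (hex_value (p[1]) * 16 + hex_value (p[2]));
              p += 2;
            }
          else
            return false;       /* Unknown escape, or backslash at end.  */
        }
      if (n + 1 >= outsize)     /* Keep room for the terminating NUL.  */
        return false;
      out[n++] = c;
    }
  return false;                 /* Missing closing quote.  */
}

struct trace_stats { unsigned long hits, misses, malformed; };

/* Read every line of IN and count hits, misses and malformed lines.  */
static struct trace_stats
tally_trace (FILE *in)
{
  struct trace_stats st = { 0, 0, 0 };
  char line[4096], pattern[4096];
  size_t patlen;
  bool hit;
  while (fgets (line, sizeof line, in))
    {
      size_t n = strlen (line);
      if (n > 0 && line[n - 1] != '\n')
        {
          int ch = getc (in);
          if (ch != EOF && ch != '\n')
            {
              /* Line longer than LINE: skip its rest, count as malformed.  */
              while ((ch = getc (in)) != EOF && ch != '\n')
                ;
              st.malformed++;
              continue;
            }
        }
      if (!parse_trace_line (line, &hit, pattern, sizeof pattern, &patlen))
        st.malformed++;
      else if (hit)
        st.hits++;
      else
        st.misses++;
    }
  return st;
}
```

For a log containing the lines M "foo", H "foo" and H "foo", tally_trace would report two hits and one miss; the cache hit rate is then hits / (hits + misses).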


[-- Attachment #2: 0003-Regexp-tracing-add-set-regexp-trace-file.patch --]
[-- Type: application/octet-stream, Size: 3353 bytes --]

From cd66a560a74d2ed94202cab278455544f0c9337c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sun, 7 May 2023 14:05:31 +0200
Subject: [PATCH 3/3] Regexp tracing: add set-regexp-trace-file

---
 src/search.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/src/search.c b/src/search.c
index c454d5e1ca9..b378db152a2 100644
--- a/src/search.c
+++ b/src/search.c
@@ -34,6 +34,10 @@ Copyright (C) 1985-1987, 1993-1994, 1997-1999, 2001-2023 Free Software
 
 #include "regex-emacs.h"
 
+#include <stdio.h>
+
+static FILE *regexp_trace_file = NULL;
+
 #define DEFAULT_REGEXP_CACHE_SIZE 20
 
 /* If the regexp is non-nil, then the buffer contains the compiled form
@@ -200,6 +204,7 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
 {
   struct regexp_cache *cp, **cpp, **lru_nonbusy;
 
+  bool cache_hit = false;
   for (cpp = &searchbuf_head, lru_nonbusy = NULL; ; cpp = &cp->next)
     {
       cp = *cpp;
@@ -224,6 +229,7 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
 	  && cp->buf.charset_unibyte == charset_unibyte)
         {
           regexp_cache_hit++;
+	  cache_hit = true;
           break;
         }
 
@@ -243,6 +249,26 @@ compile_pattern (Lisp_Object pattern, struct re_registers *regp,
 	}
     }
 
+  if (regexp_trace_file) {
+    fprintf(regexp_trace_file, "%c \"", cache_hit ? 'H' : 'M');
+    ptrdiff_t n = SBYTES (pattern);
+    for (ptrdiff_t i = 0; i < n; i++) {
+      unsigned char c = SREF (pattern, i);
+      switch (c) {
+      case '"': case '\\': fprintf(regexp_trace_file, "\\%c", c); break;
+      case '\n': fprintf(regexp_trace_file, "\\n"); break;
+      case '\t': fprintf(regexp_trace_file, "\\t"); break;
+      default:
+	if (c < 32 || c == 127)
+	  fprintf(regexp_trace_file, "\\x%02x", c);
+	else
+	  putc(c, regexp_trace_file);
+	break;
+      }
+    }
+    fprintf(regexp_trace_file, "\"\n");
+  }
+
   /* When we get here, cp (aka *cpp) contains the compiled pattern,
      either because we found it in the cache or because we just compiled it.
      Move it to the front of the queue to mark it as most recently used.  */
@@ -3424,6 +3450,27 @@ DEFUN ("set-regexp-cache-size", Fset_regexp_cache_size, Sset_regexp_cache_size,
   return Qnil;
 }
 
+DEFUN ("set-regexp-trace-file", Fset_regexp_trace_file, Sset_regexp_trace_file,
+       1, 1, 0,
+       doc: /* Trace regexp cache lookups to FILE.  Internal use only.
+Each lookup appends one line to FILE.  If FILE is nil, stop tracing.  */)
+  (Lisp_Object file)
+{
+  if (!NILP (file))
+    CHECK_STRING (file);
+  if (regexp_trace_file)  /* Never pass NULL to fclose.  */
+    fclose (regexp_trace_file);
+  regexp_trace_file = NULL;
+  if (!NILP (file))
+    {
+      FILE *f = fopen (SSDATA (file), "a");
+      if (!f)
+	report_file_error ("Opening regexp trace file", file);
+      regexp_trace_file = f;
+    }
+  return Qnil;
+}
+
 void
 mark_regexp_cache (void)
 {
@@ -3514,6 +3561,7 @@ syms_of_search (void)
   defsubr (&Snewline_cache_check);
   defsubr (&Sregexp_cache_size);
   defsubr (&Sset_regexp_cache_size);
+  defsubr (&Sset_regexp_trace_file);
 
   pdumper_do_now_and_after_load (syms_of_search_for_pdumper);
 }
-- 
2.32.0 (Apple Git-132)

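The cache whose hits and misses the patch records is searched by compile_pattern as a singly linked list in most-recently-used order: a hit moves the entry to the front, and a miss recompiles the pattern into the least recently used non-busy entry, which is then also moved to the front. Below is a simplified C sketch of that move-to-front scheme, assuming a fixed cache of 3 string keys in place of compiled patterns, comparing only the key (the real lookup checks further properties, such as the charset_unibyte test visible in the patch context), and ignoring busy entries. struct cache, cache_lookup and the other names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size; the patch context shows DEFAULT_REGEXP_CACHE_SIZE 20.  */
#define CACHE_SIZE 3

struct entry
{
  const char *key;          /* Stands in for a compiled pattern; NULL if unused.
                               Borrowed: the caller keeps the string alive.  */
  struct entry *next;
};

struct cache
{
  struct entry slots[CACHE_SIZE];
  struct entry *head;       /* Most recently used entry first.  */
  unsigned long hits, misses;
};

static void
cache_init (struct cache *c)
{
  for (int i = 0; i < CACHE_SIZE; i++)
    {
      c->slots[i].key = NULL;
      c->slots[i].next = i + 1 < CACHE_SIZE ? &c->slots[i + 1] : NULL;
    }
  c->head = &c->slots[0];
  c->hits = c->misses = 0;
}

/* Look up KEY and return true on a hit.  On a miss, reuse the tail
   (least recently used) entry for KEY, as if recompiling into it.
   In both cases move the entry to the front of the list.  */
static bool
cache_lookup (struct cache *c, const char *key)
{
  struct entry **pp = &c->head;   /* The link that points at E.  */
  struct entry *e;
  bool hit = false;
  for (;;)
    {
      e = *pp;
      if (e->key && strcmp (e->key, key) == 0)
        {
          hit = true;
          break;
        }
      if (!e->next)
        {
          e->key = key;           /* Evict the old contents.  */
          break;
        }
      pp = &e->next;
    }
  if (hit)
    c->hits++;
  else
    c->misses++;
  /* Unlink E through PP and push it onto the front.  */
  *pp = e->next;
  e->next = c->head;
  c->head = e;
  return hit;
}
```

With CACHE_SIZE 3, looking up a, b, c, a, d gives four misses and one hit, and d evicts b, the least recently used key at that point. The pointer-to-pointer walk mirrors the cpp variable in compile_pattern: it lets the found entry be unlinked without a separate "previous" pointer.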


Thread overview: 35+ messages
2023-05-02  7:37 bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c) Ihor Radchenko
2023-05-02 14:33 ` Mattias Engdegård
2023-05-02 15:25   ` Eli Zaretskii
2023-05-02 15:28     ` Mattias Engdegård
2023-05-02 17:30       ` Eli Zaretskii
2023-05-02 17:58         ` Ihor Radchenko
2023-05-02 16:14   ` Ihor Radchenko
2023-05-02 21:00     ` Mattias Engdegård
2023-05-02 21:21       ` Ihor Radchenko
2023-05-03  8:39         ` Mattias Engdegård
2023-05-03  9:36           ` Ihor Radchenko
2023-05-03 13:59             ` Mattias Engdegård
2023-05-03 15:05               ` Ihor Radchenko
2023-05-03 15:20                 ` Mattias Engdegård
2023-05-03 16:02                   ` Ihor Radchenko
2023-05-04  9:24                     ` Mattias Engdegård
2023-05-05 10:31                       ` Ihor Radchenko
2023-05-05 16:26                         ` Mattias Engdegård
2023-05-06 13:38                           ` Ihor Radchenko
2023-05-07 10:32                             ` Mattias Engdegård
2023-05-08 11:58                               ` Ihor Radchenko
2023-05-08 18:21                                 ` Mattias Engdegård
2023-05-08 19:38                                   ` Ihor Radchenko
2023-05-08 19:53                                     ` Mattias Engdegård
2023-05-09  8:36                                       ` bug#63225: Using char table-based finite-state machines as a replacement for re-search-forward (was: bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)) Ihor Radchenko
2023-05-09 12:02                                       ` bug#63225: Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c) Ihor Radchenko
2023-05-09 15:05                                         ` Mattias Engdegård
2023-05-09 15:56                                           ` Ihor Radchenko
2023-05-09 15:57                                             ` Mattias Engdegård
2023-05-07 12:45                           ` Mattias Engdegård [this message]
2023-05-08 13:56                             ` Ihor Radchenko
2023-05-08 19:32                               ` Mattias Engdegård
2023-05-08 19:44                                 ` Ihor Radchenko
2023-05-04 12:58               ` Ihor Radchenko
2023-05-02 23:36   ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
