bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

* bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture
@ 2023-03-22  4:49 Yuan Fu
  2023-03-23  0:42 ` Dmitry Gutov
  0 siblings, 1 reply; 5+ messages in thread
From: Yuan Fu @ 2023-03-22  4:49 UTC (permalink / raw)
  To: 62368; +Cc: dgutov

[-- Attachment #1: Type: text/plain, Size: 419 bytes --]

X-Debbugs-CC: dgutov@yandex.ru

Dmitry, when you have time, could you try your benchmark in bug#60953
with this patch? I made predicates evaluate before we create any nodes,
so #equal and #match should be more efficient now, when there are a lot
of rejections. In the same time #pred is made slightly worst since they
now create a lisp node and discard it. (But this can be fixed with a
little more complexity.)

Yuan


[-- Attachment #2: pred-first.patch --]
[-- Type: application/octet-stream, Size: 28281 bytes --]

From 312063d389ea4e4f17ab82ab3da093c172e4142d Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 21 Mar 2023 16:03:08 -0700
Subject: [PATCH 1/3] Refactor Ftreesit_query_capture

Refactor some part of Ftreesit_query_capture out into separate
functions, to pave the way for other query-based functions.

* src/treesit.c (treesit_resolve_node): New function.
(treesit_initialize_query): New function.
(Ftreesit_query_capture): Refactor some part into new functions.
---
 src/treesit.c | 153 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 97 insertions(+), 56 deletions(-)

diff --git a/src/treesit.c b/src/treesit.c
index 5a4fe3e8803..e728d697c9d 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2631,8 +2631,8 @@ DEFUN ("treesit-query-compile",
       Lisp_Object signal_symbol = Qnil;
       Lisp_Object signal_data = Qnil;
       TSQuery *treesit_query = treesit_ensure_query_compiled (lisp_query,
-							 &signal_symbol,
-							 &signal_data);
+							      &signal_symbol,
+							      &signal_data);
 
       if (treesit_query == NULL)
 	xsignal (signal_symbol, signal_data);
@@ -2641,6 +2641,92 @@ DEFUN ("treesit-query-compile",
     }
 }
 
+/* Resolve OBJ into a tree-sitter node Lisp_Object.  OBJ can be a
+   node, a parser, or a language symbol.  Note that this function can
+   signal.  */
+static Lisp_Object treesit_resolve_node (Lisp_Object obj)
+{
+  if (TS_NODEP (obj))
+    {
+      treesit_check_node (obj); /* Check if up-to-date.  */
+      return obj;
+    }
+  else if (TS_PARSERP (obj))
+    {
+      treesit_check_parser (obj); /* Check if deleted.  */
+      return Ftreesit_parser_root_node (obj);
+    }
+  else if (SYMBOLP (obj))
+    {
+      Lisp_Object parser
+	= Ftreesit_parser_create (obj, Fcurrent_buffer (), Qnil);
+      return Ftreesit_parser_root_node (parser);
+    }
+  else
+    xsignal2 (Qwrong_type_argument,
+	      list4 (Qor, Qtreesit_node_p, Qtreesit_parser_p, Qsymbolp),
+	      obj);
+}
+
+/* Create and initialize QUERY.  When success, initialize TS_QUERY,
+   CURSOR, and NEED_FREE, and return true; if failed, initialize
+   SIGNAL_SYMBOL and SIGNAL_DATA, and return false.  If NEED_FREE is
+   initialized to true, the TS_QUERY and CURSOR needs to be freed
+   after use; otherwise they shouldn't be freed by hand.
+
+   Basically this function looks at QUERY and check its type, if QUERY
+   is a compiled query, this function takes out its query and cursor;
+   if QUERY is a string or a cons, this function creates a new query
+   and cursor (so they need to be manually freed).
+
+   This function assumes QUERY is either a compiled query, a string or
+   a cons, the caller should make sure QUERY is valid.
+
+   LANG is the language to use if we need to create the query and
+   cursor.  */
+static bool
+treesit_initialize_query (Lisp_Object query, const TSLanguage *lang,
+			  TSQuery **ts_query, TSQueryCursor **cursor,
+			  bool *need_free, Lisp_Object *signal_symbol,
+			  Lisp_Object *signal_data)
+{
+  if (TS_COMPILED_QUERY_P (query))
+    {
+      *ts_query = treesit_ensure_query_compiled (query, signal_symbol,
+						 signal_data);
+      *cursor = XTS_COMPILED_QUERY (query)->cursor;
+      /* We don't need to free ts_query and cursor because they
+	 are stored in a lisp object, which is tracked by gc.  */
+      *need_free = false;
+      return (*ts_query != NULL);
+    }
+  else
+    {
+      /* Since query is not TS_COMPILED_QUERY, it can only be a string
+	 or a cons.  */
+      if (CONSP (query))
+	query = Ftreesit_query_expand (query);
+      char *query_string = SSDATA (query);
+      uint32_t error_offset;
+      TSQueryError error_type;
+      *ts_query = ts_query_new (lang, query_string, strlen (query_string),
+				&error_offset, &error_type);
+      if (*ts_query == NULL)
+	{
+	  *signal_symbol = Qtreesit_query_error;
+	  *signal_data = treesit_compose_query_signal_data (error_offset,
+							    error_type, query);
+	  return false;
+	}
+      else
+	{
+	  *cursor = ts_query_cursor_new ();
+	  *need_free = true;
+	  return true;
+	}
+    }
+}
+
 DEFUN ("treesit-query-capture",
        Ftreesit_query_capture,
        Streesit_query_capture, 2, 5, 0,
@@ -2681,27 +2767,7 @@ DEFUN ("treesit-query-capture",
   treesit_initialize ();
 
   /* Resolve NODE into an actual node.  */
-  Lisp_Object lisp_node;
-  if (TS_NODEP (node))
-    {
-      treesit_check_node (node); /* Check if up-to-date.  */
-      lisp_node = node;
-    }
-  else if (TS_PARSERP (node))
-    {
-      treesit_check_parser (node); /* Check if deleted.  */
-      lisp_node = Ftreesit_parser_root_node (node);
-    }
-  else if (SYMBOLP (node))
-    {
-      Lisp_Object parser
-	= Ftreesit_parser_create (node, Fcurrent_buffer (), Qnil);
-      lisp_node = Ftreesit_parser_root_node (parser);
-    }
-  else
-    xsignal2 (Qwrong_type_argument,
-	      list4 (Qor, Qtreesit_node_p, Qtreesit_parser_p, Qsymbolp),
-	      node);
+  Lisp_Object lisp_node = treesit_resolve_node (node);
 
   /* Extract C values from Lisp objects.  */
   TSNode treesit_node
@@ -2725,40 +2791,15 @@ DEFUN ("treesit-query-capture",
   TSQuery *treesit_query;
   TSQueryCursor *cursor;
   bool needs_to_free_query_and_cursor;
-  if (TS_COMPILED_QUERY_P (query))
-    {
-      Lisp_Object signal_symbol = Qnil;
-      Lisp_Object signal_data = Qnil;
-      treesit_query = treesit_ensure_query_compiled (query, &signal_symbol,
-						     &signal_data);
-      cursor = XTS_COMPILED_QUERY (query)->cursor;
-      /* We don't need to free ts_query and cursor because they
-	 are stored in a lisp object, which is tracked by gc.  */
-      needs_to_free_query_and_cursor = false;
-      if (treesit_query == NULL)
-	xsignal (signal_symbol, signal_data);
-    }
-  else
-    {
-      /* Since query is not TS_COMPILED_QUERY, it can only be a string
-	 or a cons.  */
-      if (CONSP (query))
-	query = Ftreesit_query_expand (query);
-      char *query_string = SSDATA (query);
-      uint32_t error_offset;
-      TSQueryError error_type;
-      treesit_query = ts_query_new (lang, query_string, strlen (query_string),
-				    &error_offset, &error_type);
-      if (treesit_query == NULL)
-	xsignal (Qtreesit_query_error,
-		 treesit_compose_query_signal_data (error_offset,
-						    error_type, query));
-      cursor = ts_query_cursor_new ();
-      needs_to_free_query_and_cursor = true;
-    }
+  Lisp_Object signal_symbol;
+  Lisp_Object signal_data;
+  if (!treesit_initialize_query (query, lang, &treesit_query, &cursor,
+				 &needs_to_free_query_and_cursor,
+				 &signal_symbol, &signal_data))
+    xsignal (signal_symbol, signal_data);
 
-  /* WARN: After this point, free treesit_query and cursor before every
-     signal and return.  */
+  /* WARN: After this point, free TREESIT_QUERY and CURSOR before every
+     signal and return if NEEDS_TO_FREE_QUERY_AND_CURSOR is true.  */
 
   /* Set query range.  */
   if (!NILP (beg) && !NILP (end))
-- 
2.33.1


From b166cb228cfe23ede54718d4b5ab6d1753d8337e Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 21 Mar 2023 16:13:23 -0700
Subject: [PATCH 2/3] ; Minor refactor of Ftreesit_query_capture

* src/treesit.c (Ftreesit_query_capture): Move around some variable
initialization.
---
 src/treesit.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/treesit.c b/src/treesit.c
index e728d697c9d..cd98ff38293 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2770,12 +2770,9 @@ DEFUN ("treesit-query-capture",
   Lisp_Object lisp_node = treesit_resolve_node (node);
 
   /* Extract C values from Lisp objects.  */
-  TSNode treesit_node
-    = XTS_NODE (lisp_node)->node;
-  Lisp_Object lisp_parser
-    = XTS_NODE (lisp_node)->parser;
-  ptrdiff_t visible_beg
-    = XTS_PARSER (XTS_NODE (lisp_node)->parser)->visible_beg;
+  TSNode treesit_node = XTS_NODE (lisp_node)->node;
+  Lisp_Object lisp_parser = XTS_NODE (lisp_node)->parser;
+
   const TSLanguage *lang
     = ts_parser_language (XTS_PARSER (lisp_parser)->parser);
 
@@ -2804,6 +2801,8 @@ DEFUN ("treesit-query-capture",
   /* Set query range.  */
   if (!NILP (beg) && !NILP (end))
     {
+      ptrdiff_t visible_beg
+	= XTS_PARSER (XTS_NODE (lisp_node)->parser)->visible_beg;
       ptrdiff_t beg_byte = CHAR_TO_BYTE (XFIXNUM (beg));
       ptrdiff_t end_byte = CHAR_TO_BYTE (XFIXNUM (end));
       /* We never let tree-sitter run on buffers too large, so these
-- 
2.33.1


From 5c6c111a92c1da0a81b97a42a0b84fe8d9f03524 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Tue, 21 Mar 2023 21:10:59 -0700
Subject: [PATCH 3/3] Evaluate tree-sitter predicate functions before create
 node

By evaluating predicates before creating nodes, we can eliminate some
wasteful consing when there are many rejections (bug#60953).  The one
down side is that #pred predicate now creates superfluous nodes: it
creates a node to feed to the function, but then discard it.  We can
improve it later if it proves problematic.

* src/treesit.c:
(capture_range): Remove.
(predicate_context): New struct.
(treesit_predicates_for_pattern): Capture names are
not symbols anymore, but rather index numbers.
(treesit_capture_id_to_name): New subroutine function that converts
an index number to the capture name.  This is only used by error
reporting.
(treesit_predicate_capture_name_to_node): Remove.
(treesit_capture_id_to_node): New function.
(treesit_predicate_capture_name_to_text): Remove.
(treesit_capture_id_to_text): New function.
(treesit_predicate_equal)
(treesit_predicate_match)
(treesit_predicate_pred)
(treesit_eval_predicates): Change to the new index-based format.  Also
now we return the signal data rather than signaling directly.
(Ftreesit_query_capture): Now we evaluate predicates before creating
nodes.  We also set the current buffer to that of the node we are
working on, so that the predicate functions don't need to switch
buffers back-and-forth.  Also, handle signals gracefully so we don't
leak memory.
---
 src/treesit.c | 337 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 208 insertions(+), 129 deletions(-)

diff --git a/src/treesit.c b/src/treesit.c
index cd98ff38293..c751223fc19 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2349,23 +2349,38 @@ DEFUN ("treesit-query-expand",
   return Fmapconcat (Qtreesit_pattern_expand, query, Vtreesit_str_space);
 }
 
-/* This struct is used for passing captures to be check against
-   predicates.  Captures we check for are the ones in START before
-   END.  For example, if START and END are
+/* This struct is used passing context for evaluating predicates.
+   QUERY and MATCH are self-explanatory, ROOT_NODE is the node we are
+   querying.
 
-   START       END
+   START and END points to the list of captured nodes we are returning
+   from Ftreesit_query_capture.  They mark the nodes we captured in
+   this match.  Because we evaluate predicates before adding captured
+   nodes to the result list, right now START simply equal to END.
+
+   But if we later decide we want to optimize for #pred predicate,
+   these two pointer will be useful.
+
+   An example:
+
+    START          END
     v              v
-   (1 . (2 . (3 . (4 . (5 . (6 . nil))))))
+   (1 . (2 . (3 . (4 . (5 . (6 . nil)))))) -> complete result list
 
-   We only look at captures 1 2 3.  */
-struct capture_range
+   1 2 3 are the nodes captured by the current match.  */
+struct predicate_context
 {
+  TSQuery *query;
+  TSQueryMatch *match;
+  Lisp_Object root_node;
   Lisp_Object start;
   Lisp_Object end;
 };
 
 /* Collect predicates for this match and return them in a list.  Each
-   predicate is a list of strings and symbols.  */
+   predicate is a list of strings and numbers.  A number represents
+   the id of a capture name, the same thing as the index attribute of
+   a TSQueryCapture.  */
 static Lisp_Object
 treesit_predicates_for_pattern (TSQuery *query, uint32_t pattern_index)
 {
@@ -2381,12 +2396,7 @@ treesit_predicates_for_pattern (TSQuery *query, uint32_t pattern_index)
 	{
 	case TSQueryPredicateStepTypeCapture:
 	  {
-	    uint32_t str_len;
-	    const char *str = ts_query_capture_name_for_id (query,
-							    step.value_id,
-							    &str_len);
-	    predicate = Fcons (intern_c_string_1 (str, str_len),
-			       predicate);
+	    predicate = Fcons (make_fixnum (step.value_id), predicate);
 	    break;
 	  }
 	case TSQueryPredicateStepTypeString:
@@ -2407,118 +2417,159 @@ treesit_predicates_for_pattern (TSQuery *query, uint32_t pattern_index)
   return Fnreverse (result);
 }
 
-/* Translate a capture NAME (symbol) to a node.
-   Signals treesit-query-error if such node is not captured.  */
 static Lisp_Object
-treesit_predicate_capture_name_to_node (Lisp_Object name,
-					struct capture_range captures)
+treesit_capture_id_to_name (TSQuery *query, ptrdiff_t id)
 {
-  Lisp_Object node = Qnil;
-  for (Lisp_Object tail = captures.start; !EQ (tail, captures.end);
-       tail = XCDR (tail))
-    {
-      if (EQ (XCAR (XCAR (tail)), name))
-	{
-	  node = XCDR (XCAR (tail));
-	  break;
-	}
-    }
-
-  if (NILP (node))
-    xsignal3 (Qtreesit_query_error,
-	      build_string ("Cannot find captured node"),
-	      name, build_string ("A predicate can only refer"
-		                  " to captured nodes in the "
-		                  "same pattern"));
-  return node;
+  uint32_t len;
+  const char* name_str = ts_query_capture_name_for_id (query, (uint32_t) id,
+						       &len);
+  return make_string (name_str, (ptrdiff_t) len);
 }
 
-/* Translate a capture NAME (symbol) to the text of the captured node.
-   Signals treesit-query-error if such node is not captured.  */
-static Lisp_Object
-treesit_predicate_capture_name_to_text (Lisp_Object name,
-					struct capture_range captures)
+/* Translate a capture id (fixnum) into the captured TSNode and set to
+   NODE.  If we can't find the captured node, set SIGNAL_DATA and
+   return false, otherwise return true.  */
+static bool
+treesit_capture_id_to_node (TSNode *node, Lisp_Object id,
+			    struct predicate_context *context,
+			    Lisp_Object *signal_data)
 {
-  Lisp_Object node = treesit_predicate_capture_name_to_node (name, captures);
+  const TSQueryCapture *captures = context->match->captures;
+  for (int idx = 0; idx < context->match->capture_count; idx++)
+    {
+      if (captures[idx].index == XFIXNUM (id))
+	{
+	  *node = captures->node;
+	  return true;
+	}
+    }
+  Lisp_Object name = treesit_capture_id_to_name (context->query, XFIXNUM (id));
+  *signal_data = list3 (build_string ("Cannot find captured node"),
+			name, build_string ("A predicate can only refer"
+					    " to captured nodes in the "
+					    "same pattern"));
+  return false;
+}
 
-  struct buffer *old_buffer = current_buffer;
-  set_buffer_internal (XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer));
-  Lisp_Object text = Fbuffer_substring (Ftreesit_node_start (node),
-					Ftreesit_node_end (node));
-  set_buffer_internal (old_buffer);
-  return text;
+/* Translate a capture id (fixnum) into the text of the captured node
+   in the current buffer, and populate TEXT with it.  (Assumes current
+   buffer is the buffer in which the captured node and ROOT_NODE is).
+   ROOT_NODE is the root node of the query.  If we can't find the
+   captured node, set SIGNAL_DATA and return false, otherwise return
+   true.  */
+static bool
+treesit_capture_id_to_text (Lisp_Object *text, Lisp_Object id,
+			    struct predicate_context *context,
+			    Lisp_Object *signal_data)
+{
+  TSNode node;
+  if (!treesit_capture_id_to_node (&node, id, context, signal_data))
+    return false;
+
+  Lisp_Object parser = XTS_NODE (context->root_node)->parser;
+  ptrdiff_t begin_v = XTS_PARSER (parser)->visible_beg;
+
+  ptrdiff_t start_byte = (ptrdiff_t) ts_node_start_byte (node) + begin_v;
+  ptrdiff_t end_byte = (ptrdiff_t) ts_node_end_byte (node) + begin_v;
+
+  ptrdiff_t start = BYTE_TO_CHAR (start_byte);
+  ptrdiff_t end = BYTE_TO_CHAR (end_byte);
+
+  *text = make_buffer_string_both (start, start_byte,
+				   end, end_byte, false);
+  return true;
 }
 
 /* Handles predicate (#equal A B).  Return true if A equals B; return
-   false otherwise.  A and B can be either string, or a capture name.
-   The capture name evaluates to the text its captured node spans in
-   the buffer.  */
+   false otherwise.  A and B can be a either string or a capture id.
+   The capture id evaluates to the text spanned by the captured node
+   in the buffer.  Assume current buffer is the buffer of the captured
+   node and ROOT_NODE.  If something went wrong, set signal_data,
+   otherwise set it to Qnil.  IOW, caller should check the nullness of
+   signal_data for errors.  */
 static bool
-treesit_predicate_equal (Lisp_Object args, struct capture_range captures)
+treesit_predicate_equal (Lisp_Object args, struct predicate_context *context,
+			 Lisp_Object *signal_data)
 {
+  *signal_data = Qnil;
   if (XFIXNUM (Flength (args)) != 2)
-    xsignal2 (Qtreesit_query_error,
-	      build_string ("Predicate `equal' requires "
-		            "two arguments but only given"),
-	      Flength (args));
-
+    {
+      *signal_data = list2 (build_string ("Predicate `equal' requires "
+		                          "two arguments but was given"),
+			    Flength (args));
+      return false;
+    }
   Lisp_Object arg1 = XCAR (args);
   Lisp_Object arg2 = XCAR (XCDR (args));
-  Lisp_Object text1 = (STRINGP (arg1)
-		       ? arg1
-		       : treesit_predicate_capture_name_to_text (arg1,
-								 captures));
-  Lisp_Object text2 = (STRINGP (arg2)
-		       ? arg2
-		       : treesit_predicate_capture_name_to_text (arg2,
-								 captures));
+  Lisp_Object text1 = arg1;
+  Lisp_Object text2 = arg2;
+  if (FIXNUMP (arg1))
+    {
+      if (!treesit_capture_id_to_text (&text1, arg1, context, signal_data))
+	return false;
 
+    }
+  if (FIXNUMP (arg2))
+    {
+      if (!treesit_capture_id_to_text (&text2, arg2, context, signal_data))
+	return false;
+    }
   return !NILP (Fstring_equal (text1, text2));
 }
 
 /* Handles predicate (#match "regexp" @node).  Return true if "regexp"
-   matches the text spanned by @node; return false otherwise.  Matching
-   is case-sensitive.  */
+   matches the text spanned by @node; return false otherwise.
+   Matching is case-sensitive.  Assume current buffer is the buffer of
+   the captured node and ROOT_NODE.  If something went wrong, set
+   signal_data, otherwise set it to Qnil.  IOW, caller should check
+   the nullness of signal_data for errors.  */
 static bool
-treesit_predicate_match (Lisp_Object args, struct capture_range captures)
+treesit_predicate_match (Lisp_Object args, struct predicate_context *context,
+			 Lisp_Object *signal_data)
 {
+  *signal_data = Qnil;
   if (XFIXNUM (Flength (args)) != 2)
-    xsignal2 (Qtreesit_query_error,
-	      build_string ("Predicate `match' requires two "
-		            "arguments but only given"),
-	      Flength (args));
+    {
+      *signal_data = list2 (build_string ("Predicate `match' requires two "
+		                          "arguments but only given"),
+			    Flength (args));
+      return false;
+    }
 
   Lisp_Object regexp = XCAR (args);
-  Lisp_Object capture_name = XCAR (XCDR (args));
+  Lisp_Object capture_id = XCAR (XCDR (args));
 
   /* It's probably common to get the argument order backwards.  Catch
      this mistake early and show helpful explanation, because Emacs
      loves you.  (We put the regexp first because that's what
      string-match does.)  */
   if (!STRINGP (regexp))
-    xsignal1 (Qtreesit_query_error,
-	      build_string ("The first argument to `match' should "
-		            "be a regexp string, not a capture name"));
-  if (!SYMBOLP (capture_name))
-    xsignal1 (Qtreesit_query_error,
-	      build_string ("The second argument to `match' should "
-		            "be a capture name, not a string"));
+    {
+      *signal_data = build_string ("The first argument to `match' should "
+	                           "be a regexp string, not a capture name");
+      return false;
+    }
+  if (!FIXNUMP (capture_id))
+    {
+      *signal_data = build_string ("The second argument to `match' should "
+	                           "be a capture name, not a string");
+      return false;
+    }
 
-  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name,
-							     captures);
+  TSNode treesit_node;
+  if (!treesit_capture_id_to_node (&treesit_node, capture_id, context,
+				   signal_data))
+    return false;
 
-  struct buffer *old_buffer = current_buffer;
-  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
-  set_buffer_internal (buffer);
-
-  TSNode treesit_node = XTS_NODE (node)->node;
-  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  Lisp_Object parser = XTS_NODE (context->root_node)->parser;
+  ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
   uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
   uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
-  ptrdiff_t start_byte = visible_beg + start_byte_offset;
-  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_byte = visible_beg + (ptrdiff_t) start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + (ptrdiff_t) end_byte_offset;
   ptrdiff_t start_pos = BYTE_TO_CHAR (start_byte);
   ptrdiff_t end_pos = BYTE_TO_CHAR (end_byte);
+
   ptrdiff_t old_begv = BEGV;
   ptrdiff_t old_begv_byte = BEGV_BYTE;
   ptrdiff_t old_zv = ZV;
@@ -2537,63 +2588,78 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
   ZV = old_zv;
   ZV_BYTE = old_zv_byte;
 
-  set_buffer_internal (old_buffer);
-
   return (val > 0);
 }
 
 /* Handles predicate (#pred FN ARG...).  Return true if FN returns
    non-nil; return false otherwise.  The arity of FN must match the
-   number of ARGs  */
+   number of ARGs.  If something went wrong, set signal_data,
+   otherwise set it to Qnil.  IOW, caller should check the nullness of
+   signal_data for errors.  */
 static bool
-treesit_predicate_pred (Lisp_Object args, struct capture_range captures)
+treesit_predicate_pred (Lisp_Object args, struct predicate_context *context,
+			Lisp_Object *signal_data)
 {
+  *signal_data = Qnil;
   if (XFIXNUM (Flength (args)) < 2)
-    xsignal2 (Qtreesit_query_error,
-	      build_string ("Predicate `pred' requires "
-		            "at least two arguments, "
-		            "but was only given"),
-	      Flength (args));
-
+    {
+      *signal_data = list2 (build_string ("Predicate `pred' requires "
+		                          "at least two arguments, "
+		                          "but was only given"),
+			    Flength (args));
+      return false;
+    }
   Lisp_Object fn = Fintern (XCAR (args), Qnil);
   Lisp_Object nodes = Qnil;
   Lisp_Object tail = XCDR (args);
+  Lisp_Object parser = XTS_NODE (context->root_node)->parser;
   FOR_EACH_TAIL (tail)
-    nodes = Fcons (treesit_predicate_capture_name_to_node (XCAR (tail),
-							   captures),
-		   nodes);
+  {
+    TSNode ts_node;
+    if (!treesit_capture_id_to_node (&ts_node, XCAR (tail), context,
+				     signal_data))
+      return false;
+    Lisp_Object lisp_node = make_treesit_node (parser, ts_node);
+    nodes = Fcons (lisp_node, nodes);
+  }
   nodes = Fnreverse (nodes);
 
   return !NILP (CALLN (Fapply, fn, nodes));
 }
 
 /* If all predicates in PREDICATES passes, return true; otherwise
-   return false.  */
+   return false.  If something went wrong, set signal_data, otherwise
+   set it to Qnil.  IOW, caller should check the nullness of
+   signal_data for errors.  */
 static bool
-treesit_eval_predicates (struct capture_range captures, Lisp_Object predicates)
+treesit_eval_predicates (Lisp_Object predicates,
+			 struct predicate_context *context,
+			 Lisp_Object *signal_data)
 {
   bool pass = true;
+  *signal_data = Qnil;
   /* Evaluate each predicates.  */
   for (Lisp_Object tail = predicates;
-       !NILP (tail); tail = XCDR (tail))
+       !NILP (tail) && pass; tail = XCDR (tail))
     {
       Lisp_Object predicate = XCAR (tail);
       Lisp_Object fn = XCAR (predicate);
       Lisp_Object args = XCDR (predicate);
       if (!NILP (Fstring_equal (fn, Vtreesit_str_equal)))
-	pass &= treesit_predicate_equal (args, captures);
+	pass &= treesit_predicate_equal (args, context, signal_data);
       else if (!NILP (Fstring_equal (fn, Vtreesit_str_match)))
-	pass &= treesit_predicate_match (args, captures);
+	pass &= treesit_predicate_match (args, context, signal_data);
       else if (!NILP (Fstring_equal (fn, Vtreesit_str_pred)))
-	pass &= treesit_predicate_pred (args, captures);
+	pass &= treesit_predicate_pred (args, context, signal_data);
       else
-	xsignal3 (Qtreesit_query_error,
-		  build_string ("Invalid predicate"),
-		  fn, build_string ("Currently Emacs only supports"
-		                    " equal, match, and pred"
-		                    " predicate"));
+	{
+	  *signal_data = list3 (build_string ("Invalid predicate"),
+				fn, build_string ("Currently Emacs only supports"
+		                                  " equal, match, and pred"
+				                  " predicate"));
+	  break;
+	}
     }
-  /* If all predicates passed, add captures to result list.  */
   return pass;
 }
 
@@ -2831,10 +2897,34 @@ DEFUN ("treesit-query-capture",
   Lisp_Object result = Qnil;
   Lisp_Object prev_result = result;
   Lisp_Object predicates_table = make_vector (patterns_count, Qt);
+
+  struct buffer *old_buf = current_buffer;
+  set_buffer_internal (buf);
+
+  signal_data = Qnil;
   while (ts_query_cursor_next_match (cursor, &match))
     {
       /* Record the checkpoint that we may roll back to.  */
       prev_result = result;
+
+      /* Get predicates.  */
+      Lisp_Object predicates = AREF (predicates_table, match.pattern_index);
+      if (EQ (predicates, Qt))
+	{
+	  predicates = treesit_predicates_for_pattern (treesit_query,
+						       match.pattern_index);
+	  ASET (predicates_table, match.pattern_index, predicates);
+	}
+
+      /* Evaluate predicates.  */
+      struct predicate_context context
+	= { treesit_query, &match, lisp_node, result, prev_result };
+      bool pass = treesit_eval_predicates (predicates, &context, &signal_data);
+      if (!NILP (signal_data))
+	break;
+      else if (!pass)
+	continue;
+
       /* Get captured nodes.  */
       const TSQueryCapture *captures = match.captures;
       for (int idx = 0; idx < match.capture_count; idx++)
@@ -2858,26 +2948,15 @@ DEFUN ("treesit-query-capture",
 
 	  result = Fcons (cap, result);
 	}
-      /* Get predicates.  */
-      Lisp_Object predicates = AREF (predicates_table, match.pattern_index);
-      if (EQ (predicates, Qt))
-	{
-	  predicates = treesit_predicates_for_pattern (treesit_query,
-						       match.pattern_index);
-	  ASET (predicates_table, match.pattern_index, predicates);
-	}
-
-      /* captures_lisp = Fnreverse (captures_lisp); */
-      struct capture_range captures_range = { result, prev_result };
-      if (!treesit_eval_predicates (captures_range, predicates))
-	/* Predicates didn't pass, roll back.  */
-	result = prev_result;
     }
+  set_buffer_internal (old_buf);
   if (needs_to_free_query_and_cursor)
     {
       ts_query_delete (treesit_query);
       ts_query_cursor_delete (cursor);
     }
+  if (!NILP (signal_data))
+    xsignal (Qtreesit_query_error, signal_data);
   return Fnreverse (result);
 }
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture
  2023-03-22  4:49 bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture Yuan Fu
@ 2023-03-23  0:42 ` Dmitry Gutov
  2023-03-23  3:16   ` Yuan Fu
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Gutov @ 2023-03-23  0:42 UTC (permalink / raw)
  To: Yuan Fu, 62368

Hi Yuan!

On 22/03/2023 06:49, Yuan Fu wrote:
> X-Debbugs-CC:dgutov@yandex.ru
> 
> Dmitry, when you have time, could you try your benchmark in bug#60953
> with this patch? I made predicates evaluate before we create any nodes,
> so #equal and #match should be more efficient now, when there are a lot
> of rejections. In the same time #pred is made slightly worst since they
> now create a lisp node and discard it. (But this can be fixed with a
> little more complexity.)

Thank you, I was curious what would the improvement be if we could delay 
allocation of node structures until :match is checked.

But for my benchmark the difference is on the order of 4-5%. It seems we 
are scraping the barrel in terms of improving allocations/reducing GC 
because according to 'benchmark-run', where the whole run of a 100 
iterations of the scenario takes ~1.1s, the time spent in GC is 0.150s. 
And the improved version takes like 1.04s, with 0.1s in GC.

So if you ask me, I think I'd prefer to hold off on applying this patch 
until we either find scenarios where the improvement is more 
significant, or we find and eliminate some other bigger bottleneck 
first, after which these 5% grow to become 10-20% or more, of remaining 
runtime. The current approach is pretty Lisp-y, so I kind of like it.

And there's the issue of #pred, of course, which which could swing the 
difference in the other direction (I didn't test any code which uses it).

We could also try a smaller change: where the initial list of conses for 
result is build with capture_id's in car's, and then substituted with 
capture_name if the predicates all match. Then tthe treesit_node 
pseudovectors would still be created eagerly, though.

Here's the current perf report for my benchmark, most of the time is 
spent in libtree-sitter:

   17.02%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_current_status              ◆
   10.94%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_next_sibling           ▒
    9.93%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_first_child            ▒
    9.55%  emacs         emacs                       [.] 
process_mark_stack                         ▒
    4.56%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_start_point                        ▒
    3.90%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_parent_node                 ▒
    3.69%  emacs         emacs                       [.] 
re_match_2_internal                        ▒
    3.08%  emacs         libtree-sitter.so.0.0       [.] 
ts_language_symbol_metadata                ▒
    1.61%  emacs         emacs                       [.] exec_byte_code 
                            ▒
    1.47%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_end_point                          ▒
    1.44%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_current_node                ▒
    1.13%  emacs         emacs                       [.] 
allocate_vectorlike                        ▒
    1.11%  emacs         emacs                       [.] sweep_strings 
                            ▒
    1.04%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_end_byte                           ▒
    0.94%  emacs         emacs                       [.] next_interval 
                            ▒
    0.91%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_parent                 ▒
    0.88%  emacs         emacs                       [.] 
lookup_char_property                       ▒
    0.81%  emacs         emacs                       [.] find_interval 
                            ▒
    0.68%  emacs         emacs                       [.] 
pdumper_marked_p_impl                      ▒
    0.67%  emacs         emacs                       [.] assq_no_quit 
                            ▒
    0.56%  emacs         libtree-sitter.so.0.0       [.] ts_node_symbol 
                            ▒
    0.56%  emacs         emacs                       [.] mark_char_table 
                            ▒
    0.55%  emacs         emacs                       [.] execute_charset 
                            ▒
    0.49%  emacs         libtree-sitter.so.0.0       [.] 
0x000000000001ae3e                         ▒
    0.49%  emacs         emacs                       [.] re_search_2 
                            ▒
    0.48%  emacs         emacs                       [.] funcall_subr 
                            ▒
    0.46%  emacs         libc.so.6                   [.] __strncmp_sse42 
                            ▒
    0.42%  emacs         libtree-sitter.so.0.0       [.] 
ts_language_public_symbol                  ▒
    0.41%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_is_named                           ▒
    0.40%  emacs         libtree-sitter.so.0.0       [.] ts_node_new 
                            ▒
    0.34%  emacs         emacs                       [.] Fassq 
                            ▒
    0.34%  emacs         emacs                       [.] sweep_vectors





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture
  2023-03-23  0:42 ` Dmitry Gutov
@ 2023-03-23  3:16   ` Yuan Fu
  2023-09-12  0:05     ` Stefan Kangas
  0 siblings, 1 reply; 5+ messages in thread
From: Yuan Fu @ 2023-03-23  3:16 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 62368



> On Mar 22, 2023, at 5:42 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> Hi Yuan!
> 
> On 22/03/2023 06:49, Yuan Fu wrote:
>> X-Debbugs-CC:dgutov@yandex.ru
>> Dmitry, when you have time, could you try your benchmark in bug#60953
>> with this patch? I made predicates evaluate before we create any nodes,
>> so #equal and #match should be more efficient now, when there are a lot
>> of rejections. In the same time #pred is made slightly worst since they
>> now create a lisp node and discard it. (But this can be fixed with a
>> little more complexity.)
> 
> Thank you, I was curious what would the improvement be if we could delay allocation of node structures until :match is checked.
> 
> But for my benchmark the difference is on the order of 4-5%. It seems we are scraping the barrel in terms of improving allocations/reducing GC because according to 'benchmark-run', where the whole run of a 100 iterations of the scenario takes ~1.1s, the time spent in GC is 0.150s. And the improved version takes like 1.04s, with 0.1s in GC.
> 
> So if you ask me, I think I'd prefer to hold off on applying this patch until we either find scenarios where the improvement is more significant, or we find and eliminate some other bigger bottleneck first, after which these 5% grow to become 10-20% or more, of remaining runtime. The current approach is pretty Lisp-y, so I kind of like it.
> 
> And there's the issue of #pred, of course, which which could swing the difference in the other direction (I didn't test any code which uses it).
> 
> We could also try a smaller change: where the initial list of conses for result is build with capture_id's in car's, and then substituted with capture_name if the predicates all match. Then tthe treesit_node pseudovectors would still be created eagerly, though.

Thank you very much! Yeah, it doesn’t seem to worth it. I guess we can keep this in our back sleeve for now ;-)

I think using symbols is fine for now, since I don’t think it would make much difference.

Yuan




^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture
  2023-03-23  3:16   ` Yuan Fu
@ 2023-09-12  0:05     ` Stefan Kangas
  2023-09-12  0:37       ` Yuan Fu
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Kangas @ 2023-09-12  0:05 UTC (permalink / raw)
  To: Yuan Fu; +Cc: 62368, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

>> On Mar 22, 2023, at 5:42 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
>>
>> Hi Yuan!
>>
>> On 22/03/2023 06:49, Yuan Fu wrote:
>>> X-Debbugs-CC:dgutov@yandex.ru Dmitry, when you have time, could you
>>> try your benchmark in bug#60953 with this patch? I made predicates
>>> evaluate before we create any nodes, so #equal and #match should be
>>> more efficient now, when there are a lot of rejections. In the same
>>> time #pred is made slightly worst since they now create a lisp node
>>> and discard it. (But this can be fixed with a little more
>>> complexity.)
>>
>> Thank you, I was curious what would the improvement be if we could
>> delay allocation of node structures until :match is checked.
>>
>> But for my benchmark the difference is on the order of 4-5%. It seems
>> we are scraping the barrel in terms of improving allocations/reducing
>> GC because according to 'benchmark-run', where the whole run of a 100
>> iterations of the scenario takes ~1.1s, the time spent in GC is
>> 0.150s. And the improved version takes like 1.04s, with 0.1s in GC.
>>
>> So if you ask me, I think I'd prefer to hold off on applying this
>> patch until we either find scenarios where the improvement is more
>> significant, or we find and eliminate some other bigger bottleneck
>> first, after which these 5% grow to become 10-20% or more, of
>> remaining runtime. The current approach is pretty Lisp-y, so I kind
>> of like it.
>>
>> And there's the issue of #pred, of course, which which could swing
>> the difference in the other direction (I didn't test any code which
>> uses it).
>>
>> We could also try a smaller change: where the initial list of conses
>> for result is build with capture_id's in car's, and then substituted
>> with capture_name if the predicates all match. Then tthe treesit_node
>> pseudovectors would still be created eagerly, though.
>
> Thank you very much! Yeah, it doesn’t seem to worth it. I guess we can
> keep this in our back sleeve for now ;-)
>
> I think using symbols is fine for now, since I don’t think it would
> make much difference.

So should this bug be closed, then?





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture
  2023-09-12  0:05     ` Stefan Kangas
@ 2023-09-12  0:37       ` Yuan Fu
  0 siblings, 0 replies; 5+ messages in thread
From: Yuan Fu @ 2023-09-12  0:37 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: 62368-done, Dmitry Gutov



> On Sep 11, 2023, at 5:05 PM, Stefan Kangas <stefankangas@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> On Mar 22, 2023, at 5:42 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
>>> 
>>> Hi Yuan!
>>> 
>>> On 22/03/2023 06:49, Yuan Fu wrote:
>>>> X-Debbugs-CC:dgutov@yandex.ru Dmitry, when you have time, could you
>>>> try your benchmark in bug#60953 with this patch? I made predicates
>>>> evaluate before we create any nodes, so #equal and #match should be
>>>> more efficient now, when there are a lot of rejections. In the same
>>>> time #pred is made slightly worst since they now create a lisp node
>>>> and discard it. (But this can be fixed with a little more
>>>> complexity.)
>>> 
>>> Thank you, I was curious what would the improvement be if we could
>>> delay allocation of node structures until :match is checked.
>>> 
>>> But for my benchmark the difference is on the order of 4-5%. It seems
>>> we are scraping the barrel in terms of improving allocations/reducing
>>> GC because according to 'benchmark-run', where the whole run of a 100
>>> iterations of the scenario takes ~1.1s, the time spent in GC is
>>> 0.150s. And the improved version takes like 1.04s, with 0.1s in GC.
>>> 
>>> So if you ask me, I think I'd prefer to hold off on applying this
>>> patch until we either find scenarios where the improvement is more
>>> significant, or we find and eliminate some other bigger bottleneck
>>> first, after which these 5% grow to become 10-20% or more, of
>>> remaining runtime. The current approach is pretty Lisp-y, so I kind
>>> of like it.
>>> 
>>> And there's the issue of #pred, of course, which which could swing
>>> the difference in the other direction (I didn't test any code which
>>> uses it).
>>> 
>>> We could also try a smaller change: where the initial list of conses
>>> for result is build with capture_id's in car's, and then substituted
>>> with capture_name if the predicates all match. Then tthe treesit_node
>>> pseudovectors would still be created eagerly, though.
>> 
>> Thank you very much! Yeah, it doesn’t seem to worth it. I guess we can
>> keep this in our back sleeve for now ;-)
>> 
>> I think using symbols is fine for now, since I don’t think it would
>> make much difference.
> 
> So should this bug be closed, then?

We can close this for now. Thanks Stefan.

Yuan






^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-09-12  0:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-22  4:49 bug#62368: 29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture Yuan Fu
2023-03-23  0:42 ` Dmitry Gutov
2023-03-23  3:16   ` Yuan Fu
2023-09-12  0:05     ` Stefan Kangas
2023-09-12  0:37       ` Yuan Fu

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.