* How to add pseudo vector types
@ 2021-07-14 17:37 Yuan Fu
2021-07-14 17:44 ` Eli Zaretskii
2021-07-14 17:47 ` Stefan Monnier
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-14 17:37 UTC (permalink / raw)
To: emacs-devel
Say I want to expose tree-sitter’s parser to lisp, and I define it as a new pseudo vector.
struct Lisp_TS_Parser
{
union vectorlike_header header;
Lisp_Object buffer;
TSParser *parser;
TSTree *tree;
TSInput input;
};
Now if I want to return a Lisp_Object, do I initialize this struct and cast it into a Lisp_Object and return it? Like:
struct Lisp_TS_Parser lisp_parser;
...
return (Lisp_Object)lisp_parser;
And how do I use a USER_PTR? Do I cast it into (struct Lisp_User_Ptr) and use it normally, or is there some helper function that I should use?
Are there examples of using pseudo vectors? Thanks
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-14 17:37 How to add pseudo vector types Yuan Fu
@ 2021-07-14 17:44 ` Eli Zaretskii
2021-07-14 17:47 ` Stefan Monnier
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-14 17:44 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 14 Jul 2021 13:37:47 -0400
>
> Say I want to expose tree-sitter’s parser to lisp, and I define it as a new pseudo vector.
>
> struct Lisp_TS_Parser
> {
> union vectorlike_header header;
> Lisp_Object buffer;
> TSParser *parser;
> TSTree *tree;
> TSInput input;
> };
Inside Emacs, or in a module? I assume the former.
> Now if I want to return a Lisp_Object, do I initialize this struct and cast it into a Lisp_Object and return it? Like:
>
> struct Lisp_TS_Parser lisp_parser;
> ...
> return (Lisp_Object)lisp_parser;
No, you need to define a proper Lisp_Object, and then define
functions/macros to make a Lisp_Object that represents the struct, and
vice versa.
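A minimal sketch of that idea, as a self-contained toy model and not the actual lisp.h code: the "object" is a machine word holding a pointer with a type tag in its low bits, and a pair of helpers converts in each direction. All names here (toy_object, toy_make_object, toy_xparser, the tag value) are hypothetical; in Emacs the corresponding pieces are make_lisp_ptr and XUNTAG, which the patch later in this thread uses.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; the real definitions live in
   lisp.h and differ in detail.  */
typedef uintptr_t toy_object;   /* plays the role of Lisp_Object */
enum { TOY_TAG_BITS = 3, TOY_TAG_VECTORLIKE = 5 };
#define TOY_TAG_MASK ((uintptr_t) ((1 << TOY_TAG_BITS) - 1))

struct toy_parser
{
  /* Force 8-byte alignment so the low three address bits are free
     to hold the tag.  */
  _Alignas (8) int language_id;   /* hypothetical payload */
};

/* Struct pointer -> tagged object.  */
static toy_object
toy_make_object (struct toy_parser *p)
{
  assert (((uintptr_t) p & TOY_TAG_MASK) == 0);
  return (uintptr_t) p | TOY_TAG_VECTORLIKE;
}

/* Tagged object -> struct pointer; checks the tag before removing it.  */
static struct toy_parser *
toy_xparser (toy_object obj)
{
  assert ((obj & TOY_TAG_MASK) == TOY_TAG_VECTORLIKE);
  return (struct toy_parser *) (obj - TOY_TAG_VECTORLIKE);
}
```

The point of the sketch is that a plain C cast of the struct itself cannot work: the object value is a tagged reference to heap storage, so it has to be produced and consumed through conversion helpers.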
> And how do I use a USER_PTR? Do I cast it into (struct Lisp_User_Ptr) and use it normally, or is there some helper function that I should use?
Look in lisp.h, you will find some infrastructure there.
> Are there examples of using pseudo vectors?
Every buffer, window, frame, and overlay is a pseudo vector. Look how
these are handled in lisp.h and in the rest of the code, and you will
find a lot of examples.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-14 17:37 How to add pseudo vector types Yuan Fu
2021-07-14 17:44 ` Eli Zaretskii
@ 2021-07-14 17:47 ` Stefan Monnier
2021-07-14 23:48 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-07-14 17:47 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
Yuan Fu [2021-07-14 13:37:47] wrote:
> Say I want to expose tree-sitter’s parser to lisp, and I define it as a new pseudo vector.
>
> struct Lisp_TS_Parser
> {
> union vectorlike_header header;
> Lisp_Object buffer;
> TSParser *parser;
> TSTree *tree;
> TSInput input;
> };
>
> Now if I want to return a Lisp_Object, do I initialize this struct and cast
> it into a Lisp_Object and return it? Like:
>
> struct Lisp_TS_Parser lisp_parser;
> ...
> return (Lisp_Object)lisp_parser;
>
>
> And how do I use a USER_PTR? Do I cast it into (struct Lisp_User_Ptr) and
> use it normally, or is there some helper function that I should use?
Most likely you'll want some of your functions to take objects that
should be "tree sitter parsers", but you'll only receive a Lisp_Object
so you'll need to be able to *test* that the object you received is
indeed a "tree sitter parser".
For that reason you'll probably want to add a new entry to `pvec_type`
rather than use a USER_PTR.
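To illustrate why a dedicated type code helps, here is a small self-contained sketch. It is a toy model under stated assumptions, not Emacs code: the real pseudovector header packs the type into bit fields, whereas this version stores a plain enum, and every name prefixed with toy_ is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one type code per kind of object.  */
enum toy_pvec_type { TOY_PVEC_BUFFER, TOY_PVEC_USER_PTR, TOY_PVEC_TS_PARSER };

struct toy_header { enum toy_pvec_type type; };

/* Both structs start with the header, like pseudovector structs do.  */
struct toy_user_ptr { struct toy_header header; void *p; };
struct toy_ts_parser { struct toy_header header; void *parser; };

/* Because the header comes first, the type can be read before we know
   which struct the pointer really refers to.  */
static bool
toy_pseudovectorp (const void *obj, enum toy_pvec_type type)
{
  return obj != NULL && ((const struct toy_header *) obj)->type == type;
}

static bool
toy_ts_parserp (const void *obj)
{
  return toy_pseudovectorp (obj, TOY_PVEC_TS_PARSER);
}

/* Checked accessor, in the spirit of the X... accessors in lisp.h.  */
static struct toy_ts_parser *
toy_xts_parser (void *obj)
{
  assert (toy_ts_parserp (obj));
  return obj;
}
```

With a generic user pointer, every wrapped C pointer carries the same type code, so a function cannot tell a parser from any other wrapped pointer; a separate code makes the predicate exact.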
> Are there examples of using pseudo vectors? Thanks
Lots of them: actual vectors, processes, threads, mutexes, overlays, you
name it.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-14 17:47 ` Stefan Monnier
@ 2021-07-14 23:48 ` Yuan Fu
2021-07-15 0:26 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-14 23:48 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 974 bytes --]
>>
>> And how do I use a USER_PTR? Do I cast it into (struct Lisp_User_Ptr) and
>> use it normally, or is there some helper function that I should use?
>
> Most likely you'll want some of your functions to take objects that
> should be "tree sitter parsers", but you'll only receive a Lisp_Object
> so you'll need to be able to *test* that the object you received is
> indeed a "tree sitter parser".
>
> For that reason you'll probably want to add a new entry to `pvec_type`
> rather than use a USER_PTR.
Actually, what is the correct way to provide a pointer from a dynamic module to Emacs core? I tried to use USER_PTR, but the dynamic module can only return an emacs_value, and to convert an emacs_value to a Lisp_Object, I need to use value_to_lisp, which is not exposed by emacs-module.c.
I want to provide individual tree-sitter language definitions from dynamic modules so that one doesn’t need to compile Emacs with language definitions.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-14 23:48 ` Yuan Fu
@ 2021-07-15 0:26 ` Yuan Fu
2021-07-15 2:48 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-15 0:26 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1159 bytes --]
> On Jul 14, 2021, at 7:48 PM, Yuan Fu <casouri@gmail.com> wrote:
>
>>>
>>> And how do I use a USER_PTR? Do I cast it into (struct Lisp_User_Ptr) and
>>> use it normally, or is there some helper function that I should use?
>>
>> Most likely you'll want some of your functions to take objects that
>> should be "tree sitter parsers", but you'll only receive a Lisp_Object
>> so you'll need to be able to *test* that the object you received is
>> indeed a "tree sitter parser".
>>
>> For that reason you'll probably want to add a new entry to `pvec_type`
>> rather than use a USER_PTR.
>
>
> Actually, what is the correct way to provide a pointer from a dynamic module to Emacs core? I tried to use USER_PTR, but the dynamic module can only return an emacs_value, and to convert an emacs_value to a Lisp_Object, I need to use value_to_lisp, which is not exposed by emacs-module.c.
>
> I want to provide individual tree-sitter language definitions from dynamic modules so that one doesn’t need to compile Emacs with language definitions.
I just realized that I can regard emacs_value just as Lisp_Object. Is that right?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 0:26 ` Yuan Fu
@ 2021-07-15 2:48 ` Yuan Fu
2021-07-15 6:39 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-15 2:48 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 930 bytes --]
I defined two pseudo vectors for tree-sitter's parser and node and packaged a dynamic module for tree-sitter’s JSON language definition. I also wrapped a few tree-sitter functions just to test whether everything works. Please have a look. I’m sure there are some problems because I mainly wrote it by copying, pasting, and modifying other code I found in the Emacs source.
To try out this patch, get tree-sitter from https://github.com/tree-sitter/tree-sitter.git, then build and install it with make and make install. Then unzip json-module.zip to get the source of the JSON dynamic module. If my Makefile is correct, running make there should produce tree-sitter-json.so. Then, if you apply ts.patch, compile Emacs, and run this snippet, you should get a string representation of the root node.
(require 'tree-sitter-json)
(tree-sitter-node-string (tree-sitter-parse "[1,2]" (tree-sitter-json)))
Yuan
[-- Attachment #2.2: ts.patch --]
[-- Type: application/octet-stream, Size: 15270 bytes --]
From 85baf92975224ea99b7f68d5854342803c61f1d1 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Wed, 14 Jul 2021 22:26:42 -0400
Subject: [PATCH] checkpoint
---
configure.ac | 27 ++++++++-
src/Makefile.in | 11 +++-
src/alloc.c | 13 +++++
src/emacs.c | 4 ++
src/lisp.h | 2 +
src/print.c | 17 ++++++
src/tree_sitter.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++
src/tree_sitter.h | 87 ++++++++++++++++++++++++++++
8 files changed, 302 insertions(+), 4 deletions(-)
create mode 100644 src/tree_sitter.c
create mode 100644 src/tree_sitter.h
diff --git a/configure.ac b/configure.ac
index 830f33844b..42d2d43455 100644
--- a/configure.ac
+++ b/configure.ac
@@ -454,6 +454,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([imagemagick],[compile with ImageMagick image support])
OPTION_DEFAULT_ON([native-image-api], [don't use native image APIs (GDI+ on Windows)])
OPTION_DEFAULT_IFAVAILABLE([json], [compile with native JSON support])
+OPTION_DEFAULT_IFAVAILABLE([tree-sitter], [compile with tree-sitter])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([harfbuzz],[don't use HarfBuzz for text shaping])
@@ -2963,6 +2964,23 @@ AC_DEFUN
AC_SUBST(JSON_CFLAGS)
AC_SUBST(JSON_OBJ)
+HAVE_TREE_SITTER=no
+TREE_SITTER_OBJ=
+
+if test "${with_tree_sitter}" != "no"; then
+ EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
+ [HAVE_TREE_SITTER=yes], [HAVE_TREE_SITTER=no])
+ if test "${HAVE_TREE_SITTER}" = yes; then
+ AC_DEFINE(HAVE_TREE_SITTER, 1, [Define if using tree-sitter.])
+ TREE_SITTER_LIBS=-ltree-sitter
+ TREE_SITTER_OBJ="tree_sitter.o"
+ fi
+fi
+
+AC_SUBST(TREE_SITTER_LIBS)
+AC_SUBST(TREE_SITTER_CFLAGS)
+AC_SUBST(TREE_SITTER_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -4028,6 +4046,12 @@ AC_DEFUN
*) MISSING="$MISSING json"
WITH_IFAVAILABLE="$WITH_IFAVAILABLE --with-json=ifavailable";;
esac
+case $with_tree_sitter,$HAVE_TREE_SITTER in
+ no,* | ifavailable,* | *,yes) ;;
+ *) MISSING="$MISSING tree-sitter"
+ WITH_IFAVAILABLE="$WITH_IFAVAILABLE --with-tree-sitter=ifavailable";;
+esac
+
if test "X${MISSING}" != X; then
# If we have a missing library, and we don't have pkg-config installed,
# the missing pkg-config may be the reason. Give the user a hint.
@@ -5833,7 +5857,7 @@ AC_DEFUN
optsep=
emacs_config_features=
for opt in ACL CAIRO DBUS FREETYPE GCONF GIF GLIB GMP GNUTLS GPM GSETTINGS \
- HARFBUZZ IMAGEMAGICK JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 \
+ HARFBUZZ IMAGEMAGICK JPEG JSON TREE_SITTER LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 \
M17N_FLT MODULES NATIVE_COMP NOTIFY NS OLDXMENU PDUMPER PNG RSVG SECCOMP \
SOUND THREADS TIFF \
TOOLKIT_SCROLL_BARS UNEXEC X11 XAW3D XDBE XFT XIM XPM XWIDGETS X_TOOLKIT \
@@ -5902,6 +5926,7 @@ AC_DEFUN
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
Does Emacs use -ljansson? ${HAVE_JSON}
+ Does Emacs use -ltree-sitter? ${HAVE_TREE_SITTER}
Does Emacs use the GMP library? ${HAVE_GMP}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
diff --git a/src/Makefile.in b/src/Makefile.in
index 79cddb35b5..bfdfda566e 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -320,6 +320,10 @@ JSON_LIBS =
JSON_CFLAGS = @JSON_CFLAGS@
JSON_OBJ = @JSON_OBJ@
+TREE_SITTER_LIBS = @TREE_SITTER_LIBS@
+TREE_SITTER_CFLAGS = @TREE_SITTER_CFLAGS@
+TREE_SITTER_OBJ = @TREE_SITTER_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -372,7 +376,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) $(LCMS2_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(HARFBUZZ_CFLAGS) $(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) $(TREE_SITTER_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -406,7 +410,8 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ) \
+ $(TREE_SITTER_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -516,7 +521,7 @@ LIBES =
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(HARFBUZZ_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LCMS2_LIBS) \
$(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
- $(JSON_LIBS) $(LIBGMP) $(LIBGCCJIT)
+ $(JSON_LIBS) $(LIBGMP) $(LIBGCCJIT) $(TREE_SITTER_LIBS)
## FORCE it so that admin/unidata can decide whether this file is
## up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/alloc.c b/src/alloc.c
index 76d8c7ddd1..f144e053f2 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -50,6 +50,10 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2021 Free Software
#include TERM_HEADER
#endif /* HAVE_WINDOW_SYSTEM */
+#ifdef HAVE_TREE_SITTER
+#include "tree_sitter.h"
+#endif
+
#include <flexmember.h>
#include <verify.h>
#include <execinfo.h> /* For backtrace. */
@@ -3144,6 +3148,15 @@ cleanup_vector (struct Lisp_Vector *vector)
if (uptr->finalizer)
uptr->finalizer (uptr->p);
}
+#ifdef HAVE_TREE_SITTER
+ else if (PSEUDOVECTOR_TYPEP (&vector->header, PVEC_TS_PARSER))
+ {
+ struct Lisp_TS_Parser *lisp_parser
+ = PSEUDOVEC_STRUCT (vector, Lisp_TS_Parser);
+ ts_tree_delete(lisp_parser->tree);
+ ts_parser_delete(lisp_parser->parser);
+ }
+#endif
#ifdef HAVE_MODULES
else if (PSEUDOVECTOR_TYPEP (&vector->header, PVEC_MODULE_FUNCTION))
{
diff --git a/src/emacs.c b/src/emacs.c
index 60a57a693c..ede390231d 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -85,6 +85,7 @@ #define MAIN_PROGRAM
#include "intervals.h"
#include "character.h"
#include "buffer.h"
+#include "tree_sitter.h"
#include "window.h"
#include "xwidget.h"
#include "atimer.h"
@@ -2057,6 +2058,9 @@ main (int argc, char **argv)
syms_of_floatfns ();
syms_of_buffer ();
+ #ifdef HAVE_TREE_SITTER
+ syms_of_tree_sitter ();
+ #endif
syms_of_bytecode ();
syms_of_callint ();
syms_of_casefiddle ();
diff --git a/src/lisp.h b/src/lisp.h
index 4fb8923678..e439447283 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1070,6 +1070,8 @@ DEFINE_GDB_SYMBOL_END (PSEUDOVECTOR_FLAG)
PVEC_CONDVAR,
PVEC_MODULE_FUNCTION,
PVEC_NATIVE_COMP_UNIT,
+ PVEC_TS_PARSER,
+ PVEC_TS_NODE,
/* These should be last, for internal_equal and sxhash_obj. */
PVEC_COMPILED,
diff --git a/src/print.c b/src/print.c
index d4301fd7b6..e20a1d065a 100644
--- a/src/print.c
+++ b/src/print.c
@@ -48,6 +48,10 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2021 Free Software
# include <sys/socket.h> /* for F_DUPFD_CLOEXEC */
#endif
+#ifdef HAVE_TREE_SITTER
+#include "tree_sitter.h"
+#endif
+
struct terminal;
/* Avoid actual stack overflow in print. */
@@ -1853,6 +1857,19 @@ print_vectorlike (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag,
}
break;
#endif
+
+#ifdef HAVE_TREE_SITTER
+ case PVEC_TS_PARSER:
+ print_c_string ("#<tree-sitter-parser in ", printcharfun);
+ print_string (BVAR (XTS_PARSER (obj)->buffer, name), printcharfun);
+ printchar ('>', printcharfun);
+ break;
+ case PVEC_TS_NODE:
+ print_c_string ("#<tree-sitter-node", printcharfun);
+ printchar ('>', printcharfun);
+ break;
+#endif
+
default:
emacs_abort ();
}
diff --git a/src/tree_sitter.c b/src/tree_sitter.c
new file mode 100644
index 0000000000..f2134c571a
--- /dev/null
+++ b/src/tree_sitter.c
@@ -0,0 +1,145 @@
+/* Tree-sitter integration for GNU Emacs.
+
+Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/param.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include "buffer.h"
+#include "coding.h"
+#include "tree_sitter.h"
+
+/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
+#include <tree_sitter/parser.h>
+
+Lisp_Object
+make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree)
+{
+ struct Lisp_TS_Parser *lisp_parser
+ = ALLOCATE_PLAIN_PSEUDOVECTOR (struct Lisp_TS_Parser, PVEC_TS_PARSER);
+ lisp_parser->buffer = buffer;
+ lisp_parser->parser = parser;
+ lisp_parser->tree = tree;
+ // TODO TSInput.
+ return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
+}
+
+Lisp_Object
+make_ts_node (Lisp_Object parser, TSNode node)
+{
+ struct Lisp_TS_Node *lisp_node
+ = ALLOCATE_PSEUDOVECTOR (struct Lisp_TS_Node, parser, PVEC_TS_NODE);
+ lisp_node->parser = parser;
+ lisp_node->node = node;
+ return make_lisp_ptr (lisp_node, Lisp_Vectorlike);
+}
+
+
+/* Tree-sitter parser. */
+
+DEFUN ("tree-sitter-parse", Ftree_sitter_parse, Stree_sitter_parse,
+ 2, 2, 0,
+ doc: /* Parse STRING and return a parser object.
+LANGUAGE should be the language provided by a tree-sitter language
+dynamic module. */)
+ (Lisp_Object string, Lisp_Object language)
+{
+ CHECK_STRING (string);
+
+ /* LANGUAGE is a USER_PTR that contains the pointer to a
+ TSLanguage struct. */
+ TSParser *parser = ts_parser_new ();
+ TSLanguage *lang = (XUSER_PTR (language)->p);
+ ts_parser_set_language (parser, lang);
+
+ TSTree *tree = ts_parser_parse_string (parser, NULL,
+ SSDATA (string),
+ strlen (SSDATA (string)));
+
+ /* See comment for ts_parser_parse in tree_sitter/api.h
+ for possible reasons for a failure. */
+ if (tree == NULL)
+ signal_error ("Failed to parse STRING", string);
+
+ TSNode root_node = ts_tree_root_node (tree);
+
+ Lisp_Object lisp_parser = make_ts_parser (NULL, parser, tree);
+ Lisp_Object lisp_node = make_ts_node (lisp_parser, root_node);
+
+ return lisp_node;
+}
+
+DEFUN ("tree-sitter-node-string",
+ Ftree_sitter_node_string, Stree_sitter_node_string, 1, 1, 0,
+ doc: /* Return the string representation of NODE. */)
+ (Lisp_Object node)
+{
+ TSNode ts_node = XTS_NODE (node)->node;
+ char *string = ts_node_string(ts_node);
+ return make_string(string, strlen (string));
+}
+
+DEFUN ("tree-sitter-node-parent",
+ Ftree_sitter_node_parent, Stree_sitter_node_parent, 1, 1, 0,
+ doc: /* Return the immediate parent of NODE.
+Return nil if couldn't find any. */)
+ (Lisp_Object node)
+{
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode parent = ts_node_parent(ts_node);
+
+ if (ts_node_is_null(parent))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, parent);
+}
+
+DEFUN ("tree-sitter-node-child",
+ Ftree_sitter_node_child, Stree_sitter_node_child, 2, 2, 0,
+ doc: /* Return the Nth child of NODE.
+Return nil if couldn't find any. */)
+ (Lisp_Object node, Lisp_Object n)
+{
+ CHECK_FIXNUM (n);
+ EMACS_INT idx = XFIXNUM (n);
+ TSNode ts_node = XTS_NODE (node)->node;
+ // FIXME: Is this cast ok?
+ TSNode child = ts_node_child(ts_node, (uint32_t) idx);
+
+ if (ts_node_is_null(child))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, child);
+}
+
+/* Initialize the tree-sitter routines. */
+void
+syms_of_tree_sitter (void)
+{
+ defsubr (&Stree_sitter_parse);
+ defsubr (&Stree_sitter_node_string);
+ defsubr (&Stree_sitter_node_parent);
+ defsubr (&Stree_sitter_node_child);
+}
diff --git a/src/tree_sitter.h b/src/tree_sitter.h
new file mode 100644
index 0000000000..3c9e03475f
--- /dev/null
+++ b/src/tree_sitter.h
@@ -0,0 +1,87 @@
+/* Header file for the tree-sitter integration.
+
+Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#ifndef EMACS_TREE_SITTER_H
+#define EMACS_TREE_SITTER_H
+
+#include <sys/types.h>
+
+#include "lisp.h"
+
+#include <tree_sitter/api.h>
+
+INLINE_HEADER_BEGIN
+
+struct Lisp_TS_Parser
+{
+ union vectorlike_header header;
+ struct buffer *buffer;
+ TSParser *parser;
+ TSTree *tree;
+ TSInput input;
+};
+
+struct Lisp_TS_Node
+{
+ union vectorlike_header header;
+ /* This should prevent the gc from collecting the parser before the
+ node is done with it. TSNode contains a pointer to the tree it
+ belongs to, and the parser object, when collected by gc, will
+ free that tree. */
+ Lisp_Object parser;
+ TSNode node;
+};
+
+INLINE bool
+TS_PARSERP (Lisp_Object x)
+{
+ return PSEUDOVECTORP (x, PVEC_TS_PARSER);
+}
+
+INLINE struct Lisp_TS_Parser *
+XTS_PARSER (Lisp_Object a)
+{
+ eassert (TS_PARSERP (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Parser);
+}
+
+INLINE bool
+TS_NODEP (Lisp_Object x)
+{
+ return PSEUDOVECTORP (x, PVEC_TS_NODE);
+}
+
+INLINE struct Lisp_TS_Node *
+XTS_NODE (Lisp_Object a)
+{
+ eassert (TS_NODEP (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Node);
+}
+
+Lisp_Object
+make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree);
+
+Lisp_Object
+make_ts_node (Lisp_Object parser, TSNode node);
+
+extern void syms_of_tree_sitter (void);
+
+INLINE_HEADER_END
+
+#endif /* EMACS_TREE_SITTER_H */
--
2.24.3 (Apple Git-128)
[-- Attachment #2.4: json-module.zip --]
[-- Type: application/zip, Size: 8797 bytes --]
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 2:48 ` Yuan Fu
@ 2021-07-15 6:39 ` Eli Zaretskii
2021-07-15 13:37 ` Fu Yuan
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-15 6:39 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 14 Jul 2021 22:48:30 -0400
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> I defined two pseudo vectors for tree-sitter's parser and node and packaged a dynamic module for tree-sitter’s JSON language definition. I also wrapped a few tree-sitter functions just to test whether everything works. Please have a look. I’m sure there are some problems because I mainly wrote it by copying, pasting, and modifying other code I found in the Emacs source.
Thanks, but why does it parse only strings, not buffer text?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 6:39 ` Eli Zaretskii
@ 2021-07-15 13:37 ` Fu Yuan
2021-07-15 14:18 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Fu Yuan @ 2021-07-15 13:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, emacs-devel
> On Jul 15, 2021, at 2:39 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 14 Jul 2021 22:48:30 -0400
>> Cc: emacs-devel <emacs-devel@gnu.org>
>>
>> I defined two pseudo vectors for tree-sitter's parser and node and packaged a dynamic module for tree-sitter’s JSON language definition. I also wrapped a few tree-sitter functions just to test whether everything works. Please have a look. I’m sure there are some problems because I mainly wrote it by copying, pasting, and modifying other code I found in the Emacs source.
>
> Thanks, but why does it parse only strings, not buffer text?
I haven’t written that yet. I want to make sure the pseudo vector definition and configure files are right before going further. IIRC the contribution guide recommends sending small patches and updating them along the way.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 13:37 ` Fu Yuan
@ 2021-07-15 14:18 ` Eli Zaretskii
2021-07-15 15:17 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-15 14:18 UTC (permalink / raw)
To: Fu Yuan; +Cc: monnier, emacs-devel
> From: Fu Yuan <casouri@gmail.com>
> Date: Thu, 15 Jul 2021 09:37:27 -0400
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> > Thanks, but why does it parse only strings, not buffer text?
>
> I haven’t written that yet. I want to make sure the pseudo vector definition and configure files are right before going further. IIRC the contribution guide recommends sending small patches and updating them along the way.
Great, then please try also to liberate the implementation from using
JSON, it's a major slowdown factor.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 14:18 ` Eli Zaretskii
@ 2021-07-15 15:17 ` Yuan Fu
2021-07-15 15:50 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-15 15:17 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
> On Jul 15, 2021, at 10:18 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Fu Yuan <casouri@gmail.com>
>> Date: Thu, 15 Jul 2021 09:37:27 -0400
>> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
>>
>>> Thanks, but why does it parse only strings, not buffer text?
>>
>> I haven’t written that yet. I want to make sure the pseudo vector definition and configure files are right before going further. IIRC the contribution guide recommends sending small patches and updating them along the way.
>
> Great, then please try also to liberate the implementation from using
> JSON, it's a major slowdown factor.
JSON? I didn’t write anything involving JSON.
While you are looking at the patch, here are some questions about integrating tree-sitter with our buffer implementation. What I envision is for each buffer to have a `parser-list’, and on buffer change, we update each parser’s tree. I think modifying signal_after_change is enough to cover all the cases? And, for tree-sitter to take the buffer’s content directly, we need to tell it to skip the gap. I only need to modify gap_left, gap_right, make_gap_smaller and make_gap_larger, right?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 15:17 ` Yuan Fu
@ 2021-07-15 15:50 ` Eli Zaretskii
2021-07-15 16:19 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-15 15:50 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 15 Jul 2021 11:17:02 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> > Great, then please try also to liberate the implementation from using
> > JSON, it's a major slowdown factor.
>
> JSON? I didn’t write anything involving JSON.
Then what is json-module.zip about?
> While you are looking at the patch, here are some questions about integrating tree-sitter with our buffer implementation. What I envision is for each buffer to have a `parser-list’, and on buffer change, we update each parser’s tree. I think modifying signal_after_change is enough to cover all the cases?
Why do you need to do this when a buffer is updated? Why not use
display as the trigger? Large portions of a buffer will never be
displayed, and some buffers will not be displayed at all. Why waste
cycles on them? Redisplay is perfectly equipped to tell you when some
chunk of buffer text is going to be redrawn, and it already knows to
do nothing if the buffer hasn't changed.
> And, for tree-sitter to take the buffer’s content directly, we need to tell it to skip the gap.
AFAIR, tree-sitter allows the calling package to provide a function to
access the text, isn't that so? If so, you could write a function
that accesses buffer text via BYTE_POS_ADDR etc., and that knows how
to skip the gap already.
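A rough sketch of such an access function, written against a toy gap buffer and not the real Emacs buffer structures. It assumes the consumer accepts text in chunks: each call returns a pointer plus a byte count, and the consumer calls again for the rest, which is how tree-sitter's read callback reports its result as far as I understand it. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy gap buffer: the text occupies data[0, gap_start) and
   data[gap_end, size); the bytes in between are the gap.  */
struct toy_gap_buffer
{
  const char *data;
  uint32_t gap_start;
  uint32_t gap_end;
  uint32_t size;   /* total bytes in DATA, gap included */
};

/* Return a pointer to the longest contiguous run of text starting at
   logical offset BYTE_INDEX and store its length in *BYTES_READ.
   A run never crosses the gap, so nothing is copied; the caller asks
   again for the text after the gap.  *BYTES_READ is 0 at end of text.  */
static const char *
toy_read_chunk (const struct toy_gap_buffer *buf, uint32_t byte_index,
                uint32_t *bytes_read)
{
  uint32_t gap_size = buf->gap_end - buf->gap_start;
  uint32_t text_len = buf->size - gap_size;
  assert (byte_index <= text_len);

  if (byte_index < buf->gap_start)
    {
      /* Before the gap: hand out everything up to the gap.  */
      *bytes_read = buf->gap_start - byte_index;
      return buf->data + byte_index;
    }

  /* At or after the gap: skip over it and hand out the rest.  */
  *bytes_read = text_len - byte_index;
  return buf->data + byte_index + gap_size;
}
```

With this shape, the gap position is only consulted at read time, so the gap-moving functions do not need to know about the parser.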
> I only need to modify gap_left, gap_right, make_gap_smaller and make_gap_larger, right?
Why would you need to _modify_ any of these?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 15:50 ` Eli Zaretskii
@ 2021-07-15 16:19 ` Yuan Fu
2021-07-15 16:26 ` Yuan Fu
2021-07-15 16:48 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-15 16:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: monnier, emacs-devel
> On Jul 15, 2021, at 11:50 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 15 Jul 2021 11:17:02 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> emacs-devel@gnu.org
>>
>>> Great, then please try also to liberate the implementation from using
>>> JSON, it's a major slowdown factor.
>>
>> JSON? I didn’t write anything involving JSON.
>
> Then what is json-module.zip about?
That’s a language definition for tree-sitter, so it tells tree-sitter how to parse a JSON file. There are definitions for Python, Ruby, C, etc. I just used JSON for an example. It’s named json-module because it is a dynamic module.
>
>> While you are looking at the patch, here are some questions about integrating tree-sitter with our buffer implementation. What I envision is for each buffer to have a `parser-list’, and on buffer change, we update each parser’s tree. I think modifying signal_after_change is enough to cover all the cases?
>
> Why do you need to do this when a buffer is updated? Why not use
> display as the trigger? Large portions of a buffer will never be
> displayed, and some buffers will not be displayed at all. Why waste
> cycles on them? Redisplay is perfectly equipped to tell you when some
> chunk of buffer text is going to be redrawn, and it already knows to
> do nothing if the buffer hasn't changed.
Tree-sitter expects to be told about every single change to the parsed text. Say you have a buffer with some content and have scrolled through it, so tree-sitter has parsed the whole buffer. Then some Elisp edits text outside the visible portion. Redisplay doesn’t happen, so we don’t report this edit to tree-sitter. Then I scroll to the place that was edited. What now? The change information is lost, and tree-sitter’s tree is outdated.
We can fontify on demand, but we can’t parse on demand. What we can do is parse only the portion from BOB to the visible portion, so we won’t parse the whole buffer unless you scroll to the bottom.
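To make the bookkeeping concrete, here is a small self-contained sketch of what "telling the parser about an edit" amounts to. It is a toy under stated assumptions: struct toy_edit keeps only byte offsets (tree-sitter's own edit record also carries row/column points), offsets are 0-based bytes instead of Emacs character positions, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy edit record with byte offsets only.  */
struct toy_edit
{
  uint32_t start_byte;     /* where the change begins */
  uint32_t old_end_byte;   /* end of the replaced text, before the edit */
  uint32_t new_end_byte;   /* end of the replacement text, after the edit */
};

/* Build an edit from an after-change style report: the new text spans
   [BEG, END) and replaced OLD_LEN bytes of old text.  */
static struct toy_edit
toy_edit_from_change (uint32_t beg, uint32_t end, uint32_t old_len)
{
  assert (beg <= end);
  struct toy_edit e = { beg, beg + old_len, end };
  return e;
}

/* Map an offset that was valid before the edit to its place after it.
   Offsets inside the replaced text collapse onto the end of the new
   text.  */
static uint32_t
toy_shift_offset (struct toy_edit e, uint32_t old_offset)
{
  if (old_offset < e.start_byte)
    return old_offset;
  if (old_offset < e.old_end_byte)
    return e.new_end_byte;
  return old_offset - e.old_end_byte + e.new_end_byte;
}
```

Every offset stored in an existing tree has to be shifted like this for each edit, in order. If one edit goes unreported, all positions after it are off by that edit's size, which is why the changes cannot be reconstructed later from redisplay alone.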
>
>> And, for tree-sitter to take the buffer’s content directly, we need to tell it to skip the gap.
>
> AFAIR, tree-sitter allows the calling package to provide a function to
> access the text, isn't that so? If so, you could write a function
> that accesses buffer text via BYTE_POS_ADDR etc., and that knows how
> to skip the gap already.
Yes, that function returns a char*. But what if the gap is in the middle of the portion that tree-sitter wants to read? Alternatively, we can copy the text out and pass it to tree-sitter, but you don’t like that, IIRC.
>
>> I only need to modify gap_left, gap_right, make_gap_smaller and make_gap_larger, right?
>
> Why would you need to _modify_ any of these?
Because I want tree-sitter to know where the gap is so it can avoid it when reading text.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 16:19 ` Yuan Fu
@ 2021-07-15 16:26 ` Yuan Fu
2021-07-15 16:50 ` Eli Zaretskii
2021-07-15 16:48 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-15 16:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
>
>>
>>> And, for tree-sitter to take the buffer’s content directly, we need to tell it to skip the gap.
>>
>> AFAIR, tree-sitter allows the calling package to provide a function to
>> access the text, isn't that so? If so, you could write a function
>> that accesses buffer text via BYTE_POS_ADDR etc., and that knows how
>> to skip the gap already.
>
> Yes, that function returns a char*. But what if the gap is in the middle of the portion that tree-sitter wants to read? Alternatively, we can copy the text out and pass it to tree-sitter, but you don’t like that, IIRC.
Or we can copy the text out only when the portion tree-sitter wants spans the gap. I expect this case to be relatively rare, so we won’t copy all the time, and most of the time tree-sitter will just read from the buffer directly.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 16:19 ` Yuan Fu
2021-07-15 16:26 ` Yuan Fu
@ 2021-07-15 16:48 ` Eli Zaretskii
2021-07-15 18:23 ` Yuan Fu
2021-07-20 16:25 ` Stephen Leake
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-15 16:48 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 15 Jul 2021 12:19:31 -0400
> Cc: monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> > Why do you need to do this when a buffer is updated? why not use
> > display as the trigger? Large portions of a buffer will never be
> > displayed, and some buffers will not be displayed at all. Why waste
> > cycles on them? Redisplay is perfectly equipped to tell you when some
> > chunk of buffer text is going to be redrawn, and it already knows to
> > do nothing if the buffer haven't changed.
>
> Tree-sitter expects you to tell it every single change to the parsed text.
That cannot be true, because the parsed text could be in a state where
parsing it will fail. When you are in the middle of writing the code,
this is what will happen many times, even if you pass the whole buffer
to the parser. And since tree-sitter _must_ be able to deal with this
problem, it also must be able to receive incomplete parts of the
buffer text, and do the best it can with it.
> Say you have a buffer with some content and scrolled through it, so tree-sitter has parsed the whole buffer. Then some elisp edited some text outside the visible portion. Redisplay doesn’t happen, we don’t tell this edit to tree-sitter. Then I scroll to the place that has been edited. What now?
Now you call tree-sitter passing it the part of the buffer that needs
to be parsed (e.g., the chunk that is about to be displayed). If
tree-sitter needs to look back, it will.
> I’ve lost the change information, and tree-sitter’s tree is out-dated.
No information is lost because the updated buffer text is available.
> We can fontify on-demand, but we can’t parse on-demand.
Sorry, I don't believe this is true. tree-sitter _must_ be able to
deal with these situations, because it must be able to deal with
incomplete text that cannot be parsed without parse errors.
In addition, Emacs records (for redisplay purposes) two places in each
buffer related to changes: the minimum buffer position before which no
changes were done since last redisplay, and the maximum buffer
position beyond which there were no changes. This can also be used to
pass only a small part of the buffer to the parser, because the rest
didn't change.
> What we can do is to only parse the portion from BOB to the visible portion. So we won’t parse the whole buffer unless you scroll to the bottom.
My primary worry is the fact that you want to use buffer-change hooks
(and will soon enough want to use post-command-hook as well). They
slow down editing, sometimes tremendously, so I'd very much prefer not
to use those hooks for fontification/parsing. The original font-lock
mechanism in Emacs 19 used these hooks; we switched to jit-lock and
its redisplay-triggered fontifications because the original design had
problems which couldn't be solved reliably and with reasonable
performance. I hope we will not make the mistake of going back to
that sub-optimal design.
> >> And, for tree-sitter to take the buffer’s content directly, we need to tell it to skip the gap.
> >
> > AFAIR, tree-sitter allows the calling package to provide a function to
> > access the text, isn't that so? If so, you could write a function
> > that accesses buffer text via BYTE_POS_ADDR etc., and that knows how
> > to skip the gap already.
>
> Yes, that function returns a char*. But what if the gap is in the middle of the portion that tree-sitter wants to read?
If you provide the function that returns text one character at a time,
as AFAIR tree-sitter allows, you will be able to skip the gap
automagically by using BYTE_POS_ADDR. If that's not possible for some
reason, or not performant enough, we could ask tree-sitter developers
to add an API that accesses buffer text in two chunks, in which case it
will be called first with text before the gap, and then with text
after the gap. Like we do when we call regex search functions.
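As a rough illustration of the chunked alternative, the sketch below uses a toy gap buffer, not Emacs's actual buffer representation. A read callback hands out contiguous text that never crosses the gap, so a caller that keeps asking from where it left off sees the text before the gap first and the text after it next. All names here (toy_gap_buffer, toy_read_chunk) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy gap buffer: the text lives in [0, gap_start) and [gap_end, size)
   of DATA; the bytes in between are the gap.  */
struct toy_gap_buffer
{
  const char *data;
  size_t gap_start;
  size_t gap_end;
  size_t size;
};

/* Return a pointer to the text at logical byte offset POS and store in
   *LEN how many contiguous bytes can be read from there.  A chunk never
   crosses the gap; *LEN is 0 at or past the end of the text.  */
const char *
toy_read_chunk (const struct toy_gap_buffer *buf, size_t pos, size_t *len)
{
  assert (buf->gap_start <= buf->gap_end && buf->gap_end <= buf->size);
  size_t gap_len = buf->gap_end - buf->gap_start;
  size_t text_len = buf->size - gap_len;

  if (pos >= text_len)
    {
      *len = 0;
      return "";
    }
  if (pos < buf->gap_start)
    {
      /* Before the gap: readable up to the gap.  */
      *len = buf->gap_start - pos;
      return buf->data + pos;
    }
  /* After the gap: skip over it and read to the end of the text.  */
  *len = text_len - pos;
  return buf->data + pos + gap_len;
}
```

No text is copied and the gap is not moved; the cost is one extra callback invocation when a read reaches the gap.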
> Alternatively, we can copy the text out and pass it to tree-sitter, but you don’t like that, IIRC.
Yes, because it means memory allocation, which could be slow,
especially for large buffers. It could even fail if the buffer is
large enough and the system is under memory pressure.
> >> I only need to modify gap_left, gap_right, make_gap_smaller and make_gap_larger, right?
> >
> > Why would you need to _modify_ any of these?
>
> Because I want to let tree-sitter to know where is the gap so it can avoid it when reading text.
Knowing where the gap is doesn't need any changes to these functions.
See GPT_BYTE, GPT_SIZE, BUF_GPT_BYTE, and BUF_GPT_SIZE. And the gap
cannot move while tree-sitter accesses the buffer, because no other
part of the Lisp machine can run at that time.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 16:26 ` Yuan Fu
@ 2021-07-15 16:50 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-15 16:50 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 15 Jul 2021 12:26:25 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> Or we can only copy out when the portion tree-sitter wants encompasses the gap, I expect this case to be
> relatively rare so we won’t copy out all the time, and most of the time tree-sitter just reads from the buffer
> directly.
Actually, I expect this to happen quite frequently, because the gap is
usually where the editing happens.
We could, of course, move the gap out of the way temporarily, but
that's somewhat expensive, so it is better to avoid it.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 16:48 ` Eli Zaretskii
@ 2021-07-15 18:23 ` Yuan Fu
2021-07-16 7:30 ` Eli Zaretskii
2021-07-20 16:27 ` Stephen Leake
2021-07-20 16:25 ` Stephen Leake
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-15 18:23 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
> On Jul 15, 2021, at 12:48 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 15 Jul 2021 12:19:31 -0400
>> Cc: monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>>
>>> Why do you need to do this when a buffer is updated? why not use
>>> display as the trigger? Large portions of a buffer will never be
>>> displayed, and some buffers will not be displayed at all. Why waste
>>> cycles on them? Redisplay is perfectly equipped to tell you when some
>>> chunk of buffer text is going to be redrawn, and it already knows to
>>> do nothing if the buffer haven't changed.
>>
>> Tree-sitter expects you to tell it every single change to the parsed text.
>
> That cannot be true, because the parsed text could be in a state where
> parsing it will fail. When you are in the middle of writing the code,
> this is what will happen many times, even if you pass the whole buffer
> to the parser. And since tree-sitter _must_ be able to deal with this
> problem, it also must be able to receive incomplete parts of the
> buffer text, and do the best it can with it.
>
>> Say you have a buffer with some content and scrolled through it, so tree-sitter has parsed the whole buffer. Then some elisp edited some text outside the visible portion. Redisplay doesn’t happen, we don’t tell this edit to tree-sitter. Then I scroll to the place that has been edited. What now?
>
> Now you call tree-sitter passing it the part of the buffer that needs
> to be parsed (e.g., the chunk that is about to be displayed). If
> tree-sitter needs to look back, it will.
>
>> I’ve lost the change information, and tree-sitter’s tree is out-dated.
>
> No information is lost because the updated buffer text is available.
>
>> We can fontify on-demand, but we can’t parse on-demand.
>
> Sorry, I don't believe this is true. tree-sitter _must_ be able to
> deal with these situations, because it must be able to deal with
> incomplete text that cannot be parsed without parse errors.
>
I think my assertion was too strong. By “can’t parse on-demand” I mean we can’t easily pass tree-sitter an arbitrary chunk of text without letting it parse from BOB.
> In addition, Emacs records (for redisplay purposes) two places in each
> buffer related to changes: the minimum buffer position before which no
> changes were done since last redisplay, and the maximum buffer
> position beyond which there were no changes. This can also be used to
> pass only a small part of the buffer to the parser, because the rest
> didn't change.
>
>> What we can do is to only parse the portion from BOB to the visible portion. So we won’t parse the whole buffer unless you scroll to the bottom.
>
> My primary worry is the fact that you want to use buffer-change hooks
> (and will soon enough want to use post-command-hook as well). They
> slow down editing, sometimes tremendously, so I'd very much prefer not
> to use those hooks for fontification/parsing. The original font-lock
> mechanism in Emacs 19 used these hooks; we switched to jit-lock and
> its redisplay-triggered fontifications because the original design had
> problems which couldn't be solved reliably and with reasonable
> performance. I hope we will not make the mistake of going back to
> that sub-optimal design.
I understand. I want to point out that parsing is separate from fontification, and syntax-ppss flushes its cache in before-change-functions. I was hoping to use the parse tree for more than fontification, e.g., motion commands like forward-sexp/backward-sexp or structural editing commands like expand-region. Another scenario: some Elisp edits text before the visible portion and the tree is not updated. Now I want to select the node at point (like expand-region does), so I look for the leaf node that contains the byte position of point. However, because the tree is outdated, the byte position of point will not correspond to the node I want.
We can still fontify with jit-lock; it’s just that parsing cannot easily work like fontification. I expect tree-sitter to work more like syntax-ppss than like jit-lock.
>
>>>> And, for tree-sitter to take the buffer’s content directly, we need to tell it to skip the gap.
>>>
>>> AFAIR, tree-sitter allows the calling package to provide a function to
>>> access the text, isn't that so? If so, you could write a function
>>> that accesses buffer text via BYTE_POS_ADDR etc., and that knows how
>>> to skip the gap already.
>>
>> Yes, that function returns a char*. But what if the gap is in the middle of the portion that tree-sitter wants to read?
>
> If you provide the function that returns text one character at a time,
> as AFAIR tree-sitter allows, you will be able to skip the gap
> automagically by using BYTE_POS_ADDR. If that's not possible for some
> reason, or not performant enough, we could ask tree-sitter developers
> to add an API that access buffer text in two chunks, in which case it
> will be called first with text before the gap, and then with text
> after the gap. Like we do when we call regex search functions.
Yes, I made a mistake reading the API. Indeed we can read one character at a time, so the gap is not an issue anymore.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 18:23 ` Yuan Fu
@ 2021-07-16 7:30 ` Eli Zaretskii
2021-07-16 14:27 ` Yuan Fu
2021-07-20 16:28 ` Stephen Leake
2021-07-20 16:27 ` Stephen Leake
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-16 7:30 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 15 Jul 2021 14:23:02 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> >> Say you have a buffer with some content and scrolled through it, so tree-sitter has parsed the whole buffer. Then some elisp edited some text outside the visible portion. Redisplay doesn’t happen, we don’t tell this edit to tree-sitter. Then I scroll to the place that has been edited. What now?
> >
> > Now you call tree-sitter passing it the part of the buffer that needs
> > to be parsed (e.g., the chunk that is about to be displayed). If
> > tree-sitter needs to look back, it will.
> >
> >> I’ve lost the change information, and tree-sitter’s tree is out-dated.
> >
> > No information is lost because the updated buffer text is available.
> >
> >> We can fontify on-demand, but we can’t parse on-demand.
> >
> > Sorry, I don't believe this is true. tree-sitter _must_ be able to
> > deal with these situations, because it must be able to deal with
> > incomplete text that cannot be parsed without parse errors.
> >
> I think my assertion was too strong. By “can’t parse on-demand” I mean we can’t easily pass tree-sitter a random chunk of text and not letting it to parse from BOB.
You must start from BOB only in languages that require that; not every
language does.
And even with languages that require starting from BOB, you could do
that only once, the first time a buffer needs parsing; thereafter, you
can only pass to tree-sitter the parts that were changed since the
last time. Emacs records that information for the display engine, see
BEG_UNCHANGED and END_UNCHANGED. If that is not enough, we could
record more information about changes to buffer text.
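A toy sketch of that bookkeeping, assuming nothing about how Emacs actually implements BEG_UNCHANGED and END_UNCHANGED: it keeps only the length of the untouched prefix and of the untouched suffix since the last sync, which is enough to bound the region that needs re-reading. The names toy_change_tracker, toy_tracker_sync, toy_tracker_record and toy_tracker_dirty_range are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct toy_change_tracker
{
  size_t text_len;       /* current length of the text */
  size_t beg_unchanged;  /* bytes at the start untouched since last sync */
  size_t end_unchanged;  /* bytes at the end untouched since last sync */
  int changed;           /* nonzero if any change was recorded */
};

/* Forget all recorded changes; the consumer is now up to date.  */
void
toy_tracker_sync (struct toy_change_tracker *t, size_t text_len)
{
  t->text_len = text_len;
  t->beg_unchanged = text_len;
  t->end_unchanged = text_len;
  t->changed = 0;
}

/* Record that [START, OLD_END) was replaced by NEW_LEN bytes.  */
void
toy_tracker_record (struct toy_change_tracker *t,
                    size_t start, size_t old_end, size_t new_len)
{
  assert (start <= old_end && old_end <= t->text_len);
  size_t tail = t->text_len - old_end;  /* bytes after the change */
  if (start < t->beg_unchanged)
    t->beg_unchanged = start;
  if (tail < t->end_unchanged)
    t->end_unchanged = tail;
  t->text_len = start + new_len + tail;
  t->changed = 1;
}

/* Store the changed range in *FROM and *TO (current offsets).
   Return 0 if nothing changed since the last sync.  */
int
toy_tracker_dirty_range (const struct toy_change_tracker *t,
                         size_t *from, size_t *to)
{
  if (!t->changed)
    return 0;
  *from = t->beg_unchanged;
  *to = t->text_len - t->end_unchanged;
  return 1;
}
```

Several scattered edits collapse into one range, so this is cheap but coarse; an incremental parser that wants each edit separately would need the "more information" mentioned above.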
The main issue here is to pass the buffer text to tree-sitter lazily,
only when and as much as needed.
> I understand. I want to point out that parsing is separated from fontification, and syntax-pass flushes its cache in before-change-hook. I was hoping to use the parse tree for more than fontification, e.g., motion commands like sexp-forward/backward or structural editing commands like expand-region. Another scenario: some elisp edited some text before the visible portion, the tree is not updated, now I want to select the node at point (like expand-region), I look for the leave node that contains the byte position of point. However, because the tree is out-dated, the byte position of point will not correspond to the node I want.
Each command/feature that needs an updated TS tree will take care of
updating TS with the relevant information. We should record whatever
we need for that as side effect of primitives that change buffer text
(in insdel.c), and use the recorded info to update TS. But the actual
passing of text to TS should happen lazily, when we actually need its
re-parsing, not when the changes to buffer text are done.
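A minimal sketch of that split between cheap recording and lazy re-parsing, with a line count standing in for the parse tree. Nothing here is tree-sitter or Emacs code; toy_lazy_parser and its functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lazy_parser
{
  const char *text;       /* current text, not owned */
  int dirty;              /* nonzero if TEXT changed since the last parse */
  size_t cached_lines;    /* stand-in for the parse tree */
  unsigned parse_count;   /* how many times we actually parsed */
};

void
toy_lazy_init (struct toy_lazy_parser *p, const char *text)
{
  p->text = text;
  p->dirty = 1;
  p->cached_lines = 0;
  p->parse_count = 0;
}

/* Called from the change path: only records that something changed.  */
void
toy_lazy_note_change (struct toy_lazy_parser *p, const char *new_text)
{
  p->text = new_text;
  p->dirty = 1;
}

/* Called by a consumer that needs an up-to-date result: parse only if
   a change was recorded since the last parse.  */
size_t
toy_lazy_line_count (struct toy_lazy_parser *p)
{
  if (p->dirty)
    {
      size_t lines = 0;
      for (const char *c = p->text; *c != '\0'; c++)
        if (*c == '\n')
          lines++;
      p->cached_lines = lines;
      p->parse_count++;
      p->dirty = 0;
    }
  return p->cached_lines;
}
```

Any number of changes between two queries costs a single parse, and a buffer whose result is never requested is never parsed at all.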
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 7:30 ` Eli Zaretskii
@ 2021-07-16 14:27 ` Yuan Fu
2021-07-16 14:33 ` Stefan Monnier
2021-07-16 15:27 ` Eli Zaretskii
2021-07-20 16:28 ` Stephen Leake
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-16 14:27 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
>
> Each command/feature that needs an updated TS tree will take care of
> updating TS with the relevant information. We should record whatever
> we need for that as side effect of primitives that change buffer text
> (in insdel.c), and use the recorded info to update TS. But the actual
> passing of text to TS should happen lazily, when we actually need its
> re-parsing, not when the changes to buffer text are done.
Ok, I will write it like that. Another question, how do I add a new field in struct buffer? I tried to add
Lisp_Object ts_parser_list_;
Before
Lisp_Object cursor_in_non_selected_windows_;
But that wouldn't dump.
I want to put the parsers in a field rather than in a buffer local variable because I don’t want users to add/remove parsers from this list freely, otherwise the parsers could go out of sync. I plan to provide functions like add-parser, remove-parser, buffer-parser-list for users to access this list.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 14:27 ` Yuan Fu
@ 2021-07-16 14:33 ` Stefan Monnier
2021-07-16 14:53 ` Yuan Fu
2021-07-16 15:27 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-07-16 14:33 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel
> I want to put the parsers in a field rather than in a buffer local variable
> because I don’t want users to add/remove parsers from this list freely,
> otherwise the parsers could go out of sync.
I wouldn't worry 'bout that: Emacs generally doesn't try to stop people
shooting themselves in the foot. So we want to provide a convenient and
safe API but we don't have to hide its inner workings.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 14:33 ` Stefan Monnier
@ 2021-07-16 14:53 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-16 14:53 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel
> On Jul 16, 2021, at 10:33 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> I want to put the parsers in a field rather than in a buffer local variable
>> because I don’t want users to add/remove parsers from this list freely,
>> otherwise the parsers could go out of sync.
>
> I wouldn't worry 'bout that: Emacs generally doesn't try to stop people
> shooting themselves in the foot.
I should’ve figured that out by now ;-)
> So we want to provide a convenient and
> safe API but we don't have to hide its inner workings.
>
Ok, local variable then.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 14:27 ` Yuan Fu
2021-07-16 14:33 ` Stefan Monnier
@ 2021-07-16 15:27 ` Eli Zaretskii
2021-07-16 15:51 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-16 15:27 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 16 Jul 2021 10:27:36 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> Another question, how do I add a new field in struct buffer? I tried to add
>
> Lisp_Object ts_parser_list_;
>
> Before
>
> Lisp_Object cursor_in_non_selected_windows_;
>
> But that wouldn't dump.
Did you see in init_buffer_once what we do with built-in fields of
struct buffer?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 15:27 ` Eli Zaretskii
@ 2021-07-16 15:51 ` Yuan Fu
2021-07-17 2:05 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-16 15:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
> On Jul 16, 2021, at 11:27 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 16 Jul 2021 10:27:36 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> emacs-devel@gnu.org
>>
>> Another question, how do I add a new field in struct buffer? I tried to add
>>
>> Lisp_Object ts_parser_list_;
>>
>> Before
>>
>> Lisp_Object cursor_in_non_selected_windows_;
>>
>> But that wouldn't dump.
>
> Did you see in init_buffer_once what we do with built-in fields of
> struct buffer?
I did not, that must be why, thanks. Though I’ve changed to use a buffer-local variable as Stefan suggested.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 15:51 ` Yuan Fu
@ 2021-07-17 2:05 ` Yuan Fu
2021-07-17 2:23 ` Clément Pit-Claudel
2021-07-17 6:56 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-17 2:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]
Please have a look at the second patch that applies on top of the first one. This time I added after-change hooks, so if you create a parser for a buffer and edit that buffer, the parser is kept updated lazily.
In summary, the parser parses the whole buffer the first time the user asks for the parse tree. In the after-change hook, no parsing is done, but we do update the trees with the position changes. The next time the user asks for the parse tree, the whole buffer is re-parsed incrementally. (I didn’t read the paper, but I assume it knows which bits to re-parse because we updated the tree with the position changes.)
Maybe this is not lazy enough, and I should do a benchmark. This is a simple benchmark that I did:
Benchmark 1: 22M JSON file, opened literally. Parsing the whole buffer took 17s and used 3G of memory.
Benchmark 2: 1.6M JSON file, opened in fundamental-mode. The first parse of the whole buffer took 1.039s, with no GC. Then I ran this:
(benchmark-run 1000
(dotimes (_ 1000)
(insert
"1,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,\n"))
(dotimes (_ 1000)
(backward-delete-char
(length
"1,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,\n"))))
Result: (39.302071 8 4.3011029999999995), with a lot of GC trimming. Then I removed the parser and ran it again:
Result: (33.589416 8 4.405495999999999)
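No parsing is done in either run (parsing is lazy, and I didn’t ask for the parse tree). The only difference is that, in the first run, the change hooks update the tree with the position change. That difference is about 5.7s over 2,000,000 buffer changes (1000 repetitions of 1000 insertions and 1000 deletions), i.e. roughly 2.9µs per change, or about 17% of the run time without a parser. My conclusion is that the cost of the change hooks is pretty insignificant, and the initial parse is a bit slow (on large files).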
I’m running this on a 1.4 GHz Quad-Core Intel Core i5 with 16G memory.
Of course, I’m open to suggestions for a better benchmark. My rough log of the benchmark is in benchmark.el. The JSON file I used in the second benchmark is benchmark.2.json. The patch is ts.2.patch.
[-- Attachment #2: ts.2.patch --]
[-- Type: application/octet-stream, Size: 11087 bytes --]
From 180aea41cdce11b9b4bdc7da0964c14c0bf8a5f0 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Fri, 16 Jul 2021 21:11:29 -0400
Subject: [PATCH] checkpoint 2: add change-hooks
---
src/insdel.c | 16 +++++
src/tree_sitter.c | 163 ++++++++++++++++++++++++++++++++++++++++++++--
src/tree_sitter.h | 10 +++
3 files changed, 182 insertions(+), 7 deletions(-)
diff --git a/src/insdel.c b/src/insdel.c
index e38b091f54..3c1e13d38b 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -31,6 +31,10 @@
#include "region-cache.h"
#include "pdumper.h"
+#ifdef HAVE_TREE_SITTER
+#include "tree_sitter.h"
+#endif
+
static void insert_from_string_1 (Lisp_Object, ptrdiff_t, ptrdiff_t, ptrdiff_t,
ptrdiff_t, bool, bool);
static void insert_from_buffer_1 (struct buffer *, ptrdiff_t, ptrdiff_t, bool);
@@ -2152,6 +2156,11 @@ signal_before_change (ptrdiff_t start_int, ptrdiff_t end_int,
run_hook (Qfirst_change_hook);
}
+#ifdef HAVE_TREE_SITTER
+ /* FIXME: Is this the best place? */
+ ts_before_change (start_int, end_int);
+#endif
+
/* Now run the before-change-functions if any. */
if (!NILP (Vbefore_change_functions))
{
@@ -2205,6 +2214,13 @@ signal_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
if (inhibit_modification_hooks)
return;
+#ifdef HAVE_TREE_SITTER
+ /* We disrespect combine-after-change, because if we don't record
+ this change, the information that we need (the end byte position
+ of the change) will be lost. */
+ ts_after_change (charpos, lendel, lenins);
+#endif
+
/* If we are deferring calls to the after-change functions
and there are no before-change functions,
just record the args that we were going to use. */
diff --git a/src/tree_sitter.c b/src/tree_sitter.c
index f2134c571a..7d1225161c 100644
--- a/src/tree_sitter.c
+++ b/src/tree_sitter.c
@@ -27,6 +27,7 @@ Copyright (C) 2021 Free Software Foundation, Inc.
#include <stdlib.h>
#include <unistd.h>
+#include "lisp.h"
#include "buffer.h"
#include "coding.h"
#include "tree_sitter.h"
@@ -34,6 +35,98 @@ Copyright (C) 2021 Free Software Foundation, Inc.
/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
#include <tree_sitter/parser.h>
+/* Record the byte position of the end of the (to-be) changed text.
+We have to record it now, because by the time we get to after-change
+hook, the _byte_ position of the end is lost. */
+void
+ts_before_change (ptrdiff_t start_int, ptrdiff_t end_int)
+{
+ /* Iterate through each parser in 'tree-sitter-parser-list' and
+ record the byte position. There could be better ways to record
+ it than storing the same position in every parser, but this is
+ the most fool-proof way, and I expect a buffer to have only one
+ parser most of the time anyway. */
+ ptrdiff_t beg_byte = CHAR_TO_BYTE (start_int);
+ ptrdiff_t old_end_byte = CHAR_TO_BYTE (end_int);
+ Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
+ while (!NILP (parser_list))
+ {
+ Lisp_Object lisp_parser = Fcar (parser_list);
+ XTS_PARSER (lisp_parser)->edit.start_byte = beg_byte;
+ XTS_PARSER (lisp_parser)->edit.old_end_byte = old_end_byte;
+ parser_list = Fcdr (parser_list);
+ }
+}
+
+/* Update each parser's tree after the user made an edit. This
+function does not parse the buffer and only updates the tree. (So it
+should be very fast.) */
+void
+ts_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
+{
+ ptrdiff_t new_end_byte = CHAR_TO_BYTE (charpos + lenins);
+ Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
+ while (!NILP (parser_list))
+ {
+ Lisp_Object lisp_parser = Fcar (parser_list);
+ TSTree *tree = XTS_PARSER (lisp_parser)->tree;
+ XTS_PARSER (lisp_parser)->edit.new_end_byte = new_end_byte;
+ if (tree != NULL)
+ ts_tree_edit (tree, &XTS_PARSER (lisp_parser)->edit);
+ parser_list = Fcdr (parser_list);
+ }
+}
+
+/* Parse the buffer. We don't parse until we have to. When we have
+to, we call this function to parse and update the tree. */
+void
+ts_ensure_parsed (Lisp_Object parser)
+{
+ TSParser *ts_parser = XTS_PARSER (parser)->parser;
+ TSTree *tree = XTS_PARSER(parser)->tree;
+ TSInput input = XTS_PARSER (parser)->input;
+ TSTree *new_tree = ts_parser_parse(ts_parser, tree, input);
+ XTS_PARSER (parser)->tree = new_tree;
+}
+
+/* This is the read function provided to tree-sitter to read from a
+ buffer. It reads one character at a time and automatically skip
+ the gap. */
+const char*
+ts_read_buffer (void *buffer, uint32_t byte_index,
+ TSPoint position, uint32_t *bytes_read)
+{
+ if (! BUFFER_LIVE_P ((struct buffer *) buffer))
+ error ("BUFFER is not live");
+
+ ptrdiff_t byte_pos = byte_index + 1;
+
+ // FIXME: Add some boundary checks?
+ /* I believe we can get away with only setting current-buffer
+ and not actually switching to it, like what we did in
+ 'make_gap_1'. */
+ struct buffer *old_buffer = current_buffer;
+ current_buffer = (struct buffer *) buffer;
+
+ /* Read one character. */
+ char *beg;
+ int len;
+ if (byte_pos >= Z_BYTE)
+ {
+ beg = "";
+ len = 0;
+ }
+ else
+ {
+ beg = (char *) BYTE_POS_ADDR (byte_pos);
+ len = next_char_len(byte_pos);
+ }
+ *bytes_read = (uint32_t) len;
+ current_buffer = old_buffer;
+ return beg;
+}
+
+/* Wrap the parser in a Lisp_Object to be used in the Lisp machine. */
Lisp_Object
make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree)
{
@@ -42,10 +135,15 @@ make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree)
lisp_parser->buffer = buffer;
lisp_parser->parser = parser;
lisp_parser->tree = tree;
- // TODO TSInput.
+ TSInput input = {buffer, ts_read_buffer, TSInputEncodingUTF8};
+ lisp_parser->input = input;
+ TSPoint dummy_point = {0, 0};
+ TSInputEdit edit = {0, 0, 0, dummy_point, dummy_point, dummy_point};
+ lisp_parser->edit = edit;
return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
}
+/* Wrap the node in a Lisp_Object to be used in the Lisp machine. */
Lisp_Object
make_ts_node (Lisp_Object parser, TSNode node)
{
@@ -57,19 +155,59 @@ make_ts_node (Lisp_Object parser, TSNode node)
}
-/* Tree-sitter parser. */
+DEFUN ("tree-sitter-create-parser",
+ Ftree_sitter_create_parser, Stree_sitter_create_parser,
+ 2, 2, 0,
+ doc: /* Create and return a parser in BUFFER for LANGUAGE.
+The parser is automatically added to BUFFER's
+`tree-sitter-parser-list'. LANGUAGE should be the language provided
+by a tree-sitter language dynamic module. */)
+ (Lisp_Object buffer, Lisp_Object language)
+{
+ CHECK_BUFFER(buffer);
+
+ /* LANGUAGE is a USER_PTR that contains the pointer to a TSLanguage
+ struct. */
+ TSParser *parser = ts_parser_new ();
+ TSLanguage *lang = (XUSER_PTR (language)->p);
+ ts_parser_set_language (parser, lang);
+
+ Lisp_Object lisp_parser
+ = make_ts_parser (XBUFFER(buffer), parser, NULL);
+
+ // FIXME: Is this the correct way to set a buffer-local variable?
+ struct buffer *old_buffer = current_buffer;
+ set_buffer_internal (XBUFFER (buffer));
+
+ Fset (Qtree_sitter_parser_list,
+ Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));
+
+ set_buffer_internal (old_buffer);
+ return lisp_parser;
+}
+
+DEFUN ("tree-sitter-parser-root-node",
+ Ftree_sitter_parser_root_node, Stree_sitter_parser_root_node,
+ 1, 1, 0,
+ doc: /* Return the root node of PARSER. */)
+ (Lisp_Object parser)
+{
+ ts_ensure_parsed(parser);
+ TSNode root_node = ts_tree_root_node (XTS_PARSER (parser)->tree);
+ return make_ts_node (parser, root_node);
+}
DEFUN ("tree-sitter-parse", Ftree_sitter_parse, Stree_sitter_parse,
2, 2, 0,
- doc: /* Parse STRING and return a parser object.
+ doc: /* Parse STRING and return the root node.
LANGUAGE should be the language provided by a tree-sitter language
dynamic module. */)
(Lisp_Object string, Lisp_Object language)
{
CHECK_STRING (string);
- /* LANGUAGE is a USER_PTR that contains the pointer to a
- TSLanguage struct. */
+ /* LANGUAGE is a USER_PTR that contains the pointer to a TSLanguage
+ struct. */
TSParser *parser = ts_parser_new ();
TSLanguage *lang = (XUSER_PTR (language)->p);
ts_parser_set_language (parser, lang);
@@ -104,7 +242,7 @@ DEFUN ("tree-sitter-node-string",
DEFUN ("tree-sitter-node-parent",
Ftree_sitter_node_parent, Stree_sitter_node_parent, 1, 1, 0,
doc: /* Return the immediate parent of NODE.
-Return nil if couldn't find any. */)
+Return nil if we couldn't find any. */)
(Lisp_Object node)
{
TSNode ts_node = XTS_NODE (node)->node;
@@ -119,7 +257,7 @@ DEFUN ("tree-sitter-node-parent",
DEFUN ("tree-sitter-node-child",
Ftree_sitter_node_child, Stree_sitter_node_child, 2, 2, 0,
doc: /* Return the Nth child of NODE.
-Return nil if couldn't find any. */)
+Return nil if we couldn't find any. */)
(Lisp_Object node, Lisp_Object n)
{
CHECK_INTEGER (n);
@@ -138,6 +276,17 @@ DEFUN ("tree-sitter-node-child",
void
syms_of_tree_sitter (void)
{
+ DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
+ DEFVAR_LISP ("tree-sitter-parser-list", Vtree_sitter_parser_list,
+ doc: /* A list of tree-sitter parsers.
+// TODO: more doc.
+If you removed a parser from this list, do not put it back in. */);
+ Vtree_sitter_parser_list = Qnil;
+ Fmake_variable_buffer_local (Qtree_sitter_parser_list);
+
+
+ defsubr (&Stree_sitter_create_parser);
+ defsubr (&Stree_sitter_parser_root_node);
defsubr (&Stree_sitter_parse);
defsubr (&Stree_sitter_node_string);
defsubr (&Stree_sitter_node_parent);
diff --git a/src/tree_sitter.h b/src/tree_sitter.h
index 3c9e03475f..0606f336cc 100644
--- a/src/tree_sitter.h
+++ b/src/tree_sitter.h
@@ -28,6 +28,8 @@ #define EMACS_TREE_SITTER_H
INLINE_HEADER_BEGIN
+/* A wrapper for a tree-sitter parser, but also contains a parse tree
+ and other goodies for convenience. */
struct Lisp_TS_Parser
{
union vectorlike_header header;
@@ -35,8 +37,10 @@ #define EMACS_TREE_SITTER_H
TSParser *parser;
TSTree *tree;
TSInput input;
+ TSInputEdit edit;
};
+/* A wrapper around a tree-sitter node. */
struct Lisp_TS_Node
{
union vectorlike_header header;
@@ -74,6 +78,12 @@ XTS_NODE (Lisp_Object a)
return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Node);
}
+void
+ts_before_change (ptrdiff_t charpos, ptrdiff_t lendel);
+
+void
+ts_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins);
+
Lisp_Object
make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree);
--
2.24.3 (Apple Git-128)
[-- Attachment #3: benchmark.2.json --]
[-- Type: application/json, Size: 1689073 bytes --]
[-- Attachment #4: benchmark.el --]
[-- Type: application/octet-stream, Size: 1134 bytes --]
checkpoint 2 - benchmark.1.json (22M) - open literally
(benchmark-run 10
(tree-sitter-parser-root-node
(tree-sitter-create-parser
(current-buffer) (tree-sitter-json))))
RESULT: stuck, used all my memory (14G and still growing)
(benchmark-run 1
(tree-sitter-parser-root-node
(tree-sitter-create-parser
(current-buffer) (tree-sitter-json))))
17s, 3G memory.
\f
checkpoint 2 - benchmark.2.json (1.6M) - fundamental-mode
(benchmark-run 1
(tree-sitter-parser-root-node
(tree-sitter-create-parser
(current-buffer) (tree-sitter-json))))
(1.039289 0 0.0)
(benchmark-run 1000
(dotimes (_ 1000)
(insert
"1,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,\n"))
(dotimes (_ 1000)
(backward-delete-char
(length
"1,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,\n"))))
With parser: (39.302071 8 4.3011029999999995)
Without parser: (33.589416 8 4.405495999999999)
Note: Warning (undo): Buffer ‘benchmark.2.json’ undo info was
27188988 bytes long. The undo info was discarded because it
exceeded `undo-outer-limit'.
[-- Attachment #5: Type: text/plain, Size: 8 bytes --]
Yuan
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 2:05 ` Yuan Fu
@ 2021-07-17 2:23 ` Clément Pit-Claudel
2021-07-17 3:12 ` Yuan Fu
` (2 more replies)
2021-07-17 6:56 ` Eli Zaretskii
1 sibling, 3 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-17 2:23 UTC (permalink / raw)
To: emacs-devel
On 7/16/21 10:05 PM, Yuan Fu wrote:
> My conclusion is that after-change-hook is pretty insignificant, and the initial parse is a bit slow (on large files).
I have no idea if it makes sense, but: does the initial parse need to be synchronous, or could you instead run the parsing in one thread, and the rest of Emacs in another? (I'm talking about concurrent execution, not cooperative threading).
In most cases there should be very limited contention, if at all: in large buffers most of Emacs' activity will be focused on the (relatively few) characters around the gap, and most of the parser's activity will be reading from the buffer at other positions. You do need to be careful not to read the garbage data from the gap, but otherwise seeing stale or even inconsistent data from the parser thread shouldn't be an issue, since tree-sitter is supposed to be robust to bad parses.
In fact, depending on how robust tree-sitter is, you might even be able to do the concurrency-control optimistically (parse everything up to close to the gap, check that the gap hasn't moved into the region that you read, and then resume reading or rollback).
Alternatively, maybe you could even do a full parse with minimal concurrency control: you'd make sure that the Emacs thread records not just changes to the buffer text but also movements of the gap, and then you could use that list of changes for the next parse?
Anyway, thanks for working on this!
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 2:23 ` Clément Pit-Claudel
@ 2021-07-17 3:12 ` Yuan Fu
2021-07-17 7:18 ` Eli Zaretskii
2021-07-17 7:16 ` Eli Zaretskii
2021-07-17 17:30 ` Stefan Monnier
2 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-17 3:12 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> On Jul 16, 2021, at 10:23 PM, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
>
> On 7/16/21 10:05 PM, Yuan Fu wrote:
>> My conclusion is that after-change-hook is pretty insignificant, and the initial parse is a bit slow (on large files).
>
> I have no idea if it makes sense, but: does the initial parse need to be synchronous, or could you instead run the parsing in one thread, and the rest of Emacs in another? (I'm talking about concurrent execution, not cooperative threading).
>
> In most cases there should be very limited contention, if at all: in large buffers most of Emacs' activity will be focused on the (relatively few) characters around the gap, and most of the parser's activity will be reading from the buffer at other positions. You do need to be careful not to read the garbage data from the gap, but otherwise seeing stale or even inconsistent data from the parser thread shouldn't be an issue, since tree-sitter is supposed to be robust to bad parses.
>
> In fact, depending on how robust tree-sitter is, you might even be able to do the concurrency-control optimistically (parse everything up to close to the gap, check that the gap hasn't moved into the region that you read, and then resume reading or rollback).
>
> Alternatively, maybe you could even do a full parse with minimal concurrency control: you'd make sure that the Emacs thread records not just changes to the buffer text but also movements of the gap, and then you could use that list of changes for the next parse?
Another approach I thought about is to “expose” only the portion of the buffer from BOB to some point to tree-sitter. When a user asks for a parse tree, they also specify up to which point of the buffer they need it. For example, for fontification, jit-lock only needs the tree up to the end of the visible window. And for structure editing, asking for the portion up to window-end + a few thousand characters might be enough. However, this heuristic could have problems in practice. (Maybe a giant comment section of thousands of characters follows, and instead of jumping to the end of it, we wrongly jump to the middle of that comment section, because tree-sitter only “sees” up to that point.) So I don’t know if it’s a good idea.
>
> Anyway, thanks for working on this!
>
I figure that this is low-tech enough that an amateur like me could possibly do it ;-)
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 2:05 ` Yuan Fu
2021-07-17 2:23 ` Clément Pit-Claudel
@ 2021-07-17 6:56 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-17 6:56 UTC (permalink / raw)
To: Yuan Fu; +Cc: monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 16 Jul 2021 22:05:01 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> Please have a look at the second patch that applies on top of the first one. This time I added after-change hooks, so if you create a parser for a buffer and edit that buffer, the parser is kept updated lazily.
Instead of using the hook machinery, it is better to augment the
insdel.c functions to directly update the information you need to
keep. If you do it directly in the insdel.c functions that modify the
buffer text, you can be much more accurate in updating the information
about the changes, because each insdel.c function performs a
well-defined operation on the buffer text. By contrast, buffer-change
hooks are higher-level functionality, meant for Lisp programs, so they
don't necessarily make it easy to reverse-engineer the specific
changes. All you have there is some higher-level information about
which part of the buffer changed. Moreover, the hooks are sometimes
called more times than they should be, to be on the safe side.
As a trivial (but not insignificant!) optimization, primitive insdel.c
functions always know both the character and the byte positions in the
buffer they change, so this code you needed in your hooks:
> + ptrdiff_t beg_byte = CHAR_TO_BYTE (start_int);
> + ptrdiff_t old_end_byte = CHAR_TO_BYTE (end_int);
could be avoided. Converting character to byte positions can
sometimes significantly slow down the code, for example if there are a
lot of markers in the buffer.
So I urge you to record the change information directly in the
primitive functions of insdel.c, not in the hooks.
> In summary, the parser parses the whole buffer on the first time when the user asks for the parse tree. In after-change-hook, no parsing is done, but we do update the trees with position changes. On the next time when the user asks for the parse tree, the whole buffer is re-parsed incrementally. (I didn’t read the paper, but I assume it knows where are the bits to re-parse because we updated the tree with position changes.)
Why do you update the entire parser list for every modification? This
comment:
> +void
> +ts_before_change (ptrdiff_t start_int, ptrdiff_t end_int)
> +{
> + /* Iterate through each parser in 'tree-sitter-parser-list' and
> + record the byte position. There could be better ways to record
> + it than storing the same position in every parser, but this is
> + the most fool-proof way, and I expect a buffer to have only one
> + parser most of the time anyway. */
already says that there are better ways: record the change info just
once in one place. Then propagate that to the entire list, if you
need, when you actually need to call TS. It is possible that at the
call time you will know which parser needs to be called, and will be
able to update only that parser, not the entire list.
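The "record once, propagate later" idea could look roughly like the sketch below. Everything in it is hypothetical (struct pending_change, record_change and take_change are illustration names, not part of the patch, Emacs, or tree-sitter), and it assumes that folding all edits since the last parse into one covering span is acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer record of the net change since the last
   parse.  Not an actual Emacs or tree-sitter structure.  */
struct pending_change
{
  bool dirty;
  ptrdiff_t start_byte;    /* First changed byte; same in old and new text.  */
  ptrdiff_t old_end_byte;  /* End of the changed span, in last-parsed text.  */
  ptrdiff_t new_end_byte;  /* End of the changed span, in current text.  */
};

/* Fold one edit into PC.  START and OLD_END are byte offsets in the
   text just before this edit; NEW_END is the end of the replacement
   in the text just after it.  */
static void
record_change (struct pending_change *pc, ptrdiff_t start,
               ptrdiff_t old_end, ptrdiff_t new_end)
{
  assert (0 <= start && start <= old_end && start <= new_end);
  if (!pc->dirty)
    {
      pc->dirty = true;
      pc->start_byte = start;
      pc->old_end_byte = old_end;
      pc->new_end_byte = new_end;
      return;
    }
  /* Text past the recorded span is untouched so far, so an edit that
     reaches beyond it extends the span in the last-parsed text too.  */
  if (old_end > pc->new_end_byte)
    {
      pc->old_end_byte += old_end - pc->new_end_byte;
      pc->new_end_byte = old_end;
    }
  pc->new_end_byte += new_end - old_end;
  if (start < pc->start_byte)
    pc->start_byte = start;
}

/* Copy the net change to OUT and reset the record.  The caller then
   applies OUT to whichever parsers it actually needs to query.
   Returns false if nothing changed since the last call.  */
static bool
take_change (struct pending_change *pc, struct pending_change *out)
{
  if (!pc->dirty)
    return false;
  *out = *pc;
  pc->dirty = false;
  return true;
}
```

The insdel.c primitives would call something like record_change with the byte positions they already have, and the span would only be handed to a parser when a parse tree is requested. The price of keeping a single span is precision: two distant edits get merged into one span that also covers the unchanged text between them.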
> + // FIXME: Add some boundary checks?
> + /* I believe we can get away with only setting current-buffer
> + and not actually switching to it, like what we did in
> + 'make_gap_1'. */
> + struct buffer *old_buffer = current_buffer;
> + current_buffer = (struct buffer *) buffer;
This looks unnecessary: we have BUF_BYTE_ADDRESS, which accepts the
buffer as its argument, and the corresponding buf_next_char_len. IOW,
why did you need to switch to the buffer?
> + /* Read one character. */
> + char *beg;
> + int len;
> + if (byte_pos >= Z_BYTE)
> + {
> + beg = "";
> + len = 0;
> + }
Is getting an empty string what TS wants when it attempts to read
beyond EOB?
Also, why do you test Z_BYTE and not ZV_BYTE (actually, BUF_ZV_BYTE)?
Emacs in general behaves as if text beyond point-max didn't exist, why
should code supported by the TS parser behave differently?
> + beg = (char *) BYTE_POS_ADDR (byte_pos);
> + len = next_char_len(byte_pos);
This is sub-optimal: next_char_len also calls BYTE_POS_ADDR. Why not
use BYTES_BY_CHAR_HEAD instead?
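To make the two points above concrete, here is a minimal standalone sketch of a one-character reader that stops at the end of the accessible text and derives the character length from the lead byte alone. It is not the patch's code: struct text_view, read_one_char and utf8_len_from_head are made-up names, positions are 0-based, the text is a flat array with no gap, and only standard UTF-8 (up to 4 bytes) is handled, whereas the Emacs internal encoding also has longer sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of a UTF-8 sequence, judged from its first byte
   only.  An illustrative stand-in for what BYTES_BY_CHAR_HEAD does in
   Emacs, not the Emacs macro itself.  */
static int
utf8_len_from_head (unsigned char head)
{
  if (head < 0x80)
    return 1;
  if ((head & 0xE0) == 0xC0)
    return 2;
  if ((head & 0xF0) == 0xE0)
    return 3;
  if ((head & 0xF8) == 0xF0)
    return 4;
  return 1;  /* Continuation or invalid byte: step over one byte.  */
}

/* Hypothetical view of some text; ACCESSIBLE_END plays the role of
   ZV_BYTE, i.e. the end of the narrowed region, not of the storage.  */
struct text_view
{
  const char *text;
  ptrdiff_t accessible_end;
};

/* Return a pointer to the character at BYTE_POS and store its length
   in *LEN.  Past the accessible end, return "" with *LEN == 0.  */
static const char *
read_one_char (const struct text_view *view, ptrdiff_t byte_pos, int *len)
{
  if (byte_pos < 0 || byte_pos >= view->accessible_end)
    {
      *len = 0;
      return "";
    }
  const char *beg = view->text + byte_pos;
  int n = utf8_len_from_head ((unsigned char) *beg);
  /* Never report bytes that lie beyond the accessible region.  */
  if (n > view->accessible_end - byte_pos)
    n = (int) (view->accessible_end - byte_pos);
  *len = n;
  return beg;
}
```

The address of the character is computed once and the length comes from the byte already in hand, which is the saving suggested above.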
Thanks for working on this.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 2:23 ` Clément Pit-Claudel
2021-07-17 3:12 ` Yuan Fu
@ 2021-07-17 7:16 ` Eli Zaretskii
2021-07-20 20:36 ` Clément Pit-Claudel
2021-07-17 17:30 ` Stefan Monnier
2 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-17 7:16 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Fri, 16 Jul 2021 22:23:26 -0400
>
> On 7/16/21 10:05 PM, Yuan Fu wrote:
> > My conclusion is that after-change-hook is pretty insignificant, and the initial parse is a bit slow (on large files).
>
> I have no idea if it makes sense, but: does the initial parse need to be synchronous, or could you instead run the parsing in one thread, and the rest of Emacs in another? (I'm talking about concurrent execution, not cooperative threading).
You cannot have a thread freely accessing buffer text when the Lisp
machine is allowed to run concurrently with this, because the Lisp
machine can change the buffer text.
> In most cases there should be very limited contention, if at all: in large buffers most of Emacs' activity will be focused on the (relatively few) characters around the gap, and most of the parser's activity will be reading from the buffer at other positions.
When Emacs moves or enlarges/shrinks the gap, that affects the entire
buffer text after the gap, regardless of where the gap is. So it will
affect the TS reader if it reads stuff after the gap.
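A toy gap buffer shows why. The sketch below is not Emacs's buffer code; struct gap_buffer, byte_address and move_gap are hypothetical names, and the model is deliberately simplified (it only moves the gap and never resizes it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy gap buffer: the text is DATA with a hole of GAP_SIZE bytes
   starting at GAP_START.  TEXT_LEN excludes the hole.  */
struct gap_buffer
{
  char *data;
  ptrdiff_t gap_start;
  ptrdiff_t gap_size;
  ptrdiff_t text_len;
};

/* Address of the byte at text position POS (0-based).  Positions at
   or after the gap are displaced by the size of the gap.  */
static char *
byte_address (const struct gap_buffer *b, ptrdiff_t pos)
{
  assert (0 <= pos && pos < b->text_len);
  return b->data + (pos < b->gap_start ? pos : pos + b->gap_size);
}

/* Move the gap so that it starts at text position NEW_START.  The
   bytes between the old and new gap positions are physically copied
   to the other side of the gap.  */
static void
move_gap (struct gap_buffer *b, ptrdiff_t new_start)
{
  assert (0 <= new_start && new_start <= b->text_len);
  if (new_start < b->gap_start)
    memmove (b->data + new_start + b->gap_size, b->data + new_start,
             b->gap_start - new_start);
  else
    memmove (b->data + b->gap_start,
             b->data + b->gap_start + b->gap_size,
             new_start - b->gap_start);
  b->gap_start = new_start;
}
```

In this model a reader that holds a raw pointer obtained from byte_address before a call to move_gap may be left looking at stale bytes afterwards, even though the text at that position did not change. Growing the gap would displace every byte after it in the same way.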
> You do need to be careful to not read the garbage data from the gap, but otherwise seeing stale or even inconsistent data from the parser thread shouldn't be an issue, since tree-sitter is supposed to be robust to bad parses.
What would be the purpose of calling the parser if we know in advance
it will fail when it gets to the "garbage" caused by async access to
the buffer text?
And besides, current Emacs primitives that access buffer text don't
necessarily do that atomically, since the assumption built into their
design is that no one should access that text at the same time. So
you could have windows where the buffer text is in inconsistent state,
like if the gap was moved, but the variables which tell where the gap
is were not yet updated, or windows where a multibyte character was
not yet completely written or deleted to/from the buffer, resulting in
invalid multibyte sequences and inconsistent values of EOB.
So I don't see how this could be done without some inter-locking.
And what do you want the code which requested parsing to do while the
parse thread runs? The requesting code is in the main thread, so if
it just waits, you don't gain anything.
> In fact, depending on how robust tree-sitter is, you might even be able to do the concurrency-control optimistically (parse everything up to close to the gap, check that the gap hasn't moved into the region that you read, and then resume reading or rollback).
I don't understand what you suggest here. For starters, the gap could
move (assuming you are still talking about a separate thread that does
the parsing), and what do we do then?
> Alternatively, maybe you could even do a full parse with minimal concurrency control: you'd make sure that the Emacs thread records not just changes to the buffer text but also movements of the gap, and then you could use that list of changes for the next parse?
I don't understand what recording the gap could solve. The stuff in
the gap is generally garbage, and can easily include invalid multibyte
sequences. I don't think it's a good idea to pass that to TS. Also,
recording the gap changes in the main thread and accessing that
information from a concurrent thread again opens a window for races,
and requires synchronization.
Bottom line, I think what you are suggesting is premature
optimization: we don't yet know that we will need this. If the TS
performance information is reliable, it should be fast enough for our
purposes; we just need to come up with an optimal way of calling it so
that we don't impose unnecessary delays.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 3:12 ` Yuan Fu
@ 2021-07-17 7:18 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-17 7:18 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 16 Jul 2021 23:12:00 -0400
> Cc: emacs-devel@gnu.org
>
> Another approach I thought about is to “expose” only the portion of the buffer from BOB to some point to tree-sitter. When a user asks for a parse tree, they also specify up to which point of the buffer they need it. For example, for fontification, jit-lock only needs the tree up to the end of the visible window. And for structure editing, asking for the portion up to window-end + a few thousand characters might be enough.
Yes, I think we should only ask TS to parse what we need, not more.
> However, this heuristic could have problems in practice. (Maybe a giant comment section of thousands of characters follows, and instead of jumping to the end of it, we wrongly jump to the middle of that comment section, because tree-sitter only “sees” up to that point.) So I don’t know if it’s a good idea.
It's definitely a good idea that should be pursued. Even if in some
specific situation you'd need to pass to TS a large part of buffer
text, it will help in the other cases.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 2:23 ` Clément Pit-Claudel
2021-07-17 3:12 ` Yuan Fu
2021-07-17 7:16 ` Eli Zaretskii
@ 2021-07-17 17:30 ` Stefan Monnier
2021-07-17 17:54 ` Eli Zaretskii
` (2 more replies)
2 siblings, 3 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-17 17:30 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
In your benchmark, you give numbers for:
- initial full-text parse (a bit above 1MB/s)
- cost of update-without-reparse
but I think it would be nice to see the cost of the reparse after
those updates (should be much faster than the initial parse).
Clément said:
> I have no idea if it makes sense, but: does the initial parse need to be
> synchronous, or could you instead run the parsing in one thread, and the
> rest of Emacs in another? (I'm talking about concurrent execution, not
> cooperative threading).
If we copy the buffer's content to a freshly malloc'ed area before passing
that to TS, then there should be no problem running TS in a separate
concurrent thread, indeed.
Eli said:
> Why do you update the entire parser list for every modification?
> This comment:
If having multiple parsers in a single buffer is a not-uncommon case,
then indeed we'll need to do better, but if we assume this is an
anomalous situation, then Yuan's code is optimal ;-)
> Yes, I think we should only ask TS to parse what we need, not more.
We'll need to experiment with that. Using an approach like
`syntax-ppss` where we only parse up to some high-watermark might be
a good approach, but it's also possible that it will work poorly: if TS
assumes it works on the whole buffer, then it will see the truncated
text as a syntax error and while it is supposed to handle syntax errors
nicely it may still lead to suboptimal behavior when parts of perfectly
valid code are misparsed because the parser was not allowed to see the
closing braces that make it "perfectly valid".
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 17:30 ` Stefan Monnier
@ 2021-07-17 17:54 ` Eli Zaretskii
2021-07-24 14:08 ` Stefan Monnier
2021-07-19 15:16 ` Yuan Fu
2021-07-20 16:32 ` Stephen Leake
2 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-17 17:54 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cpitclaudel, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Sat, 17 Jul 2021 13:30:40 -0400
>
> Clément said:
> > I have no idea if it makes sense, but: does the initial parse need to be
> > synchronous, or could you instead run the parsing in one thread, and the
> > rest of Emacs in another? (I'm talking about concurrent execution, not
> > cooperative threading).
>
> If we copy the buffer's content to a freshly malloc'ed area before passing
> that to TS, then there should be no problem running TS in a separate
> concurrent thread, indeed.
Making a copy of the buffer is a non-starter from where I stand. It
doesn't scale, for starters. I don't see any reason to go to such a
complex design at this early stage.
> Eli said:
> > Why do you update the entire parser list for every modification?
> > This comment:
>
> If having multiple parsers in a single buffer is a not-uncommon case,
> then indeed we'll need to do better, but if we assume this is an
> anomalous situation, then Yuan's code is optimal ;-)
>
> > Yes, I think we should only ask TS to parse what we need, not more.
>
> We'll need to experiment with that.
We can experiment, but I think the basic design should be clean and
reasonable from the get-go.
> Using an approach like `syntax-ppss` where we only parse up to some
> high-watermark might be a good approach, but it's also possible that
> it will work poorly: if TS assumes it works on the whole buffer,
> then it will see the truncated text as a syntax error and while it
> is supposed to handle syntax errors nicely it may still lead to
> suboptimal behavior when parts of perfectly valid code are misparsed
> because the parser was not allowed to see the closing braces that
> make it "perfectly valid".
TS must be able to handle these situations well enough, because they
happen during editing all the time. I wouldn't worry about that,
definitely not at this stage.
Different uses of the parse results will need to pass different chunks
of buffer text, and that is okay.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 17:30 ` Stefan Monnier
2021-07-17 17:54 ` Eli Zaretskii
@ 2021-07-19 15:16 ` Yuan Fu
2021-07-22 3:10 ` Yuan Fu
2021-07-20 16:32 ` Stephen Leake
2 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-19 15:16 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Clément Pit-Claudel, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]
> On Jul 17, 2021, at 1:30 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> In your benchmark, you give numbers for:
> - initial full-text parse (a bit above 1MB/s)
> - cost of update-without-reparse
>
> but I think it would be nice to see the cost of the reparse after
> those updates (should be much faster than the initial parse).
I have done some more benchmarking. Initially I thought tree-sitter doesn’t scale, because re-parsing my JSON file is unexpectedly slow, but then I retried with xdisp.c and tree-sitter's C parser, and that is really fast and matches my expectation of tree-sitter. So from now on I’ll use xdisp.c and the C parser for benchmarking. I guess the JSON parser is simply badly written?
I benchmarked with a simple C program. The programs are in main-c.c and main-json.c, and the shell output of the measurements is in benchmark.3.txt.
JSON: Initial parse takes 1.2s, re-parse (with no change) takes 0.7s, uses 307MB memory
C: Initial parse takes 0.14s, re-parse (with no change) takes 0.009s, uses 20MB memory
Yuan
[-- Attachment #2: benchmark.3.txt --]
[-- Type: text/plain, Size: 2875 bytes --]
On benchmark.2.json (1.6M)
One full parse: 1.2s
________________________________________________________
Executed in 1.30 secs fish external
usr time 1210.81 millis 142.00 micros 1210.67 millis
sys time 87.40 millis 756.00 micros 86.65 millis
One full parse and a re-parse:
________________________________________________________
Executed in 2.40 secs fish external
usr time 1.95 secs 154.00 micros 1.95 secs
sys time 0.15 secs 763.00 micros 0.15 secs
Re-parse takes 1.95 - 1.21 = 0.74s
Memory usage of full-parse + re-parse:
2.17 real 2.00 user 0.16 sys
307269632 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
75035 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
463 involuntary context switches
14674957821 instructions retired
7838514409 cycles elapsed
306745344 peak memory footprint
307MB for two trees that "shares internal structure".
\f
On xdisp.c (1.2M)
One full parse: 0.139s
________________________________________________________
Executed in 478.23 millis fish external
usr time 139.69 millis 134.00 micros 139.55 millis
sys time 8.05 millis 829.00 micros 7.22 millis
Full parse and re-parse:
________________________________________________________
Executed in 456.58 millis fish external
usr time 148.23 millis 153.00 micros 148.08 millis
sys time 9.08 millis 791.00 micros 8.29 millis
Re-parse takes 148 - 139 = 9 ms (0.009s)
Memory usage of full-parse + re-parse:
0.16 real 0.15 user 0.00 sys
20131840 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
4932 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
28 involuntary context switches
1070525817 instructions retired
581557699 cycles elapsed
19271680 peak memory footprint
20MB
[-- Attachment #3: main-c.c --]
[-- Type: application/octet-stream, Size: 1180 bytes --]
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <tree_sitter/api.h>
TSLanguage *tree_sitter_c();
struct buffer {
char *buf;
long len;
};
const char *read_file(void *payload, uint32_t byte_index,
TSPoint position, uint32_t *bytes_read) {
long len = ((struct buffer *) payload)->len;
if (byte_index >= len) {
*bytes_read = 0;
return (char *) "";
} else {
*bytes_read = len - byte_index;
return (char *) (((struct buffer *) payload)->buf) + byte_index;
}
}
int main() {
TSParser *parser = ts_parser_new();
ts_parser_set_language(parser, tree_sitter_c());
/* Copy the file into BUFFER. */
FILE *file = fopen("xdisp.c", "rb");
fseek(file, 0, SEEK_END);
long length = ftell (file);
fseek(file, 0, SEEK_SET);
char *buffer = malloc (length);
fread(buffer, 1, length, file);
fclose (file);
struct buffer buf = {buffer, length};
TSInput input = {&buf, read_file, TSInputEncodingUTF8};
TSTree *tree = ts_parser_parse(parser, NULL, input);
TSTree *new_tree = ts_parser_parse(parser, tree, input);
free(buffer);
ts_tree_delete(tree);
ts_tree_delete(new_tree);
ts_parser_delete(parser);
return 0;
}
[-- Attachment #4: main-json.c --]
[-- Type: application/octet-stream, Size: 1195 bytes --]
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <tree_sitter/api.h>
TSLanguage *tree_sitter_json();
struct buffer {
char *buf;
long len;
};
const char *read_file(void *payload, uint32_t byte_index,
TSPoint position, uint32_t *bytes_read) {
long len = ((struct buffer *) payload)->len;
if (byte_index >= len) {
*bytes_read = 0;
return (char *) "";
} else {
*bytes_read = len - byte_index;
return (char *) (((struct buffer *) payload)->buf) + byte_index;
}
}
int main() {
TSParser *parser = ts_parser_new();
ts_parser_set_language(parser, tree_sitter_json());
/* Copy the file into BUFFER. */
FILE *file = fopen("benchmark.3.json", "rb");
fseek(file, 0, SEEK_END);
long length = ftell (file);
fseek(file, 0, SEEK_SET);
char *buffer = malloc (length);
fread(buffer, 1, length, file);
fclose (file);
struct buffer buf = {buffer, length};
TSInput input = {&buf, read_file, TSInputEncodingUTF8};
TSTree *tree = ts_parser_parse(parser, NULL, input);
TSTree *new_tree = ts_parser_parse(parser, tree, input);
free(buffer);
ts_tree_delete(tree);
ts_tree_delete(new_tree);
ts_parser_delete(parser);
return 0;
}
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 16:48 ` Eli Zaretskii
2021-07-15 18:23 ` Yuan Fu
@ 2021-07-20 16:25 ` Stephen Leake
2021-07-20 16:45 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-20 16:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 15 Jul 2021 12:19:31 -0400
>> Cc: monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>>
>> > Why do you need to do this when a buffer is updated? why not use
>> > display as the trigger? Large portions of a buffer will never be
>> > displayed, and some buffers will not be displayed at all. Why waste
>> > cycles on them? Redisplay is perfectly equipped to tell you when some
>> > chunk of buffer text is going to be redrawn, and it already knows to
>> > do nothing if the buffer hasn't changed.
>>
>> Tree-sitter expects you to tell it every single change to the parsed text.
>
> That cannot be true, because the parsed text could be in a state where
> parsing it will fail.
You can relax this to "when a parse is requested, tree-sitter must be
given the net changes to the text". You can combine several changes into
one, if that saves time or something.
But tree-sitter does have to deal with incorrect syntax.
> When you are in the middle of writing the code, this is what will
> happen many times, even if you pass the whole buffer to the parser.
Yes.
> And since tree-sitter _must_ be able to deal with this problem, it
> also must be able to receive incomplete parts of the buffer text, and
> do the best it can with it.
That does not follow.
I took that approach with ada-mode, and the results are not good. Mostly
this is because Ada requires always parsing from BOB, so parsing only
part of the buffer is bound to give bad results.
Knowing the changes from a previous complete parse allows the parser to
do a much better job.
>> Say you have a buffer with some content and scrolled through it, so
>> tree-sitter has parsed the whole buffer. Then some elisp edited some
>> text outside the visible portion. Redisplay doesn’t happen, we don’t
>> tell this edit to tree-sitter. Then I scroll to the place that has
>> been edited. What now?
>
> Now you call tree-sitter passing it the part of the buffer that needs
> to be parsed (e.g., the chunk that is about to be displayed). If
> tree-sitter needs to look back, it will.
No, you pass tree-sitter the net list of changes since the last parse
was requested. Changes outside the visible region can easily affect the
visible region; consider inserting a comment or block start or end.
>> I’ve lost the change information, and tree-sitter’s tree is out-dated.
>
> No information is lost because the updated buffer text is available.
That is useful only if the previous buffer text is also available, so
you can diff it. It is more efficient to keep a list of changes.
Although if that list grows too large, it can be better to simply start
over, and parse the whole buffer again.
> In addition, Emacs records (for redisplay purposes) two places in each
> buffer related to changes: the minimum buffer position before which no
> changes were done since last redisplay, and the maximum buffer
> position beyond which there were no changes. This can also be used to
> pass only a small part of the buffer to the parser, because the rest
> didn't change.
Again, the input to tree-sitter is a list of changes, not a block of
text containing changes.
That is because of the way incremental parsing works.
The list of changes to the buffer text is used to edit the parse tree,
deleting nodes that represent deleted or modified text, lexing the new
text to create new nodes.
Then the parser is run on the edited tree, _not_ on the buffer text. The
parser adds new nodes as appropriate to arrive at a complete parse tree.
There's no point in trying to tell the parser how much to parse; any
non-edited portion of the original text will be represented in the
edited tree by one or a small number of nodes; the parser then consumes
those quickly.
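As a rough illustration of the tree-editing step, here is a toy model that is neither tree-sitter's nor wisi's actual algorithm. It keeps a flat list of nodes with byte ranges; struct toy_node and apply_edit are hypothetical names. An edit invalidates only the nodes it overlaps, shifts the nodes after it, and leaves the nodes before it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy parse-tree node covering bytes [START, END) of the text.  */
struct toy_node
{
  ptrdiff_t start;
  ptrdiff_t end;
  bool reusable;
};

/* Apply one edit that replaced bytes [START, OLD_END) with new text
   ending at NEW_END.  Return how many nodes can still be reused.  */
static size_t
apply_edit (struct toy_node *nodes, size_t n, ptrdiff_t start,
            ptrdiff_t old_end, ptrdiff_t new_end)
{
  assert (0 <= start && start <= old_end && start <= new_end);
  ptrdiff_t delta = new_end - old_end;
  size_t reusable = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (nodes[i].end <= start)
        ;  /* Entirely before the edit: untouched.  */
      else if (nodes[i].start >= old_end)
        {
          /* Entirely after the edit: same content, new position.  */
          nodes[i].start += delta;
          nodes[i].end += delta;
        }
      else
        nodes[i].reusable = false;  /* Overlaps the edit: re-lex.  */
      if (nodes[i].reusable)
        reusable++;
    }
  return reusable;
}
```

With four nodes of ten bytes each and a small edit inside the second one, three nodes remain reusable and only the second has to be rebuilt. That is the sense in which the parser consumes unchanged text quickly: it sees a few ready-made nodes instead of the characters under them.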
>> What we can do is to only parse the portion from BOB to the visible
>> portion. So we won’t parse the whole buffer unless you scroll to the
>> bottom.
You can stop parsing at the end of a complete grammar production; in
languages that require parsing from BOB, that is always EOB. The parser
cannot stop at an arbitrary point in the text; that would leave an
incomplete tree.
The point of incremental parsing is that parsing unchanged text is very
fast, because it is represented by a small number of nodes in the edited
tree.
> My primary worry is the fact that you want to use buffer-change hooks
> (and will soon enough want to use post-command-hook as well). They
> slow down editing, sometimes tremendously, so I'd very much prefer not
> to use those hooks for fontification/parsing. The original font-lock
> mechanism in Emacs 19 used these hooks; we switched to jit-lock and
> its redisplay-triggered fontifications because the original design had
> problems which couldn't be solved reliably and with reasonable
> performance. I hope we will not make the mistake of going back to
> that sub-optimal design.
Ah. That could be a problem; incremental parsing fundamentally requires
a list of changes.
If the parser is in an Emacs module, so it has direct access to the
buffer, then the hooks only need to record the buffer positions of the
insertions and deletions, not the new text. That should be very fast.
Then the parse is only requested when the results are needed for
something, like indent or fontify.
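A minimal sketch of that split, assuming a fixed-size log and a stand-in
for the parser (struct change_log, record_change and ensure_parsed are
hypothetical names, not Emacs or wisi code). The hook only stores three
positions per change; the parse is deferred until a consumer asks for it,
and a log that overflows falls back to a full reparse.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CHANGES = 64 };

struct change
{
  size_t start, old_end, new_end;  /* positions only, no text */
};

struct change_log
{
  struct change items[MAX_CHANGES];
  size_t count;
  int overflowed;  /* too many changes: reparse from scratch instead */
  int reparses;    /* how often the (stand-in) parser actually ran */
};

/* Called from the buffer-change hook: constant time, copies no text.  */
void
record_change (struct change_log *log,
               size_t start, size_t old_end, size_t new_end)
{
  assert (start <= old_end && start <= new_end);
  if (log->count == MAX_CHANGES)
    {
      log->overflowed = 1;
      return;
    }
  log->items[log->count++] = (struct change) { start, old_end, new_end };
}

/* Called only when indentation or fontification needs a current tree.  */
void
ensure_parsed (struct change_log *log)
{
  if (log->count == 0 && !log->overflowed)
    return;  /* tree is already up to date */
  /* A real implementation would hand each recorded change to the
     parser here, or reparse the whole buffer if the log overflowed.  */
  log->reparses++;
  log->count = 0;
  log->overflowed = 0;
}
```

However many changes are recorded between two requests, the parser runs
at most once per request.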
That is how wisi works, except the parser is currently in an external
process, so the buffer change hooks also have to store the new text,
which can be large. That is a good reason to improve wisi to support
the parser in a module.
In addition, the code that computes the requested information
(fontification or indentation) takes region bounds as input, and only
computes the information for that region (using the full parse tree);
that is much faster than always computing all information for the entire
buffer.
eglot, on the other hand, sends the change information to the LSP server
immediately (or after a small delay), and then tries to do something with
the response, rather than waiting until some event triggers a need for
information from the server.
I'm guessing that font-lock ran the actual fontification functions from
the buffer-change hooks; that would be slow.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-15 18:23 ` Yuan Fu
2021-07-16 7:30 ` Eli Zaretskii
@ 2021-07-20 16:27 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-20 16:27 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Stefan Monnier, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
> ... I was hoping to use the parse tree for more than
> fontification, e.g., motion commands like sexp-forward/backward or
> structural editing commands like expand-region.
wisi currently supports this.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-16 7:30 ` Eli Zaretskii
2021-07-16 14:27 ` Yuan Fu
@ 2021-07-20 16:28 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-20 16:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> Each command/feature that needs an updated TS tree will take care of
> updating TS with the relevant information. We should record whatever
> we need for that as side effect of primitives that change buffer text
> (in insdel.c), and use the recorded info to update TS. But the actual
> passing of text to TS should happen lazily, when we actually need its
> re-parsing, not when the changes to buffer text are done.
Yes.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 17:30 ` Stefan Monnier
2021-07-17 17:54 ` Eli Zaretskii
2021-07-19 15:16 ` Yuan Fu
@ 2021-07-20 16:32 ` Stephen Leake
2021-07-20 16:48 ` Eli Zaretskii
` (2 more replies)
2 siblings, 3 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-20 16:32 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Clément Pit-Claudel, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
> In your benchmark, you give numbers for:
> - initial full-text parse (a bit above 1MB/s)
> - cost of update-without-reparse
>
> but I think it would be nice to see the cost of the reparse after
> those updates (should be much faster than the initial parse).
>
> Clément said:
>> I have no idea if it makes sense, but: does the initial parse need to be
>> synchronous, or could you instead run the parsing in one thread, and the
>> rest of Emacs in another? (I'm talking about concurrent execution, not
>> cooperative threading).
>
> If we copy the buffer's content to a freshly malloc area before passing
> that to TS, then there should be no problem running TS in a separate
> concurrent thread, indeed.
Except that the results will not be useful, since they won't apply to
the original buffer if it is changed. And if the original buffer is not
changed, then we do not need to run the parser asynchronously.
Computing fontification and indentation must be synchronous.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 16:25 ` Stephen Leake
@ 2021-07-20 16:45 ` Eli Zaretskii
2021-07-21 15:49 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-20 16:45 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, monnier, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Yuan Fu <casouri@gmail.com>, monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
> Date: Tue, 20 Jul 2021 09:25:11 -0700
>
> > In addition, Emacs records (for redisplay purposes) two places in each
> > buffer related to changes: the minimum buffer position before which no
> > changes were done since last redisplay, and the maximum buffer
> > position beyond which there were no changes. This can also be used to
> > pass only a small part of the buffer to the parser, because the rest
> > didn't change.
>
> Again, the input to tree-sitter is a list of changes, not a block of
> text containing changes.
I fail to see the significance of the difference. Surely, you could
hand it a block of text with changes to mean that this block replaces
the previous version of that block. It might take the parser more
work to update the parse tree in this case, but if it's fast enough,
that won't be a problem. Right?
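For illustration, the "two places" bookkeeping can be modelled as below.
This is a simplified sketch, not the code Emacs uses for redisplay; struct
dirty_region, region_reset, note_change and changed_bounds are made-up
names. It keeps only the number of untouched characters at each end of the
buffer, so any number of edits collapses into one changed block.

```c
#include <assert.h>
#include <stddef.h>

struct dirty_region
{
  size_t beg_unchanged;  /* untouched characters at the start */
  size_t end_unchanged;  /* untouched characters at the end */
  size_t length;         /* current text length */
  int modified;          /* nonzero once any change was noted */
};

size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

void
region_reset (struct dirty_region *r, size_t length)
{
  r->beg_unchanged = length;
  r->end_unchanged = length;
  r->length = length;
  r->modified = 0;
}

/* Text in [start, old_end) was replaced by new_len characters.  */
void
note_change (struct dirty_region *r,
             size_t start, size_t old_end, size_t new_len)
{
  assert (start <= old_end && old_end <= r->length);
  size_t tail = r->length - old_end;  /* text after the change */
  r->beg_unchanged = min_size (r->beg_unchanged, start);
  r->end_unchanged = min_size (r->end_unchanged, tail);
  r->length = start + new_len + tail;
  r->modified = 1;
}

/* Report the changed block [*from, *to) in current positions.
   Returns 0 if nothing changed since the last reset.  */
int
changed_bounds (const struct dirty_region *r, size_t *from, size_t *to)
{
  if (!r->modified)
    return 0;
  *from = r->beg_unchanged;
  *to = r->length - r->end_unchanged;
  return 1;
}
```

If the length at the time of the last parse is also remembered, the old
extent of the block is old_length - end_unchanged, so in principle the
whole block could be described as a single replacement, at the cost of
re-lexing everything between the first and last edit.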
> If the parser is in an Emacs module, so it has direct access to the
> buffer, then the hooks only need to record the buffer positions of the
> insertions and deletions, not the new text. That should be very fast.
(You are talking about the undo-list.)
But even this is wasteful: it is quite customary to delete, then
re-insert, then re-delete again, etc. several times. So collecting
these operations will produce many more "changes" than strictly
needed. That's why I'm trying to find a simpler, less wasteful
strategy. Since TS is very fast, we can trade some of the speed for
simpler, more scalable design of tracking changes.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 16:32 ` Stephen Leake
@ 2021-07-20 16:48 ` Eli Zaretskii
2021-07-20 17:38 ` Stefan Monnier
2021-07-20 17:36 ` Stefan Monnier
2021-07-20 18:04 ` Clément Pit-Claudel
2 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-20 16:48 UTC (permalink / raw)
To: Stephen Leake; +Cc: cpitclaudel, monnier, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Date: Tue, 20 Jul 2021 09:32:23 -0700
> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> Computing fontification and indentation must be synchronous.
I wouldn't say "must", but going async on them certainly brings in a
lot more complexity, and we should avoid that unless it's REALLY
needed.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 16:32 ` Stephen Leake
2021-07-20 16:48 ` Eli Zaretskii
@ 2021-07-20 17:36 ` Stefan Monnier
2021-07-20 18:05 ` Clément Pit-Claudel
2021-07-21 16:02 ` Stephen Leake
2021-07-20 18:04 ` Clément Pit-Claudel
2 siblings, 2 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-20 17:36 UTC (permalink / raw)
To: Stephen Leake; +Cc: Clément Pit-Claudel, emacs-devel
>> If we copy the buffer's content to a freshly malloc area before passing
>> that to TS, then there should be no problem running TS in a separate
>> concurrent thread, indeed.
> Except that the results will not be useful, since they won't apply to
> the original buffer if it is changed.
Not true: we just have to keep track of the list of changes (as Yuan's
patch does), then pass it to tree-sitter to get a tree up-to-date w.r.t
the current content of the buffer.
> And if the original buffer is not changed, then we do not need to run
> the parser asynchronously.
We do:
- because we want to do other things in the mean time
- because we want to take advantage of the many CPU cores sitting idle.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 16:48 ` Eli Zaretskii
@ 2021-07-20 17:38 ` Stefan Monnier
0 siblings, 0 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-20 17:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stephen Leake, cpitclaudel, emacs-devel
>> Computing fontification and indentation must be synchronous.
> I wouldn't say "must", but going async on them certainly brings in a
> lot more complexity, and we should avoid that unless it's REALLY
> needed.
Agreed. Tree-sitter's *re*parse is supposed to be fast enough
for that. My suggestion to do it concurrently was mostly aimed at the
initial parse (which does imply that the initial fontification would be
async for those modes which depend on tree-sitter for fontification).
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 16:32 ` Stephen Leake
2021-07-20 16:48 ` Eli Zaretskii
2021-07-20 17:36 ` Stefan Monnier
@ 2021-07-20 18:04 ` Clément Pit-Claudel
2021-07-20 18:24 ` Eli Zaretskii
2021-07-21 16:54 ` [SPAM UNSURE] " Stephen Leake
2 siblings, 2 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-20 18:04 UTC (permalink / raw)
To: Stephen Leake, Stefan Monnier; +Cc: emacs-devel
On 7/20/21 12:32 PM, Stephen Leake wrote:
> Computing fontification and indentation must be synchronous.
Must? What makes you say that?
> Except that the results will not be useful, since they won't apply to
> the original buffer if it is changed.
Then you will send the additional changes and wait.
TS is an incremental parser, so the work it will have done incorporating part of the changes will not be wasted.
Concrete example: if you have a bit of elisp that runs for .5s to make modifications to the buffer, then press "indent", and only then do you send changes to TS and wait for the response synchronously, then you will wait for .5s + time to incorporate all changes. If you start processing the changes in parallel as they are made by the Elisp code, then you will only wait for .5s + time to incorporate only the changes that had not been processed yet.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 17:36 ` Stefan Monnier
@ 2021-07-20 18:05 ` Clément Pit-Claudel
2021-07-21 16:02 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-20 18:05 UTC (permalink / raw)
To: Stefan Monnier, Stephen Leake; +Cc: emacs-devel
On 7/20/21 1:36 PM, Stefan Monnier wrote:
>>> If we copy the buffer's content to a freshly malloc area before passing
>>> that to TS, then there should be no problem running TS in a separate
>>> concurrent thread, indeed.
>> Except that the results will not be useful, since they won't apply to
>> the original buffer if it is changed.
>
> Not true: we just have to keep track of the list of changes (as Yuan's
> patch does), then pass it to tree-sitter to get a tree up-to-date w.r.t
> the current content of the buffer.
>
>> And if the original buffer is not changed, then we do not need to run
>> the parser asynchronously.
>
> We do:
> - because we want to do other things in the mean time
> - because we want to take advantage of the many CPU cores sitting idle.
Ah, sorry, I didn't see your message, so I sent an answer that's approximately equivalent.
But note that I'm not even sure we need to copy the buffer. Of course, I agree that it makes a lot of things a lot simpler.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 18:04 ` Clément Pit-Claudel
@ 2021-07-20 18:24 ` Eli Zaretskii
2021-07-21 16:54 ` [SPAM UNSURE] " Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-20 18:24 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: stephen_leake, monnier, emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 20 Jul 2021 14:04:25 -0400
> Cc: emacs-devel@gnu.org
>
> Concrete example: if you have a bit of elisp that runs for .5s to make modifications to the buffer, then press "indent", and only then do you send changes to TS and wait for the response synchronously, then you will wait for .5s + time to incorporate all changes. If you start processing the changes in parallel as they are made by the Elisp code, then you will only wait for .5s + time to incorporate only the changes that had not been processed yet.
Your example is too abstract and disregards the issues that Emacs has
with such "pure" parallelism. In my response to your original
proposal I tried to explain the difficulties with implementing your
suggestions _in_Emacs_, and the complexity which any such
implementation will bring with it. When you compare synchronous with
async implementation, you need to take those difficulties and
complexities into consideration, otherwise the comparison will not be
useful.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 7:16 ` Eli Zaretskii
@ 2021-07-20 20:36 ` Clément Pit-Claudel
2021-07-21 11:26 ` Eli Zaretskii
2021-07-21 16:29 ` Stephen Leake
0 siblings, 2 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-20 20:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Thanks for the detailed reply.
On 7/17/21 3:16 AM, Eli Zaretskii wrote:
> When Emacs moves or enlarges/shrinks the gap, that affects the entire
> buffer text after the gap, regardless of where the gap is. So it will
> affect the TS reader if it reads stuff after the gap.
Doesn't enlarging the gap require allocating a new buffer and copying data to it? If so it wouldn't affect the TS reader. Moving is indeed trickier, that's what I referred to as "limited contention".
>> You do need to be careful to not read the garbage data from the gap, but otherwise seeing stale or even inconsistent data from the parser thread shouldn't be an issue, since tree-sitter is supposed to be robust to bad parses.
>
> What would be the purpose of calling the parser if we know in advance
> it will fail when it gets to the "garbage" caused by async access to
> the buffer text?
It won't fail, will it? I thought this was the point of TS, that it would reuse the initial parse on the "good" parts in subsequent parses.
> So I don't see how this could be done without some inter-locking.
Yes, there probably needs to be some care around the gap area. But that's what I was referring to re. "optimistic concurrency".
> And what do you want the code which requested parsing do while the
> parse thread runs? The requesting code is in the main thread, so if
> it just waits, you don't gain anything.
You'd have the parser running continuously in the background, every time there is a change. When a piece of code requests a parse it blocks and waits, but presumably for not too long because a very recent previous parse means that the blocking parse is fast.
>> In fact, depending on how robust tree-sitter is, you might even be able to do the concurrency-control optimistically (parse everything up to close to the gap, check that the gap hasn't moved into the region that you read, and then resume reading or rollback).
>
> I don't understand what you suggest here. For starters, the gap could
> move (assuming you are still talking about a separate thread that does
> the parsing), and what do we do then?
Nothing, we start the next parse when this one completes.
> I don't understand what could recording the gap solve. The stuff in
> the gap is generally garbage, and can easily include invalid multibyte
> sequences. I don't think it's a good idea to pass that to TS. Also,
> recording the gap changes in the main thread and accessing that
> information from a concurrent thread again opens a window for races,
> and requires synchronization.
This list of gap changes wouldn't be accessed concurrently: you would (message-)pass a copy of it to the parser thread every time it starts a new parse.
> Bottom line, I think what you are suggesting is premature
> optimization: we don't yet know that we will need this.
I thought we knew that a full parse of some files could take over a second; but yes, it will be nice if we can find a synchronous way to avoid having to do a full parse.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 20:36 ` Clément Pit-Claudel
@ 2021-07-21 11:26 ` Eli Zaretskii
2021-07-21 13:38 ` Clément Pit-Claudel
2021-07-21 16:29 ` Stephen Leake
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-21 11:26 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 20 Jul 2021 16:36:42 -0400
>
> Thanks for the detailed reply.
>
> On 7/17/21 3:16 AM, Eli Zaretskii wrote:
> > When Emacs moves or enlarges/shrinks the gap, that affects the entire
> > buffer text after the gap, regardless of where the gap is. So it will
> > affect the TS reader if it reads stuff after the gap.
>
> Doesn't enlarging the gap require allocating a new buffer and copying data to it?
Not necessarily. First, the gap could be enlarged for reasons other than
growing buffer text as a whole. And even if we must grow buffer text,
a good memory-allocation system will often resize the existing
memory block before it allocates another.
> If so it wouldn't affect the TS reader.
Not true, in general. When a new block is allocated by the OS/libc,
the old one is generally invalid and cannot be accessed. In many
cases, the old block could be unmapped from the program's address
space, in which case accessing it will segfault.
> Moving is indeed trickier, that's what I referred to as "limited contention".
We move the gap quite a lot.
> >> You do need to be careful to not read the garbage data from the gap, but otherwise seeing stale or even inconsistent data from the parser thread shouldn't be an issue, since tree-sitter is supposed to be robust to bad parses.
> >
> > What would be the purpose of calling the parser if we know in advance
> > it will fail when it gets to the "garbage" caused by async access to
> > the buffer text?
>
> It won't fail, will it?
"Fail" in the sense that it will be able to process only a small
portion of buffer text before it gets to garbage.
> > And what do you want the code which requested parsing do while the
> > parse thread runs? The requesting code is in the main thread, so if
> > it just waits, you don't gain anything.
>
> You'd have the parser running continuously in the background, every time there is a change. When a piece of code requests a parse it blocks and waits, but presumably for not too long because a very recent previous parse means that the blocking parse is fast.
Well, you cannot safely/usefully parse the buffer "continuously in the
background", for the reasons explained above, because Lisp programs
change buffer text quite a lot.
> > I don't understand what could recording the gap solve. The stuff in
> > the gap is generally garbage, and can easily include invalid multibyte
> > sequences. I don't think it's a good idea to pass that to TS. Also,
> > recording the gap changes in the main thread and accessing that
> > information from a concurrent thread again opens a window for races,
> > and requires synchronization.
>
> This list of gap changes wouldn't be accessed concurrently: you would (message-)pass a copy of it to the parser thread every time it starts a new parse.
I still don't see the point. Can you describe in more detail what
you would suggest doing with the list of gap changes? Just take a
specific example of a small set of gap changes and tell how to use
that.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 11:26 ` Eli Zaretskii
@ 2021-07-21 13:38 ` Clément Pit-Claudel
2021-07-21 13:51 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-21 13:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 7/21/21 7:26 AM, Eli Zaretskii wrote:
> I still don't see the point. Can you describe in more detail what
> you would suggest doing with the list of gap changes? Just take a
> specific example of a small set of gap changes and tell how to use
> that.
I can try, but the idea was half-baked from the start, so I'm not sure how much value it will bring. All I was saying is that depending on how robust TS is, feeding it:
<valuable text><small bit of the gap because the gap moved while TS was scanning><more valuable data>
and then, knowing that the gap had moved, re-feeding it just the area that corresponds to the places around the boundaries of the gap might yield a speedup.
So if the buffer is XYYGGGZ, where G is the gap, and becomes XGGIYYZ while we're scanning because of cursor motion + an insertion, then TS might see XYGIYYZ, due to concurrent mutations; but if we recorded that the gap moved and insertions happened at -#####---, then we can re-feed GGIYY to TS (omitting the Gs, of course), and hopefully it can reuse the parse of X and Z. If X and Z are long enough, that can be valuable.
Alternatively, keeping the list of changes allows us to maintain a copy of the buffer that TS uses for scanning, with updates delayed until TS is done scanning.
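To make the XYYGGGZ picture concrete, here is a small gap-buffer sketch in
C. It is a toy with a fixed 7-byte store, not Emacs's buffer code, and all
names (struct gapbuf, gb_init, gb_char_at, gb_move_gap, gb_insert, gb_text)
are invented for the example. The gap is filled with 'G' so stale bytes are
easy to spot; a correct reader translates logical positions and never
touches the bytes between gap_start and gap_end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CAP = 7 };

struct gapbuf
{
  char data[CAP];
  size_t gap_start;  /* first byte of the gap */
  size_t gap_end;    /* one past the last byte of the gap */
};

size_t
gb_length (const struct gapbuf *b)
{
  return CAP - (b->gap_end - b->gap_start);
}

/* Lay out BEFORE, then the gap, then AFTER.  */
void
gb_init (struct gapbuf *b, const char *before, const char *after)
{
  size_t nb = strlen (before), na = strlen (after);
  assert (nb + na <= CAP);
  memset (b->data, 'G', CAP);
  memcpy (b->data, before, nb);
  memcpy (b->data + CAP - na, after, na);
  b->gap_start = nb;
  b->gap_end = CAP - na;
}

/* Read by logical position, skipping over the gap.  */
char
gb_char_at (const struct gapbuf *b, size_t pos)
{
  assert (pos < gb_length (b));
  size_t gap = b->gap_end - b->gap_start;
  return pos < b->gap_start ? b->data[pos] : b->data[pos + gap];
}

/* Move the gap so that it starts at logical position POS.  */
void
gb_move_gap (struct gapbuf *b, size_t pos)
{
  assert (pos <= gb_length (b));
  size_t gap = b->gap_end - b->gap_start;
  if (pos < b->gap_start)
    {
      size_t n = b->gap_start - pos;
      memmove (b->data + b->gap_end - n, b->data + pos, n);
    }
  else if (pos > b->gap_start)
    {
      size_t n = pos - b->gap_start;
      memmove (b->data + b->gap_start, b->data + b->gap_end, n);
    }
  b->gap_start = pos;
  b->gap_end = pos + gap;
}

/* Insert C at POS; returns -1 if the gap is full (no resizing here).  */
int
gb_insert (struct gapbuf *b, size_t pos, char c)
{
  if (b->gap_start == b->gap_end)
    return -1;
  gb_move_gap (b, pos);
  b->data[b->gap_start++] = c;
  return 0;
}

/* Copy the logical text into OUT, which must hold CAP + 1 bytes.  */
void
gb_text (const struct gapbuf *b, char *out)
{
  size_t len = gb_length (b);
  for (size_t i = 0; i < len; i++)
    out[i] = gb_char_at (b, i);
  out[len] = '\0';
}
```

Moving the gap rewrites bytes on both sides of it, so a second thread that
reads data[] directly while gb_move_gap runs can see a mix of old and new
bytes unless the two are synchronized.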
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 13:38 ` Clément Pit-Claudel
@ 2021-07-21 13:51 ` Eli Zaretskii
2021-07-22 4:59 ` Clément Pit-Claudel
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-21 13:51 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Wed, 21 Jul 2021 09:38:31 -0400
> Cc: emacs-devel@gnu.org
>
> <valuable text><small bit of the gap because the gap moved while TS was scanning><more valuable data>
>
> and then, knowing that the gap had moved, re-feeding it just the area that corresponds to the places around the boundaries of the gap might yield a speedup.
You are assuming that TS will be able to process both <valuable text>
and <more valuable data>, even though it eats the garbage in the gap?
That isn't guaranteed, due to possibly invalid byte sequences in the
gap.
Without synchronization, you also risk reading invalid byte sequences
even outside the gap, because while you read part of a byte sequence,
some editing operation modifies the buffer at that very place.
> Alternatively, keeping the list of changes allows us to maintain a copy of the buffer that TS uses for scanning, with updates delayed until TS is done scanning.
Having a copy for each buffer that needs parsing doesn't scale.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 16:45 ` Eli Zaretskii
@ 2021-07-21 15:49 ` Stephen Leake
2021-07-21 19:37 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-21 15:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: Yuan Fu <casouri@gmail.com>, monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>> Date: Tue, 20 Jul 2021 09:25:11 -0700
>>
>> > In addition, Emacs records (for redisplay purposes) two places in each
>> > buffer related to changes: the minimum buffer position before which no
>> > changes were done since last redisplay, and the maximum buffer
>> > position beyond which there were no changes. This can also be used to
>> > pass only a small part of the buffer to the parser, because the rest
>> > didn't change.
>>
>> Again, the input to tree-sitter is a list of changes, not a block of
>> text containing changes.
>
> I fail to see the significance of the difference. Surely, you could
> hand it a block of text with changes to mean that this block replaces
> the previous version of that block. It might take the parser more
> work to update the parse tree in this case, but if it's fast enough,
> that won't be a problem. Right?
tree-sitter doesn't store the previous text, so there's nothing to
compare it to. Alternately, this would require the parser to store the
previous text so it can compute the diff; that could be added in a
wrapper around tree-sitter.
wisi does store the previous text, so it could compute the diff. But
because of memory pressure, we want a design that does not require a
copy of the buffer text; when wisi is turned into an Emacs module, it
will not store a copy of the text.
>> If the parser is in an Emacs module, so it has direct access to the
>> buffer, then the hooks only need to record the buffer positions of the
>> insertions and deletions, not the new text. That should be very fast.
>
> (You are talking about the undo-list.)
Almost; the undo-list can get reset before the parser needs it. And
sometimes it is disabled. But it might make sense to try to use that
instead of maintaining a separate list of changes.
It might make sense to delete the matching change from the parser change
list when undo is invoked, rather than adding another change.
> But even this is wasteful: it is quite customary to delete, then
> re-insert, then re-delete again, etc. several times. So collecting
> these operations will produce many more "changes" than strictly
> needed.
Yes. The wisi parser Ada code includes a step that combines all the
changes (in arbitrary buffer-pos order) into a minimal list of changes
in buffer-pos order; that simplifies applying multiple changes to the
parse tree. We could move that to elisp, if that would help (it's in Ada
because I much prefer debugging Ada to debugging elisp). That could be
done in the buffer-change hook; if the current change can be combined
with the previous one, do that instead of adding a new one.
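A sketch of what "combine with the previous one" could look like, covering
only the simplest case: the new change falls entirely inside the text
produced by the previous change (typing consecutive characters, or deleting
what was just inserted). This is not the wisi algorithm, which handles
changes in arbitrary order; struct len_change, struct change_list and
add_change are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CHANGES = 64 };

/* OLD_LEN characters at START were replaced by NEW_LEN characters.  */
struct len_change
{
  size_t start, old_len, new_len;
};

struct change_list
{
  struct len_change items[MAX_CHANGES];
  size_t count;
};

/* Add C, merging it into the previous change when C only touches text
   that the previous change produced.  Returns 1 if merged, 0 if
   appended, -1 if the list is full.  */
int
add_change (struct change_list *l, struct len_change c)
{
  if (l->count > 0)
    {
      struct len_change *p = &l->items[l->count - 1];
      if (c.start >= p->start
          && c.start + c.old_len <= p->start + p->new_len)
        {
          /* C replaced part of P's new text; P's original extent
             (start, old_len) is unaffected.  */
          p->new_len = p->new_len - c.old_len + c.new_len;
          return 1;
        }
    }
  if (l->count == MAX_CHANGES)
    return -1;
  l->items[l->count++] = c;
  return 0;
}
```

With this rule, typing two characters and then deleting both leaves a
single entry with old_len and new_len both zero, which a later pass could
drop entirely.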
> That's why I'm trying to find a simpler, less wasteful strategy.
> Since TS is very fast, we can trade some of the speed for simpler,
> more scalable design of tracking changes.
I don't see how optimizing the change list makes it more "scalable"; the
worst case is that the optimal list is the complete list of actions the
user takes, and that will happen often enough to be an important case.
In practice font-lock is triggered on every character typed by the user
(Emacs is faster than people can type), so there will typically be only
one change; nothing to optimize.
In the case where some elisp is changing the buffer in several places
(i.e. indent-region, or some other re-format), optimizing the change list
might make sense, if the elisp code is not already optimized for that.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 17:36 ` Stefan Monnier
2021-07-20 18:05 ` Clément Pit-Claudel
@ 2021-07-21 16:02 ` Stephen Leake
2021-07-21 17:16 ` Stefan Monnier
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-21 16:02 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Clément Pit-Claudel, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>> If we copy the buffer's content to a freshly malloc area before passing
>>> that to TS, then there should be no problem running TS in a separate
>>> concurrent thread, indeed.
>> Except that the results will not be useful, since they won't apply to
>> the original buffer if it is changed.
>
> Not true: we just have to keep track of the list of changes (as Yuan's
> patch does), then pass it to tree-sitter to get a tree up-to-date w.r.t
> the current content of the buffer.
The point is that more changes can happen while the parser is running.
>> And if the original buffer is not changed, then we do not need to run
>> the parser asynchronously.
>
> We do:
> - because we want to do other things in the mean time
> - because we want to take advantage of the many CPU cores sitting
> idle.
"using more cores" means parallel execution, which can still be
synchronous. That could be done for all invocations of the parser.
In the typical case of opening a new buffer, it might make sense
to spawn a thread to compute the fontification while the rest of the
major-mode-hook runs. Except functions on that hook could affect the
fontification, and not by changing the buffer; they could set the
fontification level or style.
Are there "other things" that are guaranteed to not affect
fontification?
In any case, the buffer must be read-only while the fontification is
being computed, so either the main emacs thread must wait for the
fontification to complete, or it must actually mark the buffer read-only
until the fontification completes, which could surprise the user.
On the other hand, if we don't force read-only, it might be possible to
use only part of the fontification information, up to the first change.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-20 20:36 ` Clément Pit-Claudel
2021-07-21 11:26 ` Eli Zaretskii
@ 2021-07-21 16:29 ` Stephen Leake
2021-07-21 16:54 ` Clément Pit-Claudel
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-21 16:29 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Eli Zaretskii, emacs-devel
Clément Pit-Claudel <cpitclaudel@gmail.com> writes:
> On 7/17/21 3:16 AM, Eli Zaretskii wrote:
>>> You do need to be careful to not read the garbage data from the
>>> gap, but otherwise seeing stale or even inconsistent data from the
>>> parser thread shouldn't be an issue, since tree-sitter is supposed
>>> to be robust to bad parses.
>>
>> What would be the purpose of calling the parser if we know in advance
>> it will fail when it gets to the "garbage" caused by async access to
>> the buffer text?
>
> It won't fail, will it? I thought this was the point of TS, that it
> would reuse the initial parse on the "good" parts in subsequent
> parses.
There are limits to the error recovery, and throwing garbage text at
it is likely to encounter those limits. wisi is even more robust, but I
still get "error recover fail" daily.
>> So I don't see how this could be done without some inter-locking.
>
> Yes, there probably needs to be some care around the gap area. But
> that's what I was referring to re. "optimistic concurrency".
>
>> And what do you want the code which requested parsing do while the
>> parse thread runs? The requesting code is in the main thread, so if
>> it just waits, you don't gain anything.
>
> You'd have the parser running continuously in the background, every
> time there is a change.
> When a piece of code requests a parse it blocks and waits, but
> presumably for not too long because a very recent previous parse means
> that the blocking parse is fast.
If the parser is truly fast enough to keep up with typing, this does
make sense. Good error correction is slower than not-so-good error
correction, so there might be a trade-off here.
On the other hand, in the typical case of the user typing characters,
font-lock is triggered on every character, so the parser is effectively
synchronous, and the inter-thread communication is wasted time.
We need some metrics on a real implementation to decide this part of the
design.
>>> In fact, depending on how robust tree-sitter is, you might even be
>>> able to do the concurrency-control optimistically (parse everything
>>> up to close to the gap, check that the gap hasn't moved into the
>>> region that you read, and then resume reading or rollback).
>>
>> I don't understand what you suggest here. For starters, the gap could
>> move (assuming you are still talking about a separate thread that does
>> the parsing), and what do we do then?
>
> Nothing, we start the next parse when this one completes.
By "nothing", I think you mean "abort the parse".
>> Bottom line, I think what you are suggesting is premature
>> optimization: we don't yet know that we will need this.
>
> I thought we knew that a full parse of some files could take over a
> second;
Yes, for both tree-sitter and wisi. wisi can take even longer if lots of
error correction is required (I have a time-out set at 5 seconds). But
that happens when the file is first opened; I doubt any user would start
typing that fast. I know I typically take a while to just look at the
text, and then navigate to the point of interest.
> but yes, it will be nice if we can find a synchronous way to avoid
> having to do a full parse.
Hmm. "looking at the text" is better done after it is fontified, so
doing a faster but possibly worse parse and fontification on just the
initial visible region might be a good idea.
While the partial parse is running, we could also spawn a parser thread
to run the full parse.
And if the user scrolls before the full parse is done, do a second
partial parse on the new visible region.
I'll put that on my list of things to try in wisi.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 16:29 ` Stephen Leake
@ 2021-07-21 16:54 ` Clément Pit-Claudel
2021-07-21 19:43 ` Eli Zaretskii
2021-07-21 21:54 ` Stephen Leake
0 siblings, 2 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-21 16:54 UTC (permalink / raw)
To: emacs-devel
On 7/21/21 12:29 PM, Stephen Leake wrote:
> Yes, for both tree-sitter and wisi. wisi can take even longer if lots of
> error correction is required (I have a time-out set at 5 seconds). But
> that happens when the file is first opened; I doubt any user would start
> typing that fast. I know I typically take a while to just look at the
> text, and then navigate to the point of interest.
I'm not sure. We've had significant complaints in Flycheck about freezing Emacs for <1s: we have a synchronous sanity check to determine whether a checker can execute in a buffer (it runs a single time, and it should be async but I haven't gotten around to rewriting it). The problem is that some programs, including eslint, can take as much as 1s, and in some bad cases 2-3 seconds, to parse their own config and decide if they can even run.
Users have complained about this delay. It might be better if they were able to scroll around, though — is that what happens with WISI? But if we have a fully synchronous TS, then that won't be possible either: it will be a complete Emacs freeze, no?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: [SPAM UNSURE] Re: How to add pseudo vector types
2021-07-20 18:04 ` Clément Pit-Claudel
2021-07-20 18:24 ` Eli Zaretskii
@ 2021-07-21 16:54 ` Stephen Leake
2021-07-21 17:12 ` Clément Pit-Claudel
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-21 16:54 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: Stefan Monnier, emacs-devel
Clément Pit-Claudel <cpitclaudel@gmail.com> writes:
> On 7/20/21 12:32 PM, Stephen Leake wrote:
>> Computing fontification and indentation must be synchronous.
>
> Must? What makes you say that?
Otherwise the results cannot be applied to the buffer, in general.
>> Except that the results will not be useful, since they won't apply to
>> the original buffer if it is changed.
>
> Then you will send the additional changes and wait.
It is that "wait" that makes it synchronous.
Note that synchronous is not the same as single-thread; multiple threads
can be used, as long as the main thread waits for the parse results.
But synchronous is also not the same as requiring the buffer text to be
read-only while the parser is running, which is an additional
requirement if the parser is reading the buffer text directly.
> TS is an incremental parser, so the work it will have done
> incorporating part of the changes will not be wasted.
Not guaranteed if the new changes are before some of the old ones, and
TS has no support for interrupting a parse to add more changes.
> Concrete example: if you have a bit of elisp that runs for .5s to make
> modifications to the buffer, then press "indent", and only then do you
> send changes to TS and wait for the response synchronously, then you
> will wait for .5s + time to incorporate all changes. If you start
> processing the changes in parallel as they are made by the Elisp code,
> then you will only wait for .5s + time to incorporate only the changes
> that had not been processed yet.
It might be possible to implement the incremental parse algorithm so it can
accept changes after the parse starts. One requirement would be that the
new changes must be after the current parse point, which is a race
condition.
In your example, "indent" will go back to the first edit point to compute
the indent there; that is pretty much guaranteed to be before the
current parse point, which will be on one of the later changes.
Neither TS nor wisi supports that; both have a separate Edit_Tree step
that applies all the changes to the parse tree before Parse is called. It
might be possible to integrate Edit_Tree into Parse, so that changes are
only applied when they are actually needed. But Edit_Tree and Parse are
already very complicated; keeping them separate is a good thing for
correctness and debugging.
Hmm. Perhaps you are not talking about interrupting the parse; you are
assuming that the parse for each change completes before the next change
arrives. Depending on the details of the changes, that might or might
not be wasted time; if we are on battery power (or worried about carbon
footprint), this might be a bad idea.
It still means fontification has to wait for the parse to complete
on all of the changes; it's synchronous in the sense that no user
actions on the buffer are allowed between the time fontification is
requested and the time text properties are applied.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: [SPAM UNSURE] Re: How to add pseudo vector types
2021-07-21 16:54 ` [SPAM UNSURE] " Stephen Leake
@ 2021-07-21 17:12 ` Clément Pit-Claudel
2021-07-21 19:49 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-21 17:12 UTC (permalink / raw)
To: emacs-devel
On 7/21/21 12:54 PM, Stephen Leake wrote:
> Hmm. Perhaps you are not talking about interrupting the parse; you are
> assuming that the parse for each change completes before the next change
> arrives.
Neither of these. I'm assuming that you open a file, launch a parse, batch up changes until that first parse completes, then launch a second parse, during which additional changes are batched up, then launch a third parse, etc.
Any time you actually need the info (for navigating, or for fontification, or…) then you either use the last parse if it was recent enough, or (more likely) you block until you can complete a synchronous parse.
This helps if you run a slow, blocking operation that edits the buffer. Not so much otherwise, indeed.
> It still means fontification has to wait for the parse to complete
> on all of the changes; it's synchronous in the sense that no user
> actions on the buffer are allowed between the time fontification is
> requested and the time text properties are applied.
Sure, sure; but hopefully that time is shorter than if the parser hadn't received a headstart.
Also, note that my original suggestion was mostly about the initial parse:
> I have no idea if it makes sense, but: does the initial parse need to be
> synchronous, or could you instead run the parsing in one thread, and the
> rest of Emacs in another?
If the initial parse takes a while, you would have no fontification at all for the first <n> seconds, if that's what it takes to parse your buffer (font-lock wouldn't block, it'd return immediately).
Then, after that initial parse, you would switch to a blocking mode every time you need info. That should be fast if the buffer hasn't changed too much.
If it has changed a lot, then you could revert to a non-blocking parse, while abandoning fontification for a little while.
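The policy in the last three paragraphs can be written down as a small decision function. This is only a sketch of the idea: the names `parse_mode`, `choose_parse_mode`, and the `block_limit` knob are made up for illustration and exist in neither Emacs nor tree-sitter, and where the cutoff should sit would have to be measured.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy names, for illustration only.  */
enum parse_mode
{
  PARSE_BACKGROUND,	/* Do not block; skip fontification for now.  */
  PARSE_BLOCKING	/* Block until the tree is up to date.  */
};

/* HAVE_TREE is false until the first full parse has completed.
   PENDING_BYTES is how much text changed since the last parse.
   BLOCK_LIMIT is an assumed tuning knob, not a measured value.  */
enum parse_mode
choose_parse_mode (bool have_tree, size_t pending_bytes, size_t block_limit)
{
  if (!have_tree)
    return PARSE_BACKGROUND;	/* Initial parse: font-lock returns at once.  */
  if (pending_bytes > block_limit)
    return PARSE_BACKGROUND;	/* Buffer changed a lot: give up blocking.  */
  return PARSE_BLOCKING;	/* Small delta: waiting should be short.  */
}
```

A caller would consult this each time it needs parse information, and only wait on the parser when it gets `PARSE_BLOCKING` back.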
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 16:02 ` Stephen Leake
@ 2021-07-21 17:16 ` Stefan Monnier
0 siblings, 0 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-21 17:16 UTC (permalink / raw)
To: Stephen Leake; +Cc: Clément Pit-Claudel, emacs-devel
Stephen Leake [2021-07-21 09:02:25] wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> Not true: we just have to keep track of the list of changes (as Yuan's
>> patch does), then pass it to tree-sitter to get a tree up-to-date w.r.t
>> the current content of the buffer.
> The point is that more changes can happen while the parser is running.
Not if the "refresh" is done synchronously.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 15:49 ` Stephen Leake
@ 2021-07-21 19:37 ` Eli Zaretskii
2021-07-24 2:00 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-21 19:37 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, monnier, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Wed, 21 Jul 2021 08:49:15 -0700
>
> > I fail to see the significance of the difference. Surely, you could
> > hand it a block of text with changes to mean that this block replaces
> > the previous version of that block. It might take the parser more
> > work to update the parse tree in this case, but if it's fast enough,
> > that won't be the problem. Right?
>
> tree-sitter doesn't store the previous text, so there's nothing to
> compare it to.
There was nothing about comparison in my text. You tell TS that
editing replaced a block of text between A and B with block between A
and C, without revealing the fine-grained changes inside that block.
This must work, because editing could indeed do just that.
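To make the "block between A and B replaced by block between A and C" idea concrete, here is a rough sketch of folding a sequence of edits into one bounding edit. The `byte_edit` struct only mirrors the three byte fields of tree-sitter's `TSInputEdit`; `merge_edits` is a hypothetical helper, not part of tree-sitter or of Yuan's patch.

```c
#include <stddef.h>

/* Stand-in for the byte fields of TSInputEdit (the TSPoint members
   are left out).  Illustration only.  */
struct byte_edit
{
  size_t start_byte;
  size_t old_end_byte;
  size_t new_end_byte;
};

static size_t max_size (size_t a, size_t b) { return a > b ? a : b; }
static size_t min_size (size_t a, size_t b) { return a < b ? a : b; }

/* ACC summarizes all edits so far, relative to the text the parser
   last saw.  NEXT is a later edit, in the coordinates of the text as
   it was just before NEXT.  Return one edit that covers both.  */
struct byte_edit
merge_edits (struct byte_edit acc, struct byte_edit next)
{
  struct byte_edit out;
  out.start_byte = min_size (acc.start_byte, next.start_byte);

  /* Whatever NEXT touched past ACC's new end was still original
     text, so the old range grows by that much.  */
  size_t overshoot = (next.old_end_byte > acc.new_end_byte
		      ? next.old_end_byte - acc.new_end_byte : 0);
  out.old_end_byte = acc.old_end_byte + overshoot;

  /* End of the combined region before NEXT, shifted by NEXT's
     change in size.  */
  out.new_end_byte = (max_size (acc.new_end_byte, next.old_end_byte)
		      - next.old_end_byte + next.new_end_byte);
  return out;
}
```

With this, each buffer needs to keep only one pending edit no matter how many changes pile up, at the price of making the parser re-examine a larger span.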
> Alternately, this would require the parser to store the
> previous text so it can compute the diff; that could be added in a
> wrapper around tree-sitter.
Presumably, TS has already solved this problem, because it needs that
for allowing the clients to communicate the changes to it.
> > That's why I'm trying to find simpler, less wasteful strategies.
> > Since TS is very fast, we can trade some of the speed for simpler,
> > more scalable design of tracking changes.
>
> I don't see how optimizing the change list makes it more "scalable";
Keeping too much information about each buffer is less scalable,
especially with many large buffers.
> In practice font-lock is triggered on every character typed by the user
> (Emacs is faster than people can type), so there will typically be only
> one change; nothing to optimize.
Editing doesn't include just typing one character at a time. There's
killing, yanking, C-x i, M-/, M-\, C-M-\, smart completion, etc.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 16:54 ` Clément Pit-Claudel
@ 2021-07-21 19:43 ` Eli Zaretskii
2021-07-24 2:57 ` Stephen Leake
2021-07-24 3:55 ` Clément Pit-Claudel
2021-07-21 21:54 ` Stephen Leake
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-21 19:43 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Wed, 21 Jul 2021 12:54:16 -0400
>
> On 7/21/21 12:29 PM, Stephen Leake wrote:
> > Yes, for both tree-sitter and wisi. wisi can take even longer if lots of
> > error correction is required (I have a time-out set at 5 seconds). But
> > that happens when the file is first opened; I doubt any user would start
> > typing that fast. I know I typically take a while to just look at the
> > text, and then navigate to the point of interest.
>
> I'm not sure. We've had significant complaints in Flycheck about freezing Emacs for <1s
How much "less"? Close to 1 sec is indeed annoying, but 20 msec or so
should be bearable.
You seem to assume up front that TS (re)-parsing will take 1 sec, but
AFAIK there's no reason to assume such bad performance.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: [SPAM UNSURE] Re: How to add pseudo vector types
2021-07-21 17:12 ` Clément Pit-Claudel
@ 2021-07-21 19:49 ` Eli Zaretskii
2021-07-22 5:09 ` Clément Pit-Claudel
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-21 19:49 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Wed, 21 Jul 2021 13:12:16 -0400
>
> On 7/21/21 12:54 PM, Stephen Leake wrote:
> > Hmm. Perhaps you are not talking about interrupting the parse; you are
> > assuming that the parse for each change completes before the next change
> > arrives.
>
> Neither of these. I'm assuming that you open a file, launch a parse, batch up changes until that first parse completes, then launch a second parse, during which additional changes are batched up, then launch a third parse, etc.
But how would the "launched parse" access the buffer text if it runs
in parallel to normal editing? We've discussed the difficulties with
that, and you seem to ignore them here?
> Any time you actually need the info (for navigating, or for fontification, or…) then you either use the last parse if it was recent enough, or (more likely) you block until you can complete a synchronous parse.
Which means the results will many times be slightly wrong, because the
parse info you use is outdated?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 16:54 ` Clément Pit-Claudel
2021-07-21 19:43 ` Eli Zaretskii
@ 2021-07-21 21:54 ` Stephen Leake
2021-07-22 4:40 ` Clément Pit-Claudel
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-21 21:54 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
Clément Pit-Claudel <cpitclaudel@gmail.com> writes:
> On 7/21/21 12:29 PM, Stephen Leake wrote:
>> Yes, for both tree-sitter and wisi. wisi can take even longer if lots of
>> error correction is required (I have a time-out set at 5 seconds). But
>> that happens when the file is first opened; I doubt any user would start
>> typing that fast. I know I typically take a while to just look at the
>> text, and then navigate to the point of interest.
>
> I'm not sure. We've had significant complaints in Flycheck about freezing
> Emacs for <1s: we have a synchronous sanity check to determine whether
> a checker can execute in a buffer (it runs a single time, and it
> should be async but I haven't gotten around to rewriting it). The
> problem is that some programs, including eslint, can take as much as 1s,
> and in some bad cases 2-3 seconds, to parse their own config and
> decide if they can even run.
Ok.
> Users have complained about this delay. It might be better if they
> were able to scroll around, though — is that what happens with WISI?
wisi supports partial parse; if a buffer is larger than a user-settable
threshold, for font-lock it parses only the requested region of the file,
expanded to reasonable start/end points.
So in that mode, the initial parse of even a very large buffer is fast.
However, using that for indentation is problematic, which is why I'm
implementing incremental parse.
I think continuing to support both will be useful.
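As a very rough illustration of the region selection described above (not wisi's actual algorithm, which picks start/end points from the grammar), the following sketch parses everything below a size threshold and otherwise widens the requested region to whole lines. Both function names and the use of line boundaries as "reasonable" points are assumptions made for the example.

```c
#include <stddef.h>

/* Widen [*BEG, *END) to whole lines within TEXT[0..LEN).  */
void
expand_to_lines (const char *text, size_t len, size_t *beg, size_t *end)
{
  if (*end > len)
    *end = len;
  if (*beg > *end)
    *beg = *end;
  /* Move BEG back to just after the previous newline.  */
  while (*beg > 0 && text[*beg - 1] != '\n')
    (*beg)--;
  /* Move END forward until it sits just after a newline.  */
  while (*end < len && (*end == *beg || text[*end - 1] != '\n'))
    (*end)++;
}

/* Small buffers are parsed whole; large ones only around the
   requested region.  THRESHOLD plays the role of the user-settable
   size limit.  */
void
choose_parse_region (const char *text, size_t len, size_t threshold,
		     size_t *beg, size_t *end)
{
  if (len <= threshold)
    {
      *beg = 0;
      *end = len;
    }
  else
    expand_to_lines (text, len, beg, end);
}
```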
> But if we have a fully synchronous TS, then that won't be possible
> either: it will be a complete Emacs freeze, no?
It should only freeze write operations on that buffer, so marking it
read-only while waiting for the parse results might be best.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-19 15:16 ` Yuan Fu
@ 2021-07-22 3:10 ` Yuan Fu
2021-07-22 8:23 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-22 3:10 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Clément Pit-Claudel, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 579 bytes --]
Here is another patch. No big progress since I’m busy moving this week. In this patch I changed from using change hooks to directly updating the trees in edit functions. I also added some node API functions and tests. Should I keep posting patches, or should I create a branch in /scratch? If the latter, how do I do it?
I’m aware of the ongoing enlightening discussion on potential optimizations for tree-sitter. My plan is to first complete the API and implement some minimal structural editing/font-lock features, then we can concretely measure what needs to improve.
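For readers who don't want to open the attachment, the control flow around need_reparse boils down to a dirty flag: edit functions only mark the parser stale, and the actual parse is deferred until someone asks for the tree. The sketch below is a self-contained mock of that flow; `mock_parser` and both functions are stand-ins with made-up names, and the counters replace the real calls to `ts_tree_edit` and `ts_parser_parse`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for Lisp_TS_Parser, reduced to what the flow needs.  */
struct mock_parser
{
  bool need_reparse;
  size_t edits_recorded;
  size_t parses_run;
};

/* Cheap: called from every edit primitive.  The real code would
   also call ts_tree_edit on the existing tree here.  */
void
mock_record_change (struct mock_parser *p)
{
  p->edits_recorded++;
  p->need_reparse = true;
}

/* Expensive part runs only when a tree is requested and the text
   changed since the last parse.  */
void
mock_ensure_parsed (struct mock_parser *p)
{
  if (!p->need_reparse)
    return;
  p->parses_run++;	/* The real code would call ts_parser_parse.  */
  p->need_reparse = false;
}
```

Three edits followed by one request for the root node thus cost three cheap tree edits and a single reparse.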
Yuan
[-- Attachment #2: ts.3.patch --]
[-- Type: application/octet-stream, Size: 24276 bytes --]
From fd8ad36fe5ea3b9b12e80879b7434b8bc67b53db Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Wed, 21 Jul 2021 22:43:07 -0400
Subject: [PATCH] checkpoint 3
- change_hook -> directly in edit functions
- add a need_reparse field in Lisp_TS_Parser
- more node api
- tests
---
src/insdel.c | 43 ++++--
src/tree_sitter.c | 274 ++++++++++++++++++++++++++++------
src/tree_sitter.h | 18 ++-
test/src/tree-sitter-tests.el | 106 +++++++++++++
4 files changed, 377 insertions(+), 64 deletions(-)
create mode 100644 test/src/tree-sitter-tests.el
diff --git a/src/insdel.c b/src/insdel.c
index 3c1e13d38b..b313c50cda 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -947,6 +947,10 @@ insert_1_both (const char *string,
adjust_point (nchars, nbytes);
check_markers ();
+
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (PT_BYTE - nbytes, PT_BYTE - nbytes, PT_BYTE);
+#endif
}
\f
/* Insert the part of the text of STRING, a Lisp object assumed to be
@@ -1078,6 +1082,10 @@ insert_from_string_1 (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
adjust_point (nchars, outgoing_nbytes);
check_markers ();
+
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (PT_BYTE - outgoing_nbytes, PT_BYTE - outgoing_nbytes, PT_BYTE);
+#endif
}
\f
/* Insert a sequence of NCHARS chars which occupy NBYTES bytes
@@ -1145,6 +1153,10 @@ insert_from_gap (ptrdiff_t nchars, ptrdiff_t nbytes, bool text_at_gap_tail)
adjust_point (nchars, nbytes);
check_markers ();
+
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (PT_BYTE - nbytes, PT_BYTE - nbytes, PT_BYTE);
+#endif
}
\f
/* Insert text from BUF, NCHARS characters starting at CHARPOS, into the
@@ -1292,6 +1304,11 @@ insert_from_buffer_1 (struct buffer *buf,
graft_intervals_into_buffer (intervals, PT, nchars, current_buffer, inherit);
adjust_point (nchars, outgoing_nbytes);
+
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (PT_BYTE - outgoing_nbytes,
+ PT_BYTE - outgoing_nbytes, PT_BYTE);
+#endif
}
\f
/* Record undo information and adjust markers and position keepers for
@@ -1556,6 +1573,11 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
if (adjust_match_data)
update_search_regs (from, to, from + SCHARS (new));
+
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (from_byte, to_byte, GPT_BYTE);
+#endif
+
signal_after_change (from, nchars_del, GPT - from);
update_compositions (from, GPT, CHECK_BORDER);
}
@@ -1683,6 +1705,11 @@ replace_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
modiff_incr (&MODIFF);
CHARS_MODIFF = MODIFF;
+
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (from_byte, to_byte, from_byte + insbytes);
+#endif
+
}
\f
/* Delete characters in current buffer
@@ -1893,6 +1920,10 @@ del_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
evaporate_overlays (from);
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (from_byte, to_byte, from_byte);
+#endif
+
return deletion;
}
@@ -2156,11 +2187,6 @@ signal_before_change (ptrdiff_t start_int, ptrdiff_t end_int,
run_hook (Qfirst_change_hook);
}
-#ifdef HAVE_TREE_SITTER
- /* FIXME: Is this the best place? */
- ts_before_change (start_int, end_int);
-#endif
-
/* Now run the before-change-functions if any. */
if (!NILP (Vbefore_change_functions))
{
@@ -2214,13 +2240,6 @@ signal_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
if (inhibit_modification_hooks)
return;
-#ifdef HAVE_TREE_SITTER
- /* We disrespect combine-after-change, because if we don't record
- this change, the information that we need (the end byte position
- of the change) will be lost. */
- ts_after_change (charpos, lendel, lenins);
-#endif
-
/* If we are deferring calls to the after-change functions
and there are no before-change functions,
just record the args that we were going to use. */
diff --git a/src/tree_sitter.c b/src/tree_sitter.c
index 7d1225161c..a6a8912c84 100644
--- a/src/tree_sitter.c
+++ b/src/tree_sitter.c
@@ -32,49 +32,52 @@ Copyright (C) 2021 Free Software Foundation, Inc.
#include "coding.h"
#include "tree_sitter.h"
-/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
+/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
#include <tree_sitter/parser.h>
-/* Record the byte position of the end of the (to-be) changed text.
-We have to record it now, because by the time we get to after-change
-hook, the _byte_ position of the end is lost. */
-void
-ts_before_change (ptrdiff_t start_int, ptrdiff_t end_int)
+DEFUN ("tree-sitter-parser-p",
+ Ftree_sitter_parser_p, Stree_sitter_parser_p, 1, 1, 0,
+ doc: /* Return t if OBJECT is a tree-sitter parser. */)
+ (Lisp_Object object)
{
- /* Iterate through each parser in 'tree-sitter-parser-list' and
- record the byte position. There could be better ways to record
- it than storing the same position in every parser, but this is
- the most fool-proof way, and I expect a buffer to have only one
- parser most of the time anyway. */
- ptrdiff_t beg_byte = CHAR_TO_BYTE (start_int);
- ptrdiff_t old_end_byte = CHAR_TO_BYTE (end_int);
- Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
- while (!NILP (parser_list))
- {
- Lisp_Object lisp_parser = Fcar (parser_list);
- XTS_PARSER (lisp_parser)->edit.start_byte = beg_byte;
- XTS_PARSER (lisp_parser)->edit.old_end_byte = old_end_byte;
- parser_list = Fcdr (parser_list);
- }
+ if (TS_PARSERP (object))
+ return Qt;
+ else
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-node-p",
+ Ftree_sitter_node_p, Stree_sitter_node_p, 1, 1, 0,
+ doc: /* Return t if OBJECT is a tree-sitter node. */)
+ (Lisp_Object object)
+{
+ if (TS_NODEP (object))
+ return Qt;
+ else
+ return Qnil;
}
/* Update each parser's tree after the user made an edit. This
function does not parse the buffer and only updates the tree. (So it
should be very fast.) */
void
-ts_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
+ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte)
{
- ptrdiff_t new_end_byte = CHAR_TO_BYTE (charpos + lenins);
Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
+ TSPoint dummy_point = {0, 0};
+ TSInputEdit edit = {start_byte, old_end_byte, new_end_byte,
+ dummy_point, dummy_point, dummy_point};
while (!NILP (parser_list))
{
Lisp_Object lisp_parser = Fcar (parser_list);
TSTree *tree = XTS_PARSER (lisp_parser)->tree;
- XTS_PARSER (lisp_parser)->edit.new_end_byte = new_end_byte;
if (tree != NULL)
- ts_tree_edit (tree, &XTS_PARSER (lisp_parser)->edit);
+ ts_tree_edit (tree, &edit);
+ XTS_PARSER (lisp_parser)->need_reparse = true;
parser_list = Fcdr (parser_list);
}
+
}
/* Parse the buffer. We don't parse until we have to. When we have
@@ -82,11 +85,15 @@ ts_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
void
ts_ensure_parsed (Lisp_Object parser)
{
+ if (!XTS_PARSER (parser)->need_reparse)
+ return;
TSParser *ts_parser = XTS_PARSER (parser)->parser;
TSTree *tree = XTS_PARSER(parser)->tree;
TSInput input = XTS_PARSER (parser)->input;
TSTree *new_tree = ts_parser_parse(ts_parser, tree, input);
+ ts_tree_delete (tree);
XTS_PARSER (parser)->tree = new_tree;
+ XTS_PARSER (parser)->need_reparse = false;
}
/* This is the read function provided to tree-sitter to read from a
@@ -96,33 +103,30 @@ ts_ensure_parsed (Lisp_Object parser)
ts_read_buffer (void *buffer, uint32_t byte_index,
TSPoint position, uint32_t *bytes_read)
{
- if (! BUFFER_LIVE_P ((struct buffer *) buffer))
+ if (!BUFFER_LIVE_P ((struct buffer *) buffer))
error ("BUFFER is not live");
ptrdiff_t byte_pos = byte_index + 1;
- // FIXME: Add some boundary checks?
- /* I believe we can get away with only setting current-buffer
- and not actually switching to it, like what we did in
- 'make_gap_1'. */
- struct buffer *old_buffer = current_buffer;
- current_buffer = (struct buffer *) buffer;
-
- /* Read one character. */
+ /* Read one character. Tree-sitter wants us to set bytes_read to 0
+ if it reads to the end of buffer. It doesn't say what it wants
+ for the return value in that case, so we just give it an empty
+ string. */
char *beg;
int len;
- if (byte_pos >= Z_BYTE)
+ // TODO BUF_ZV_BYTE?
+ if (byte_pos >= BUF_Z_BYTE ((struct buffer *) buffer))
{
beg = "";
len = 0;
}
else
{
- beg = (char *) BYTE_POS_ADDR (byte_pos);
+ beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
len = next_char_len(byte_pos);
}
*bytes_read = (uint32_t) len;
- current_buffer = old_buffer;
+
return beg;
}
@@ -137,9 +141,7 @@ make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree)
lisp_parser->tree = tree;
TSInput input = {buffer, ts_read_buffer, TSInputEncodingUTF8};
lisp_parser->input = input;
- TSPoint dummy_point = {0, 0};
- TSInputEdit edit = {0, 0, 0, dummy_point, dummy_point, dummy_point};
- lisp_parser->edit = edit;
+ lisp_parser->need_reparse = true;
return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
}
@@ -192,6 +194,7 @@ DEFUN ("tree-sitter-parser-root-node",
doc: /* Return the root node of PARSER. */)
(Lisp_Object parser)
{
+ CHECK_TS_PARSER (parser);
ts_ensure_parsed(parser);
TSNode root_node = ts_tree_root_node (XTS_PARSER (parser)->tree);
return make_ts_node (parser, root_node);
@@ -229,11 +232,29 @@ DEFUN ("tree-sitter-parse", Ftree_sitter_parse, Stree_sitter_parse,
return lisp_node;
}
+/* Below this point are uninteresting mechanical translations of
+ tree-sitter API. */
+
+/* Node functions. */
+
+DEFUN ("tree-sitter-node-type",
+ Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
+ doc: /* Return the NODE's type as a symbol. */)
+ (Lisp_Object node)
+{
+ CHECK_TS_NODE (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ const char *type = ts_node_type(ts_node);
+ return intern_c_string (type);
+}
+
+
DEFUN ("tree-sitter-node-string",
Ftree_sitter_node_string, Stree_sitter_node_string, 1, 1, 0,
doc: /* Return the string representation of NODE. */)
(Lisp_Object node)
{
+ CHECK_TS_NODE (node);
TSNode ts_node = XTS_NODE (node)->node;
char *string = ts_node_string(ts_node);
return make_string(string, strlen (string));
@@ -242,29 +263,125 @@ DEFUN ("tree-sitter-node-string",
DEFUN ("tree-sitter-node-parent",
Ftree_sitter_node_parent, Stree_sitter_node_parent, 1, 1, 0,
doc: /* Return the immediate parent of NODE.
-Return nil if we couldn't find any. */)
+Return nil if there isn't any. */)
(Lisp_Object node)
{
+ CHECK_TS_NODE (node);
TSNode ts_node = XTS_NODE (node)->node;
- TSNode parent = ts_node_parent(ts_node);
+ TSNode parent = ts_node_parent (ts_node);
if (ts_node_is_null(parent))
return Qnil;
- return make_ts_node(XTS_NODE (node)->parser, parent);
+ return make_ts_node (XTS_NODE (node)->parser, parent);
}
DEFUN ("tree-sitter-node-child",
- Ftree_sitter_node_child, Stree_sitter_node_child, 2, 2, 0,
+ Ftree_sitter_node_child, Stree_sitter_node_child, 2, 3, 0,
doc: /* Return the Nth child of NODE.
-Return nil if we couldn't find any. */)
+Return nil if there isn't any. If NAMED is non-nil, look for named
+child only. NAMED defaults to nil. */)
+ (Lisp_Object node, Lisp_Object n, Lisp_Object named)
+{
+ CHECK_TS_NODE (node);
+ CHECK_FIXNUM (n);
+ EMACS_INT idx = XFIXNUM (n);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_child (ts_node, (uint32_t) idx);
+ else
+ child = ts_node_named_child (ts_node, (uint32_t) idx);
+
+ if (ts_node_is_null(child))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-check",
+ Ftree_sitter_node_check, Stree_sitter_node_check, 2, 2, 0,
+ doc: /* Return non-nil if NODE is in condition COND, nil otherwise.
+
+COND could be 'named, 'missing, 'extra, 'has-error. Named nodes
+correspond to named rules in the grammar, whereas "anonymous" nodes
+correspond to string literals in the grammar.
+
+Missing nodes are inserted by the parser in order to recover from
+certain kinds of syntax errors, i.e., should be there but not there.
+
+Extra nodes represent things like comments, which are not required by
+the grammar, but can appear anywhere.
+
+A node "has error" if it is itself a syntax error or contains any syntax
+errors. */)
+ (Lisp_Object node, Lisp_Object cond)
+{
+ CHECK_TS_NODE (node);
+ CHECK_SYMBOL (cond);
+ TSNode ts_node = XTS_NODE (node)->node;
+ bool result;
+ if (EQ (cond, Qnamed))
+ result = ts_node_is_named (ts_node);
+ else if (EQ (cond, Qmissing))
+ result = ts_node_is_missing (ts_node);
+ else if (EQ (cond, Qextra))
+ result = ts_node_is_extra (ts_node);
+ else if (EQ (cond, Qhas_error))
+ result = ts_node_has_error (ts_node);
+ else
+ signal_error ("Expecting one of four symbols, see docstring", cond);
+ return result ? Qt : Qnil;
+}
+
+DEFUN ("tree-sitter-node-field-name-for-child",
+ Ftree_sitter_node_field_name_for_child,
+ Stree_sitter_node_field_name_for_child, 2, 2, 0,
+ doc: /* Return the field name of the Nth child of NODE.
+Return nil if there isn't any child or no field is found. */)
(Lisp_Object node, Lisp_Object n)
{
CHECK_INTEGER (n);
EMACS_INT idx = XFIXNUM (n);
TSNode ts_node = XTS_NODE (node)->node;
- // FIXME: Is this cast ok?
- TSNode child = ts_node_child(ts_node, (uint32_t) idx);
+ const char *name
+ = ts_node_field_name_for_child (ts_node, (uint32_t) idx);
+
+ if (name == NULL)
+ return Qnil;
+
+ return make_string (name, strlen (name));
+}
+
+DEFUN ("tree-sitter-node-child-count",
+ Ftree_sitter_node_child_count,
+ Stree_sitter_node_child_count, 1, 2, 0,
+ doc: /* Return the number of children of NODE.
+If NAMED is non-nil, count named children only. NAMED defaults to
+nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ TSNode ts_node = XTS_NODE (node)->node;
+ uint32_t count;
+ if (NILP (named))
+ count = ts_node_child_count (ts_node);
+ else
+ count = ts_node_named_child_count (ts_node);
+ return make_fixnum (count);
+}
+
+DEFUN ("tree-sitter-node-child-by-field-name",
+ Ftree_sitter_node_child_by_field_name,
+ Stree_sitter_node_child_by_field_name, 2, 2, 0,
+ doc: /* Return the child of NODE with field name NAME.
+Return nil if there isn't any. */)
+ (Lisp_Object node, Lisp_Object name)
+{
+ CHECK_STRING (name);
+ char *name_str = SSDATA (name);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child
+ = ts_node_child_by_field_name (ts_node, name_str, strlen (name_str));
if (ts_node_is_null(child))
return Qnil;
@@ -272,10 +389,62 @@ DEFUN ("tree-sitter-node-child",
return make_ts_node(XTS_NODE (node)->parser, child);
}
+DEFUN ("tree-sitter-node-next-sibling",
+ Ftree_sitter_node_next_sibling,
+ Stree_sitter_node_next_sibling, 1, 2, 0,
+ doc: /* Return the next sibling of NODE.
+Return nil if there isn't any. If NAMED is non-nil, look for named
+sibling only. NAMED defaults to nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode sibling;
+ if (NILP (named))
+ sibling = ts_node_next_sibling (ts_node);
+ else
+ sibling = ts_node_next_named_sibling (ts_node);
+
+ if (ts_node_is_null(sibling))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, sibling);
+}
+
+DEFUN ("tree-sitter-node-prev-sibling",
+ Ftree_sitter_node_prev_sibling,
+ Stree_sitter_node_prev_sibling, 1, 2, 0,
+ doc: /* Return the previous sibling of NODE.
+Return nil if there isn't any. If NAMED is non-nil, look for named
+sibling only. NAMED defaults to nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode sibling;
+
+ if (NILP (named))
+ sibling = ts_node_prev_sibling (ts_node);
+ else
+ sibling = ts_node_prev_named_sibling (ts_node);
+
+ if (ts_node_is_null(sibling))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, sibling);
+}
+
+/* Query functions */
+
/* Initialize the tree-sitter routines. */
void
syms_of_tree_sitter (void)
{
+ DEFSYM (Qtree_sitter_parser_p, "tree-sitter-parser-p");
+ DEFSYM (Qtree_sitter_node_p, "tree-sitter-node-p");
+ DEFSYM (Qnamed, "named");
+ DEFSYM (Qmissing, "missing");
+ DEFSYM (Qextra, "extra");
+ DEFSYM (Qhas_error, "has-error");
+
DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
DEFVAR_LISP ("ts-parser-list", Vtree_sitter_parser_list,
doc: /* A list of tree-sitter parsers.
@@ -284,11 +453,20 @@ syms_of_tree_sitter (void)
Vtree_sitter_parser_list = Qnil;
Fmake_variable_buffer_local (Qtree_sitter_parser_list);
-
+ defsubr (&Stree_sitter_parser_p);
+ defsubr (&Stree_sitter_node_p);
defsubr (&Stree_sitter_create_parser);
defsubr (&Stree_sitter_parser_root_node);
defsubr (&Stree_sitter_parse);
+
+ defsubr (&Stree_sitter_node_type);
defsubr (&Stree_sitter_node_string);
defsubr (&Stree_sitter_node_parent);
defsubr (&Stree_sitter_node_child);
+ defsubr (&Stree_sitter_node_check);
+ defsubr (&Stree_sitter_node_field_name_for_child);
+ defsubr (&Stree_sitter_node_child_count);
+ defsubr (&Stree_sitter_node_child_by_field_name);
+ defsubr (&Stree_sitter_node_next_sibling);
+ defsubr (&Stree_sitter_node_prev_sibling);
}
diff --git a/src/tree_sitter.h b/src/tree_sitter.h
index 0606f336cc..a7e2a2d670 100644
--- a/src/tree_sitter.h
+++ b/src/tree_sitter.h
@@ -37,7 +37,7 @@ #define EMACS_TREE_SITTER_H
TSParser *parser;
TSTree *tree;
TSInput input;
- TSInputEdit edit;
+ bool need_reparse;
};
/* A wrapper around a tree-sitter node. */
@@ -78,11 +78,21 @@ XTS_NODE (Lisp_Object a)
return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Node);
}
-void
-ts_before_change (ptrdiff_t charpos, ptrdiff_t lendel);
+INLINE void
+CHECK_TS_PARSER (Lisp_Object parser)
+{
+ CHECK_TYPE (TS_PARSERP (parser), Qtree_sitter_parser_p, parser);
+}
+
+INLINE void
+CHECK_TS_NODE (Lisp_Object node)
+{
+ CHECK_TYPE (TS_NODEP (node), Qtree_sitter_node_p, node);
+}
void
-ts_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins);
+ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte);
Lisp_Object
make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree);
diff --git a/test/src/tree-sitter-tests.el b/test/src/tree-sitter-tests.el
new file mode 100644
index 0000000000..cb1c464d3a
--- /dev/null
+++ b/test/src/tree-sitter-tests.el
@@ -0,0 +1,106 @@
+;;; tree-sitter-tests.el --- tests for src/tree-sitter.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Code:
+
+(require 'ert)
+(require 'tree-sitter-json)
+
+(ert-deftest tree-sitter-basic-parsing ()
+ "Test basic parsing routines."
+ (with-temp-buffer
+ (let ((parser (tree-sitter-create-parser
+ (current-buffer) (tree-sitter-json))))
+ (should
+ (eq parser (car tree-sitter-parser-list)))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(ERROR)"))
+
+ (insert "[1,2,3]")
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+
+ (goto-char (point-min))
+ (forward-char 3)
+ (insert "{\"name\": \"Bob\"},")
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))")))))
+
+(ert-deftest tree-sitter-node-api ()
+ "Tests for node API."
+ (with-temp-buffer
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (let (parser root-node doc-node object-node pair-node)
+ (setq parser (tree-sitter-create-parser
+ (current-buffer) (tree-sitter-json)))
+ (setq root-node (tree-sitter-parser-root-node
+ parser))
+ ;; `tree-sitter-node-type'.
+ (should (eq 'document (tree-sitter-node-type root-node)))
+ ;; `tree-sitter-node-check'.
+ (should (eq t (tree-sitter-node-check root-node 'named)))
+ (should (eq nil (tree-sitter-node-check root-node 'missing)))
+ (should (eq nil (tree-sitter-node-check root-node 'extra)))
+ (should (eq nil (tree-sitter-node-check root-node 'has-error)))
+ ;; `tree-sitter-node-child'.
+ (setq doc-node (tree-sitter-node-child root-node 0))
+ (should (eq 'array (tree-sitter-node-type doc-node)))
+ (should (equal (tree-sitter-node-string doc-node)
+ "(array (number) (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number))"))
+ ;; `tree-sitter-node-child-count'.
+ (should (eql 9 (tree-sitter-node-child-count doc-node)))
+ (should (eql 4 (tree-sitter-node-child-count doc-node t)))
+ ;; `tree-sitter-node-field-name-for-child'.
+ (setq object-node (tree-sitter-node-child doc-node 2 t))
+ (setq pair-node (tree-sitter-node-child object-node 0 t))
+ (should (eq 'object (tree-sitter-node-type object-node)))
+ (should (eq 'pair (tree-sitter-node-type pair-node)))
+ (should (equal "key"
+ (tree-sitter-node-field-name-for-child
+ pair-node 0)))
+ ;; `tree-sitter-node-child-by-field-name'.
+ (should (equal "(string (string_content))"
+ (tree-sitter-node-string
+ (tree-sitter-node-child-by-field-name
+ pair-node "key"))))
+ ;; `tree-sitter-node-next-sibling'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-next-sibling object-node t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-next-sibling object-node))))
+ ;; `tree-sitter-node-prev-sibling'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-prev-sibling object-node t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-prev-sibling object-node))))
+ )))
+
+(provide 'tree-sitter-tests)
+;;; tree-sitter-tests.el ends here
--
2.24.3 (Apple Git-128)
* Re: How to add pseudo vector types
2021-07-21 21:54 ` Stephen Leake
@ 2021-07-22 4:40 ` Clément Pit-Claudel
0 siblings, 0 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-22 4:40 UTC (permalink / raw)
To: emacs-devel
On 7/21/21 5:54 PM, Stephen Leake wrote:
> It should only freeze write operations on that buffer, so marking it
> read-only while waiting for the parse results might be best.
Yes, I expect that would be much better than what we have.
Thanks for your work on wisi, by the way!
* Re: How to add pseudo vector types
2021-07-21 13:51 ` Eli Zaretskii
@ 2021-07-22 4:59 ` Clément Pit-Claudel
2021-07-22 6:38 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-22 4:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 7/21/21 9:51 AM, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Wed, 21 Jul 2021 09:38:31 -0400
>> Cc: emacs-devel@gnu.org
>>
>> <valuable text><small bit of the gap because the gap moved while TS was scanning><more valuable data>
>>
>> and then, knowing that the gap had moved, re-feeding it just the area that corresponds to the places around the boundaries of the gap might yield a speedup.
>
> You are assuming that TS will be able to process both <valuable text>
> and <more valuable data>, even though it eats the garbage in the gap?
> That isn't guaranteed, due to possibly invalid byte sequences in the
> gap.
Yes, that's fair.
>> Alternatively, keeping the list of changes allows us to maintain a copy of the buffer that TS uses for scanning, with updates delayed until TS is done scanning.
>
> Having a copy for each buffer that needs parsing doesn't scale.
Because of time, or because of memory?
I thought we assumed memory was a non-issue, because tree-sitter's data structures seem to require *a lot* more space than the text of the underlying buffer (in 2018 the main dev said "syntax trees still use over 10x as much memory as the size of the source file.").
Copying time can be an issue, for sure, but memcpy() is fast these days ^^
* Re: [SPAM UNSURE] Re: How to add pseudo vector types
2021-07-21 19:49 ` Eli Zaretskii
@ 2021-07-22 5:09 ` Clément Pit-Claudel
2021-07-22 6:44 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-22 5:09 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 7/21/21 3:49 PM, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Wed, 21 Jul 2021 13:12:16 -0400
>>
>> On 7/21/21 12:54 PM, Stephen Leake wrote:
>>> Hmm. Perhaps you are not talking about interrupting the parse; you are
>>> assuming that the parse for each change completes before the next change
>>> arrives.
>>
>> Neither of these. I'm assuming that you open a file, launch a parse, batch up changes until that first parse completes, then launch a second parse, during which additional changes are batched up, then launch a third parse, etc.
>
> But how would the "launched parse" access the buffer text if it runs
> in parallel to normal editing? We've discussed the difficulties with
> that, and you seem to ignore them here?
Lots of magic handwaving: IOW, I don't have a solution, just a general hope that minimal synchronization and decent error recovery would help (for example, maybe it's enough to synchronize only when TS requires a chunk of memory). But for the discussion above, Stefan's copying solution works fine.
>> Any time you actually need the info (for navigating, or for fontification, or…) then you either use the last parse if it was recent enough, or (more likely) you block until you can complete a synchronous parse.
>
> Which means the results will many times be slightly wrong, because the
> parse info you use is outdated?
Maybe. In practice if the delay between requesting the info and getting it is perceptible, then displaying outdated info is better than freezing until you get up-to-date info, no? Either you're getting info so fast that the user doesn't realize that you're outdated by 1ms; or you're getting info so slowly that the user realizes that you're running one second behind — but it's much better than freezing for one second.
Less relevant details below:
This is a problem we have all the time with Flycheck btw: you send the text of the buffer to a compiler, it returns 3 seconds later, and you want to display errors as reported by the compiler. By the time we get errors to display, the locations they come with are outdated. We don't have a good solution.
Visual Studio in the old days had a really beautiful solution for this. There was a (basically) free API you could call to snapshot a buffer at a point in time; then there was a function that translated positions in that snapshot to positions in the current buffer (think of it as magically putting a marker into the past buffer when the snapshot was taken, and then querying its current position). So positions returned by the compiler or any other tool were still relevant even if they referred to a three-second-old buffer, since you could translate them to the current buffer.
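To make the snapshot idea concrete, here is a minimal sketch in plain C, assuming we keep a log of the changes made since the snapshot. The struct and function names are made up for illustration (this is not the Visual Studio API, nor anything that exists in Emacs), and a position that falls inside a deleted region is simply collapsed to the start of the deletion, roughly the way a marker would behave:

```c
#include <assert.h>
#include <stddef.h>

/* One recorded buffer change: OLD_LEN bytes starting at POS were
   replaced by NEW_LEN bytes.  POS is expressed in the buffer as it
   was just before this change.  Hypothetical type, for illustration
   only.  */
struct change
{
  ptrdiff_t pos;
  ptrdiff_t old_len;
  ptrdiff_t new_len;
};

/* Translate POS, a position in the snapshot, into a position in the
   current buffer by replaying the N changes in LOG in order.  */
ptrdiff_t
translate_position (ptrdiff_t pos, const struct change *log, size_t n)
{
  assert (n == 0 || log != NULL);
  for (size_t i = 0; i < n; i++)
    {
      const struct change *c = &log[i];
      if (pos <= c->pos)
        /* Change is at or after POS: POS stays where it is.  */
        continue;
      else if (pos >= c->pos + c->old_len)
        /* Change is entirely before POS: shift by the size delta.  */
        pos += c->new_len - c->old_len;
      else
        /* POS was inside the replaced text: collapse to its start.  */
        pos = c->pos;
    }
  return pos;
}
```

With a log that says "insert 3 bytes at 2, then delete 2 bytes at 7", snapshot position 9 ends up at 10 in the current buffer, and snapshot position 5 (whose text was deleted) collapses to 7.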
Clément.
* Re: How to add pseudo vector types
2021-07-22 4:59 ` Clément Pit-Claudel
@ 2021-07-22 6:38 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-22 6:38 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Thu, 22 Jul 2021 00:59:31 -0400
>
> >> Alternatively, keeping the list of changes allows us to maintain a copy of the buffer that TS uses for scanning, with updates delayed until TS is done scanning.
> >
> > Having a copy for each buffer that needs parsing doesn't scale.
>
> Because of time, or because of memory?
Memory, mostly.
> I thought we assumed memory was a non-issue, because tree-sitter's data structures seem to require *a lot* more space than the text of the underlying buffer (in 2018 the main dev said "syntax trees still use over 10x as much memory as the size of the source file.").
You are talking about _adding_ to that another copy of the buffer's
text, which could be many megabytes. And your proposal means we will
have such copies for many buffers.
As for the TS memory requirements, if they really need 1GB for a 100MB
file (I doubt that), then TS is probably not a good candidate for
Emacs.
> Copying time can be an issue, for sure, but memcpy() is fast these days ^^
You forget the time needed to allocate the memory for the copy, that
could be orders of magnitude slower for large buffers, especially if
there's a lot of memory pressure on the OS.
* Re: [SPAM UNSURE] Re: How to add pseudo vector types
2021-07-22 5:09 ` Clément Pit-Claudel
@ 2021-07-22 6:44 ` Eli Zaretskii
2021-07-22 14:43 ` Clément Pit-Claudel
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-22 6:44 UTC (permalink / raw)
To: Clément Pit-Claudel; +Cc: emacs-devel
> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Thu, 22 Jul 2021 01:09:13 -0400
>
> > But how would the "launched parse" access the buffer text if it runs
> > in parallel to normal editing? We've discussed the difficulties with
> > that, and you seem to ignore them here?
>
> Lots of magic handwaving:
Hand-waving always works, of course.
> > Which means the results will many times be slightly wrong, because the
> > parse info you use is outdated?
>
> Maybe. In practice if the delay between requesting the info and getting it is perceptible, then displaying outdated info is better than freezing until you get up-to-date info, no? Either you're getting info so fast that the user doesn't realize that you're outdated by 1ms; or you're getting info so slowly that the user realizes that you're running one second behind — but it's much better than freezing for one second.
The time doesn't matter here: the amount of changes does.
IME, display based on outdated information is NOT okay. The
discrepancy between the actual stuff on the screen and its
fontification based on outdated buffer context could be quite
annoying, for example.
> This is a problem we have all the time with Flycheck btw: you send the text of the buffer to a compiler, it returns 3 seconds later, and you want to display errors as reported by the compiler. By the time we get errors to display, the locations they come with are outdated. We don't have a good solution.
>
> Visual Studio in the old days had a really beautiful solution for this. There was a (basically) free API you could call to snapshot a buffer at a point in time; then there was a function that translated positions in that snapshot to positions in the current buffer (think of it as magically putting a marker into the past buffer when the snapshot was taken, and then querying its current position). So positions returned by the compiler or any other tool were still relevant even if they referred to a three-second-old buffer, since you could translate them to the current buffer.
We can do that as well. It's again something very similar to the
undo-list info we already collect.
* Re: How to add pseudo vector types
2021-07-22 3:10 ` Yuan Fu
@ 2021-07-22 8:23 ` Eli Zaretskii
2021-07-22 13:47 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-22 8:23 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 21 Jul 2021 23:10:14 -0400
> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel <emacs-devel@gnu.org>
>
> Should I keep posting patches, or should I create a branch in /scratch?
The latter, I think.
> If the latter, how do I do it?
You need write access to the Emacs repository.
> @@ -96,33 +103,30 @@ ts_ensure_parsed (Lisp_Object parser)
> ts_read_buffer (void *buffer, uint32_t byte_index,
> TSPoint position, uint32_t *bytes_read)
> {
> - if (! BUFFER_LIVE_P ((struct buffer *) buffer))
> + if (!BUFFER_LIVE_P ((struct buffer *) buffer))
> error ("BUFFER is not live");
Is it really TRT to signal an error here? This is not code that would
run from a user command, so signaling an error is not necessarily the
useful response to this situation. Why not simply return without
doing anything?
> + // TODO BUF_ZV_BYTE?
Do you want to discuss this? I'd prefer to have it the other way
around: use BUF_ZV_BYTE by default. The callers could widen the
buffer if they needed to access outside of the narrowing.
> else
> {
> - beg = (char *) BYTE_POS_ADDR (byte_pos);
> + beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
> len = next_char_len(byte_pos);
The last line is incorrect, as it assumes the current buffer. You
actually don't need that function, it's enough to use
BYTES_BY_CHAR_HEAD on the address in 'beg'.
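To illustrate the idea of getting the length from the first byte alone, here is a standalone sketch in plain C. It assumes Emacs's internal multibyte encoding (a UTF-8 superset whose sequences can be up to 5 bytes long); the function names are invented for this example, and the real macro in character.h may differ in detail:

```c
#include <assert.h>
#include <stddef.h>

/* Return the length in bytes of a multibyte sequence, judged from its
   leading byte BYTE alone.  The caller must pass a leading byte, not
   a continuation byte.  Illustrative stand-in, not the Emacs macro.  */
int
bytes_by_lead_byte (unsigned char byte)
{
  return (!(byte & 0x80) ? 1      /* 0xxxxxxx: ASCII */
          : !(byte & 0x20) ? 2    /* 110xxxxx */
          : !(byte & 0x10) ? 3    /* 1110xxxx */
          : !(byte & 0x08) ? 4    /* 11110xxx */
          : 5);                   /* 11111xxx */
}

/* Length of the character that starts at address P.  Nothing here
   depends on which buffer is current, only on the byte at P.  */
int
char_len_at (const unsigned char *p)
{
  assert (p != NULL);
  return bytes_by_lead_byte (*p);
}
```

The point is that the length is derived from the address already in hand, so no buffer-relative position lookup is involved.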
> *bytes_read = (uint32_t) len;
Is using uint32_t the restriction of tree-sitter? Doesn't it support
reading more than 2 gigabytes?
> +DEFUN ("tree-sitter-node-type",
> + Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
> + doc: /* Return the NODE's type as a symbol. */)
> + (Lisp_Object node)
> +{
> + CHECK_TS_NODE (node);
> + TSNode ts_node = XTS_NODE (node)->node;
> + const char *type = ts_node_type(ts_node);
> + return intern_c_string (type);
Why do we need to intern the string each time? Can't we store the
interned symbol there, instead of a C string, in the first place?
Thanks.
* Re: How to add pseudo vector types
2021-07-22 8:23 ` Eli Zaretskii
@ 2021-07-22 13:47 ` Yuan Fu
2021-07-22 14:11 ` Óscar Fuentes
` (2 more replies)
0 siblings, 3 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-22 13:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Stefan Monnier, emacs-devel
>
>> + // TODO BUF_ZV_BYTE?
>
> Do you want to discuss this? I'd prefer to have it the other way
> around: use BUF_ZV_BYTE by default. The callers could widen the
> buffer if they needed to access outside of the narrowing.
Yes, I meant to discuss this. The problem with respecting narrowing is that a user can narrow and widen arbitrarily, and every time they do, Emacs needs to translate that into insertions and deletions of buffer text for tree-sitter. Plus, if tree-sitter respects narrowing, a user could narrow the buffer and find that the font-locking changes and is no longer correct. Maybe that’s not what the user wants. Also, if someone narrows and widens often, say narrowing to a function for better focus, tree-sitter needs to constantly re-parse most of the buffer. These are not significant disadvantages, but what do we get from respecting narrowing that justifies the code complexity and these small annoyances?
>> *bytes_read = (uint32_t) len;
>
> Is using uint32_t the restriction of tree-sitter? Doesn't it support
> reading more than 2 gigabytes?
I’m not sure why it asks for uint32 specifically, but that’s what its API asks for. I don’t think you are supposed to use tree-sitter on gigabyte-sized files, because the author mentioned that tree-sitter uses over 10x as much memory as the size of the source file [1]. On files larger than a couple of megabytes, I think we had better turn off tree-sitter. Normally those files are not regular source files anyway, and we don’t need a parse tree for a log.
That leads to another point. I suspect the memory limit will come before the speed limit, i.e., as the file size increases, the memory consumption will become unacceptable before the speed does. So it is possible that we want to outright disable tree-sitter for larger files, then we don’t need to do much to improve the responsiveness of tree-sitter on large files. And we might want to delete the parse tree if a buffer has been idle for a while. Of course, that’s just my superstition, we’ll see once we can measure the performance.
>
>> +DEFUN ("tree-sitter-node-type",
>> + Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
>> + doc: /* Return the NODE's type as a symbol. */)
>> + (Lisp_Object node)
>> +{
>> + CHECK_TS_NODE (node);
>> + TSNode ts_node = XTS_NODE (node)->node;
>> + const char *type = ts_node_type(ts_node);
>> + return intern_c_string (type);
>
> Why do we need to intern the string each time? Can't we store the
> interned symbol there, instead of a C string, in the first place?
I’m not sure what you mean by “store the interned symbol there”; where do I store the interned symbol? (BTW, if you see something wrong, that’s probably because I don’t know the right way to do it, and grepping only got me so far.)
[1]: https://github.com/tree-sitter/tree-sitter/issues/222#issuecomment-435987441
Thanks,
Yuan
* Re: How to add pseudo vector types
2021-07-22 13:47 ` Yuan Fu
@ 2021-07-22 14:11 ` Óscar Fuentes
2021-07-22 17:09 ` Eli Zaretskii
2021-07-22 17:00 ` Eli Zaretskii
2021-07-24 9:33 ` Stephen Leake
2 siblings, 1 reply; 370+ messages in thread
From: Óscar Fuentes @ 2021-07-22 14:11 UTC (permalink / raw)
To: emacs-devel
Yuan Fu <casouri@gmail.com> writes:
> That leads to another point. I suspect the memory limit will come
> before the speed limit, i.e., as the file size increases, the memory
> consumption will become unacceptable before the speed does. So it is
> possible that we want to outright disable tree-sitter for larger
> files, then we don’t need to do much to improve the responsiveness of
> tree-sitter on large files. And we might want to delete the parse tree
> if a buffer has been idle for a while. Of course, that’s just my
> superstition, we’ll see once we can measure the performance.
Of course those parameters would be configurable in Emacs, but disabling
TS on a 2MB file because it uses 20MB is way too conservative, IMHO.
Nowadays the cheapest netbook comes with at least 1GB RAM and can do
memory-to-memory copies at a rate of GB/s.
Guys, you are speculating too much about minutiae and worst-case
scenarios. (Do we really care about TS not supporting files larger than
4GB? I mean, REALLY?)
I'd rather focus on implementing the thing and optimizing later. My bet
is that a crude implementation would work fine for 99% of users
and be an improvement over what we have now in practically all cases.
BTW, a 10x AST/source-code size ratio is quite reasonable.
* Re: [SPAM UNSURE] Re: How to add pseudo vector types
2021-07-22 6:44 ` Eli Zaretskii
@ 2021-07-22 14:43 ` Clément Pit-Claudel
0 siblings, 0 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-22 14:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 7/22/21 2:44 AM, Eli Zaretskii wrote:
> We can do that as well. It's again something very similar to the
> undo-list info we already collect.
Yes, the last discussion about this didn't end too well :'( And I haven't had time to work on it.
* Re: How to add pseudo vector types
2021-07-22 13:47 ` Yuan Fu
2021-07-22 14:11 ` Óscar Fuentes
@ 2021-07-22 17:00 ` Eli Zaretskii
2021-07-22 17:47 ` Yuan Fu
` (2 more replies)
2021-07-24 9:33 ` Stephen Leake
2 siblings, 3 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-22 17:00 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 22 Jul 2021 09:47:45 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> Yes, I meant to discuss this. The problem with respecting narrowing is that a user can narrow and widen arbitrarily, and every time they do, Emacs needs to translate that into insertions and deletions of buffer text for tree-sitter. Plus, if tree-sitter respects narrowing, a user could narrow the buffer and find that the font-locking changes and is no longer correct. Maybe that’s not what the user wants. Also, if someone narrows and widens often, say narrowing to a function for better focus, tree-sitter needs to constantly re-parse most of the buffer. These are not significant disadvantages, but what do we get from respecting narrowing that justifies the code complexity and these small annoyances?
But that's how the current font-lock and indentation work: they never
look beyond the narrowing limits. So why should the TS-based features
behave differently?
As for temporary narrowing: if we record the changes, but don't send
them to TS until we actually need re-parsing, then we could eliminate
the temporary narrowing when we report the changes to TS, leaving only
the narrowing that exists at the time of the re-parse. At least for
fontifications, that time is redisplay time, and users do expect to
see the text fontified according to the current narrowing.
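As a rough illustration of "record now, report later", here is a minimal sketch in plain C that folds successive changes into a single pending edit, which would be reported to TS only when a re-parse is actually needed. The field names echo the start_byte/old_end_byte/new_end_byte triple from the patch, but the struct and function are hypothetical, and collapsing everything into one bounding range is just one possible design, not necessarily what the patch should do:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The union of all changes since the last parse, as one edit.
   Hypothetical type; not the Lisp_TS_Parser struct from the patch.  */
struct pending_edit
{
  bool need_reparse;       /* true once any change was recorded */
  ptrdiff_t start_byte;    /* start of the changed region */
  ptrdiff_t old_end_byte;  /* its end in the text TS last parsed */
  ptrdiff_t new_end_byte;  /* its end in the current text */
};

/* Fold one change into P.  All three byte positions are expressed in
   the text as it was just before (old end) and just after (new end)
   this particular change.  */
void
record_change (struct pending_edit *p, ptrdiff_t start_byte,
               ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
{
  assert (start_byte <= old_end_byte && start_byte <= new_end_byte);

  if (!p->need_reparse)
    {
      p->need_reparse = true;
      p->start_byte = start_byte;
      p->old_end_byte = old_end_byte;
      p->new_end_byte = new_end_byte;
      return;
    }

  ptrdiff_t delta = new_end_byte - old_end_byte;

  /* If the change reaches past the pending region, grow the region
     by the same amount in both the old and the current text.  */
  if (old_end_byte > p->new_end_byte)
    {
      p->old_end_byte += old_end_byte - p->new_end_byte;
      p->new_end_byte = old_end_byte;
    }

  /* Apply the size change to the end of the region in current text.  */
  p->new_end_byte += delta;

  if (start_byte < p->start_byte)
    p->start_byte = start_byte;
}
```

Under this scheme, any narrowing and widening that happens between two re-parses costs nothing by itself; only the state at re-parse time matters.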
> >> *bytes_read = (uint32_t) len;
> >
> > Is using uint32_t the restriction of tree-sitter? Doesn't it support
> > reading more than 2 gigabytes?
>
> I’m not sure why it asks for uint32 specifically, but that’s what its API asks for. I don’t think you are supposed to use tree-sitter on gigabyte-sized files, because the author mentioned that tree-sitter uses over 10x as much memory as the size of the source file [1]. On files larger than a couple of megabytes, I think we had better turn off tree-sitter. Normally those files are not regular source files anyway, and we don’t need a parse tree for a log.
I don't necessarily agree with the "not regular source files" part.
For example, JSON files can be quite large. And there are also log
files, which are even larger -- did no one adapt TS to fontifying
those yet?
More generally: is the problem real? If you make a file that is 1000
copies of xdisp.c, and then submit it to TS, do you really get 10GB of
memory consumption? This is something that is good to know up front,
so we'd know what to expect down the road.
> That leads to another point. I suspect the memory limit will come before the speed limit, i.e., as the file size increases, the memory consumption will become unacceptable before the speed does. So it is possible that we want to outright disable tree-sitter for larger files, then we don’t need to do much to improve the responsiveness of tree-sitter on large files. And we might want to delete the parse tree if a buffer has been idle for a while. Of course, that’s just my superstition, we’ll see once we can measure the performance.
See above: IMO, we should benchmark both the CPU and memory
performance of TS for such large files, before we decide on the course
of action.
> >> +DEFUN ("tree-sitter-node-type",
> >> + Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
> >> + doc: /* Return the NODE's type as a symbol. */)
> >> + (Lisp_Object node)
> >> +{
> >> + CHECK_TS_NODE (node);
> >> + TSNode ts_node = XTS_NODE (node)->node;
> >> + const char *type = ts_node_type(ts_node);
> >> + return intern_c_string (type);
> >
> > Why do we need to intern the string each time? Can't we store the
> > interned symbol there, instead of a C string, in the first place?
>
> I’m not sure what you mean by “store the interned symbol there”; where do I store the interned symbol?
In the struct that ts_node_type accesses, instead of the 'char *'
string you store there now.
> (BTW, if you see something wrong, that’s probably because I don’t know the right way to do it, and grepping only got me so far.)
Do what? Feel free to ask questions when you aren't sure how to
accomplish something on the C level.
* Re: How to add pseudo vector types
2021-07-22 14:11 ` Óscar Fuentes
@ 2021-07-22 17:09 ` Eli Zaretskii
2021-07-22 19:29 ` Óscar Fuentes
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-22 17:09 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 22 Jul 2021 16:11:09 +0200
>
> Yuan Fu <casouri@gmail.com> writes:
>
> > That leads to another point. I suspect the memory limit will come
> > before the speed limit, i.e., as the file size increases, the memory
> > consumption will become unacceptable before the speed does. So it is
> > possible that we want to outright disable tree-sitter for larger
> > files, then we don’t need to do much to improve the responsiveness of
> > tree-sitter on large files. And we might want to delete the parse tree
> > if a buffer has been idle for a while. Of course, that’s just my
> > superstition, we’ll see once we can measure the performance.
>
> Of course those parameters would be configurable in Emacs, but disabling
> TS on a 2MB file because it uses 20MB is way too conservative, IMHO.
Why would we limit ourselves to 20MB? uint32_t supports up to 4GB.
> Guys, you are speculating too much about minutiae and worst-case
> scenarios. (Do we really care about TS not supporting files larger than
> 4GB? I mean, REALLY?)
Yes, we do. For at least 2 reasons: (a) source code files produced by
programs can be very large; (b) having a feature that fails before you
reach the max size of a buffer Emacs supports is a problem, because it
will cause hard-to-deal-with problems.
Or let me turn the table and ask why we cared to support the largest
possible buffer size when 32-bit systems were the rule?
> I'd rather focus on implementing the thing and optimizing later. My bet
> is that a crude implementation would work fine for 99% of users
> and be an improvement over what we have now in practically all cases.
This is not a prototype project. (Or at least I hope it won't end up
being that.) This is supposed to be the industrial-strength code that
core Emacs will use for the years to come to support features which
need language-dependent parsing. It cannot work correctly only in 99%
of use cases. So we must assess the limitations seriously and plan
ahead for them.
> BTW, a 10x AST/source-code size ratio is quite reasonable.
It could be, but please don't forget that this is _in_addition_to_ the
"normal" Emacs memory footprint, and that could easily be 1GB and
sometimes several times that.
* Re: How to add pseudo vector types
2021-07-22 17:00 ` Eli Zaretskii
@ 2021-07-22 17:47 ` Yuan Fu
2021-07-22 19:05 ` Eli Zaretskii
2021-07-23 14:07 ` Stefan Monnier
2021-07-24 9:42 ` Stephen Leake
2 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-22 17:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Stefan Monnier, emacs-devel
> On Jul 22, 2021, at 1:00 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 22 Jul 2021 09:47:45 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>>
>> Yes, I meant to discuss this. The problem with respecting narrowing is that a user can narrow and widen arbitrarily, and Emacs needs to translate each of those into insertion & deletion of buffer text for tree-sitter, every time a user narrows or widens the buffer. Plus, if tree-sitter respects narrowing, it could happen that a user narrows the buffer and the font-locking changes and is no longer correct. Maybe that’s not what the user wants. Also, if someone narrows and widens often, maybe narrowing to a function for better focus, tree-sitter needs to constantly re-parse most of the buffer. These are not significant disadvantages, but what do we get from respecting narrowing that justifies the code complexity and these small annoyances?
>
> But that's how the current font-lock and indentation work: they never
> look beyond the narrowing limits. So why should the TS-based features
> behave differently?
>
> As for temporary narrowing: if we record the changes, but don't send
> them to TS until we actually need re-parsing, then we could eliminate
> the temporary narrowing when we report the changes to TS, leaving only
> the narrowing that exists at the time of the re-parse. At least for
> fontifications, that time is redisplay time, and users do expect to
> see the text fontified according to the current narrowing.
>
>>>> *bytes_read = (uint32_t) len;
>>>
>>> Is using uint32_t the restriction of tree-sitter? Doesn't it support
>>> reading more than 2 gigabytes?
>>
>> I’m not sure why it asks for uint32 specifically, but that’s what its API asks for. I don’t think you are supposed to use tree-sitter on gigabyte-sized files, because the author mentioned that tree-sitter uses over 10x as much memory as the size of the source file [1]. On files larger than a couple of megabytes, I think we had better turn off tree-sitter. Normally those files are not regular source files, anyway, and we don’t need a parse tree for a log.
>
> I don't necessarily agree with the "not regular source files" part.
> For example, JSON files can be quite large. And there are also log
> files, which are even larger -- did no one adapt TS to fontifying
> those yet?
There is a JSON parser, but I don’t think there is one for log files.
>
> More generally: is the problem real? If you make a file that is 1000
> copies of xdisp.c, and then submit it to TS, do you really get 10GB of
> memory consumption? This is something that is good to know up front,
> so we'd know what to expect down the road.
Yes. I concatenated 100 xdisp.c together, and parsed them with my simple C program. It used 1.8 G. I didn’t test for 1000 together, but I think the trend is linear.
time -l ./main-large-c
16.48 real 15.32 user 0.81 sys
1883959296 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
459951 page reclaims
22 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
6 voluntary context switches
1653 involuntary context switches
107310143182 instructions retired
58561420060 cycles elapsed
1883095040 peak memory footprint
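As a rough illustration of what the measurement above implies for 1000 copies, here is a small sketch that extrapolates from the reported maximum resident set size. It assumes the growth really is linear, which has not been measured; estimate_rss_bytes is a hypothetical name used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum resident set size reported above for 100 concatenated
   copies of xdisp.c.  */
#define MEASURED_RSS_BYTES 1883959296LL
#define MEASURED_COPIES 100

/* Estimate the resident set size for COPIES copies, under the
   unverified assumption that memory grows linearly with input size.  */
int64_t
estimate_rss_bytes (int64_t copies)
{
  assert (copies >= 0);
  return MEASURED_RSS_BYTES * copies / MEASURED_COPIES;
}
```

Under that assumption, estimate_rss_bytes (1000) gives 18839592960 bytes, i.e. roughly 18.8 GB (about 17.5 GiB), somewhat more than the 10GB figure asked about.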
>
>> That leads to another point. I suspect the memory limit will come before the speed limit, i.e., as the file size increases, the memory consumption will become unacceptable before the speed does. So it is possible that we want to outright disable tree-sitter for larger files, then we don’t need to do much to improve the responsiveness of tree-sitter on large files. And we might want to delete the parse tree if a buffer has been idle for a while. Of course, that’s just my superstition, we’ll see once we can measure the performance.
>
> See above: IMO, we should benchmark both the CPU and memory
> performance of TS for such large files, before we decide on the course
> of action.
That’s my thought, too. I should have reserved my suspicion until I have benchmark measurements.
>
>>>> +DEFUN ("tree-sitter-node-type",
>>>> + Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
>>>> + doc: /* Return the NODE's type as a symbol. */)
>>>> + (Lisp_Object node)
>>>> +{
>>>> + CHECK_TS_NODE (node);
>>>> + TSNode ts_node = XTS_NODE (node)->node;
>>>> + const char *type = ts_node_type(ts_node);
>>>> + return intern_c_string (type);
>>>
>>> Why do we need to intern the string each time? can't we store the
>>> interned symbol there, instead of a C string, in the first place?
>>
>> I’m not sure what you mean by “store the interned symbol there”; where do I store the interned symbol?
>
> In the struct that ts_node_type accesses, instead of the 'char *'
> string you store there now.
The struct that ts_node_type accesses is a TSNode, which is defined by tree-sitter. ts_node_type is an API provided by tree-sitter; I’m just exposing it to Lisp. I could return strings instead of symbols, but I thought symbols might be more appropriate and more convenient for users of this function.
>> (BTW, If you see something wrong, that’s probably because I don’t know the right way to do it, and grepping only got me that far.)
>
> Do what? feel free to ask questions when you aren't sure how to
> accomplish something on the C level.
Thanks. Is the code below the correct way to set a buffer-local variable? (I’m setting tree-sitter-parser-list.)
struct buffer *old_buffer = current_buffer;
set_buffer_internal (XBUFFER (buffer));
Fset (Qtree_sitter_parser_list,
Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));
set_buffer_internal (old_buffer);
Also, we don’t call change hooks in replace_range_2, why? Should I update tree-sitter trees in that function, or should I not?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 17:47 ` Yuan Fu
@ 2021-07-22 19:05 ` Eli Zaretskii
2021-07-23 13:25 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-22 19:05 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 22 Jul 2021 13:47:20 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> > More generally: is the problem real? If you make a file that is 1000
> > copies of xdisp.c, and then submit it to TS, do you really get 10GB of
> > memory consumption? This is something that is good to know up front,
> > so we'd know what to expect down the road.
>
> Yes. I concatenated 100 xdisp.c together, and parsed them with my simple C program. It used 1.8 G. I didn’t test for 1000 together, but I think the trend is linear.
That's good to know, thanks.
So what does TS do if it attempts to allocate more memory and that
fails? Regardless, we'd need some fallback strategy, because AFAIU
many people run with VM overcommit enabled, so the OOM killer will
just kill the Emacs process when it asks for too much memory.
> >>>> +DEFUN ("tree-sitter-node-type",
> >>>> + Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
> >>>> + doc: /* Return the NODE's type as a symbol. */)
> >>>> + (Lisp_Object node)
> >>>> +{
> >>>> + CHECK_TS_NODE (node);
> >>>> + TSNode ts_node = XTS_NODE (node)->node;
> >>>> + const char *type = ts_node_type(ts_node);
> >>>> + return intern_c_string (type);
> >>>
> >>> Why do we need to intern the string each time? can't we store the
> >>> interned symbol there, instead of a C string, in the first place?
> >>
> >> I’m not sure what you mean by “store the interned symbol there”; where do I store the interned symbol?
> >
> > In the struct that ts_node_type accesses, instead of the 'char *'
> > string you store there now.
>
> The struct that ts_node_type accesses is a TSNode, which is defined by tree-sitter. ts_node_type is an API provided by tree-sitter; I’m just exposing it to Lisp. I could return strings instead of symbols, but I thought symbols might be more appropriate and more convenient for users of this function.
Maybe there's a better way of exposing that to Lisp. But that's a
minor point, it can be left for later.
> Is the code below the correct way to set a buffer-local variable? (I’m setting tree-sitter-parser-list.)
>
> struct buffer *old_buffer = current_buffer;
> set_buffer_internal (XBUFFER (buffer));
>
> Fset (Qtree_sitter_parser_list,
> Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));
>
> set_buffer_internal (old_buffer);
Yes, but it would be better to use DEFVAR_LISP and then you could
assign directly to Vtree_sitter_parser_list, instead of using Fset.
> Also, we don’t call change hooks in replace_range_2, why?
Because it is called in a loop, one character at a time. The caller
of replace_range_2 calls these hooks for the entire region, once.
> Should I update tree-sitter trees in that function, or should I not?
The only caller is casify_region, so you could update there.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 17:09 ` Eli Zaretskii
@ 2021-07-22 19:29 ` Óscar Fuentes
2021-07-23 5:21 ` Eli Zaretskii
2021-07-24 9:38 ` Stephen Leake
0 siblings, 2 replies; 370+ messages in thread
From: Óscar Fuentes @ 2021-07-22 19:29 UTC (permalink / raw)
To: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> Of course those parameters would be configurable on Emacs, but disabling
>> TS on a 2MB file because it uses 20MB is way too conservative, IMHO.
>
> Why would we limit ourselves to 20MB? uint32_t supports up to 4GB.
I didn't suggest that we should limit ourselves to 20MB, I observed that
current machines have enough resources for handling large files ("large"
meaning "big enough to keep me busy reading for some years.")
>> Guys, you are speculating too much about minutia and worst-case
>> scenarios. (Do we really care about TS not supporting files larger than
>> 4GB? I mean, REALLY?)
>
> Yes, we do. For at least 2 reasons: (a) source code files produced by
> programs can be very large;
I know, I work with machine-generated (read: code-dense) 20+MB C++ files
on a regular basis.
However, I wouldn't agree to give up useful features because they
could be problematic when dealing with large files. That is, it would be
a mistake to discard TS as inadequate for Emacs just because it doesn't
benefit (and I say "not benefit", not "penalise") certain use cases.
> (b) having a feature that fails before you
> reach the max size of a buffer Emacs supports is a problem, because it
> will cause hard-to-deal-with problems.
We can put reasonable limits on when to use TS once we have some
experience with it. What matters right now is if TS would be usable for
the typical use case, and I guess the answer is positive. Also, it is
not as if we had other options to consider.
>> I'll rather focus on implementing the thing and optimize later. My bet
>> is that a crude implementation would work fine for the 99% of the users
>> and be an improvement over what we have now on practically all cases.
>
> This is not a prototype project. (Or at least I hope it won't end up
> being that.) This is supposed to be the industry-strength code that
> core Emacs will use for the years to come to support features which
> need language-dependent parsing. It cannot work correctly only in 99%
> of use cases. So we must assess the limitations seriously and plan
> ahead for them.
I said "would work *fine* for the 99% of users", this does not imply
that it would work incorrectly for the rest.
On the "planning ahead" part, TS support would be an optional,
quasi-external feature for some time; it is not as if Emacs would
become unusable should TS support come out with some critical bug.
TS support can be fine-tuned without disrupting the rest of Emacs
development. If, on the other hand, we start making changes to Emacs'
internals to allow some TS-related optimizations (even when we don't
know if they are necessary at all), that could be a destabilizing move
for Emacs as a whole, apart from delaying TS support.
>> BTW, a 10x AST/source-code size ratio is quite reasonable.
>
> It could be, but please don't forget that this is _in_addition_to_ the
> "normal" Emacs memory footprint, and that could easily be 1GB and
> sometimes several times that.
Yes, but if you want something you need to pay something, and you can
hardly get TS' features with less than that. At least for complex
languages like C++.
Talking about scenarios of heavy memory usage, I'll comment in passing
that in my recent experience, once Emacs exceeds 2GB the gc pauses start
to be so annoying that I don't care anymore about how much memory an
external tool would use if it works fast enough.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 19:29 ` Óscar Fuentes
@ 2021-07-23 5:21 ` Eli Zaretskii
2021-07-24 9:38 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-23 5:21 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 22 Jul 2021 21:29:02 +0200
>
> >> Guys, you are speculating too much about minutia and worst-case
> >> scenarios. (Do we really care about TS not supporting files larger than
> >> 4GB? I mean, REALLY?)
> >
> > Yes, we do. For at least 2 reasons: (a) source code files produced by
> > programs can be very large;
>
> I know, I work with machine-generated (read: code-dense) 20+MB C++ files
> on a regular basis.
>
> However, I wouldn't agree to give up useful features because they
> could be problematic when dealing with large files. That is, it would be
> a mistake to discard TS as inadequate for Emacs just because it doesn't
> benefit (and I say "not benefit", not "penalise") certain use cases.
It was not my intent to say we should discard TS as inadequate because
of these limitations. What I meant is that we should know about the
limitations and plan in advance how to handle them when a user bumps
into them. Disabling TS-related features could be one such
mitigation, but maybe we could come up with smarter fallbacks.
It sounds like the rest of your message was meant to convince me not to give
up on TS, in which case there's no need: I'm convinced already, and
mostly agree with what you say.
> Talking about scenarios of heavy memory usage, I'll comment in passing
> that in my recent experience, once Emacs exceeds 2GB the gc pauses start
> to be so annoying that I don't care anymore about how much memory an
> external tool would use if it works fast enough.
That's a separate issue. And the amount of memory GC has to scan is
not directly related to the memory footprint of the Emacs process. So
I would be interested in seeing the results of memory-report in those
cases where GC takes too long (in a separate thread, please).
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 19:05 ` Eli Zaretskii
@ 2021-07-23 13:25 ` Yuan Fu
2021-07-23 19:10 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-23 13:25 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Stefan Monnier, emacs-devel
> On Jul 22, 2021, at 3:05 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 22 Jul 2021 13:47:20 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>>
>>> More generally: is the problem real? If you make a file that is 1000
>>> copies of xdisp.c, and then submit it to TS, do you really get 10GB of
>>> memory consumption? This is something that is good to know up front,
>>> so we'd know what to expect down the road.
>>
>> Yes. I concatenated 100 xdisp.c together, and parsed them with my simple C program. It used 1.8 G. I didn’t test for 1000 together, but I think the trend is linear.
>
> That's good to know, thanks.
>
> So what does TS do if it attempts to allocate more memory and that
> fails? Regardless, we'd need some fallback strategy, because AFAIU
> many people run with VM overcommit enabled, so the OOM killer will
> just kill the Emacs process when it asks for too much memory.
Abort, it seems:
static inline void *ts_malloc_default(size_t size) {
void *result = malloc(size);
if (size > 0 && !result) {
fprintf(stderr, "tree-sitter failed to allocate %zu bytes", size);
exit(1);
}
return result;
}
>> Also, we don’t call change hooks in replace_range_2, why?
>
> Because it is called in a loop, one character at a time. The caller
> of replace_range_2 calls these hooks for the entire region, once.
>
>> Should I update tree-sitter trees in that function, or should I not?
>
> The only caller is casify_region, so you could update there.
casify_region doesn’t have access to byte positions. I’ll leave it as-is, recording change in replace_range_2, if you don’t object to it.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 17:00 ` Eli Zaretskii
2021-07-22 17:47 ` Yuan Fu
@ 2021-07-23 14:07 ` Stefan Monnier
2021-07-23 14:45 ` Yuan Fu
2021-07-23 19:13 ` Eli Zaretskii
2021-07-24 9:42 ` Stephen Leake
2 siblings, 2 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-23 14:07 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, cpitclaudel, emacs-devel
> But that's how the current font-lock and indentation work: they never
> look beyond the narrowing limits.
Not quite: that's true for indentation, but for font-lock we have
`font-lock-dont-widen` (i.e. by default, font-lock widens temporarily
while it does its job).
For TS, given the cost associated with changing the bounds, I think it
would make a lot of sense to ignore narrowing (and maybe provide some
separate way to specify bounds, for the rare cases like Info and Rmail
where a buffer contains "a collection of things" and we only want to
parse/manipulate one of those things at any given time).
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 14:07 ` Stefan Monnier
@ 2021-07-23 14:45 ` Yuan Fu
2021-07-23 19:13 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-23 14:45 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, Clément Pit-Claudel, emacs-devel
> On Jul 23, 2021, at 10:07 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> But that's how the current font-lock and indentation work: they never
>> look beyond the narrowing limits.
>
> Not quite: that's true for indentation, but for font-lock we have
> `font-lock-dont-widen` (i.e. by default, font-lock widens temporarily
> while it does its job).
>
> For TS, given the cost associated with changing the bounds, I think it
> would make a lot of sense to ignore narrowing (and maybe provide some
> separate way to specify bounds, for the rare cases like Info and Rmail
> where a buffer contains "a collection of things" and we only want to
> parse/manipulate one of those things at any given time).
Tree-sitter lets you set the ranges in which the parser works. That’s how they support multi-language files like html+javascript+css. This will certainly work for Rmail and Info, too.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 13:25 ` Yuan Fu
@ 2021-07-23 19:10 ` Eli Zaretskii
2021-07-23 20:01 ` Perry E. Metzger
2021-07-23 20:22 ` Yuan Fu
0 siblings, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-23 19:10 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 23 Jul 2021 09:25:17 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> > So what does TS do if it attempts to allocate more memory and that
> > fails? Regardless, we'd need some fallback strategy, because AFAIU
> > many people run with VM overcommit enabled, so the OOM killer will
> > just kill the Emacs process when it asks for too much memory.
>
> Abort, it seems:
>
> static inline void *ts_malloc_default(size_t size) {
> void *result = malloc(size);
> if (size > 0 && !result) {
> fprintf(stderr, "tree-sitter failed to allocate %zu bytes", size);
> exit(1);
> }
> return result;
> }
We must replace this function, if only because the MS-Windows build of
Emacs uses a custom malloc implementation. Does TS allow the client
to use its own malloc?
> >> Also, we don’t call change hooks in replace_range_2, why?
> >
> > Because it is called in a loop, one character at a time. The caller
> > of replace_range_2 calls these hooks for the entire region, once.
> >
> >> Should I update tree-sitter trees in that function, or should I not?
> >
> > The only caller is casify_region, so you could update there.
>
> casify_region doesn’t have access to byte positions.
You can compute them using CHAR_TO_BYTE.
> I’ll leave it as-is, recording change in replace_range_2, if you don’t object to it.
That'd be wasteful, I think. replace_range_2 is called one character
at a time.
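For readers wondering why character and byte positions differ at all, here is a minimal sketch of a character-to-byte conversion over a plain UTF-8 string. This is not how CHAR_TO_BYTE is implemented in Emacs (that macro works on buffer positions and the buffer's internal representation); utf8_char_to_byte is a hypothetical stand-in that only illustrates the idea, and it assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the byte offset of the character at
   index CHAR_INDEX in the NUL-terminated, well-formed UTF-8 string
   TEXT.  If CHAR_INDEX is past the end, return the string's length
   in bytes.  */
size_t
utf8_char_to_byte (const char *text, size_t char_index)
{
  assert (text != NULL);
  size_t byte = 0;
  while (char_index > 0 && text[byte] != '\0')
    {
      /* Step over the lead byte, then over any continuation bytes,
         which all have the bit pattern 10xxxxxx.  */
      byte++;
      while (((unsigned char) text[byte] & 0xC0) == 0x80)
        byte++;
      char_index--;
    }
  return byte;
}
```

With positions converted once for the start and end of the region, a single change covering the whole region can be recorded in the caller, instead of one change per character.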
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 14:07 ` Stefan Monnier
2021-07-23 14:45 ` Yuan Fu
@ 2021-07-23 19:13 ` Eli Zaretskii
2021-07-23 20:28 ` Stefan Monnier
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-23 19:13 UTC (permalink / raw)
To: Stefan Monnier; +Cc: casouri, cpitclaudel, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Yuan Fu <casouri@gmail.com>, cpitclaudel@gmail.com, emacs-devel@gnu.org
> Date: Fri, 23 Jul 2021 10:07:42 -0400
>
> > But that's how the current font-lock and indentation work: they never
> > look beyond the narrowing limits.
>
> Not quite: that's true for indentation, but for font-lock we have
> `font-lock-dont-widen` (i.e. by default, font-lock widens temporarily
> while it does its job).
jit-lock never requests fontifications outside of the accessible
portion, because redisplay doesn't look there.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 19:10 ` Eli Zaretskii
@ 2021-07-23 20:01 ` Perry E. Metzger
2021-07-24 5:52 ` Eli Zaretskii
2021-07-23 20:22 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Perry E. Metzger @ 2021-07-23 20:01 UTC (permalink / raw)
To: emacs-devel
On 7/23/21 15:10, Eli Zaretskii wrote:
>> Abort, it seems:
>> static inline void *ts_malloc_default(size_t size) {
>> void *result = malloc(size);
>> if (size > 0 && !result) {
>> fprintf(stderr, "tree-sitter failed to allocate %zu bytes", size);
>> exit(1);
>> }
>> return result;
>> }
> We must replace this function, if only because the MS-Windows build of
> Emacs uses a custom malloc implementation. Does TS allow the client
> to use its own malloc?
Certainly more graceful allocation error behavior would be necessary in
an Emacs context even on Unix-like operating systems. An unexpected hard
exit could result in loss of data for the user.
Perry
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 19:10 ` Eli Zaretskii
2021-07-23 20:01 ` Perry E. Metzger
@ 2021-07-23 20:22 ` Yuan Fu
2021-07-24 6:00 ` Eli Zaretskii
2021-07-24 15:04 ` Yuan Fu
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-23 20:22 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Stefan Monnier, emacs-devel
>
> We must replace this function, if only because the MS-Windows build of
> Emacs uses a custom malloc implementation. Does TS allow the client
> to use its own malloc?
Yes, in that case, we need to embed tree-sitter into Emacs, instead of using it as a dynamic library, I think.
// Allow clients to override allocation functions
#ifndef ts_malloc
#define ts_malloc ts_malloc_default
#endif
#ifndef ts_calloc
#define ts_calloc ts_calloc_default
#endif
#ifndef ts_realloc
#define ts_realloc ts_realloc_default
#endif
#ifndef ts_free
#define ts_free ts_free_default
#endif
How do we handle such a thing in Emacs?
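The #ifndef fallback pattern quoted above can be sketched in isolation as follows. Everything except the ts_malloc and ts_malloc_default names is hypothetical: client_malloc, client_alloc_calls, client_alloc_failed and make_buffer are stand-ins for illustration, and returning NULL on failure is only one possible policy (whether the library copes with a NULL return is not established here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical client-side allocator: records a failure and returns
   NULL instead of terminating the process.  */
size_t client_alloc_calls = 0;
bool client_alloc_failed = false;

void *
client_malloc (size_t size)
{
  assert (size <= (size_t) PTRDIFF_MAX);
  client_alloc_calls++;
  void *result = malloc (size);
  if (size > 0 && !result)
    client_alloc_failed = true;
  return result;
}

/* The client's override.  It has to be visible before the library's
   fallback block below is compiled.  */
#define ts_malloc client_malloc

/* Stand-in for the library's default, which exits on failure.  */
void *
ts_malloc_default (size_t size)
{
  void *result = malloc (size);
  if (size > 0 && !result)
    exit (1);
  return result;
}

/* The fallback pattern: used only when no override was defined.  */
#ifndef ts_malloc
#define ts_malloc ts_malloc_default
#endif

/* Stand-in for library code that allocates through the macro.  */
char *
make_buffer (size_t size)
{
  return ts_malloc (size);
}
```

Because the choice is made by the preprocessor, the override has to be in effect when the library's own sources are compiled, which is presumably why a prebuilt dynamic library would not pick it up.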
>
>>>> Also, we don’t call change hooks in replace_range_2, why?
>>>
>>> Because it is called in a loop, one character at a time. The caller
>>> of replace_range_2 calls these hooks for the entire region, once.
>>>
>>>> Should I update tree-sitter trees in that function, or should I not?
>>>
>>> The only caller is casify_region, so you could update there.
>>
>> casify_region doesn’t have access to byte positions.
>
> You can compute them using CHAR_TO_BYTE.
Ok. I’ll do that.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 19:13 ` Eli Zaretskii
@ 2021-07-23 20:28 ` Stefan Monnier
2021-07-24 6:02 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-07-23 20:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, cpitclaudel, emacs-devel
>> > But that's how the current font-lock and indentation work: they never
>> > look beyond the narrowing limits.
>> Not quite: that's true for indentation, but for font-lock we have
>> `font-lock-dont-widen` (i.e. by default, font-lock widens temporarily
>> while it does its job).
> jit-lock never requests fontifications outside of the accessible
> portion, because redisplay doesn't look there.
But font-lock may look (and fontify) beyond the narrowing, and
when it calls `syntax-ppss` it will usually parse from 1 rather than
from `point-min`.
I'd expect jit/font-lock running on top of TS to behave similarly: the
actual parsing is done over the widened buffer but the fontification is
only applied to the visible part (or nearby).
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 19:37 ` Eli Zaretskii
@ 2021-07-24 2:00 ` Stephen Leake
2021-07-24 6:51 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-24 2:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Wed, 21 Jul 2021 08:49:15 -0700
>>
>> > I fail to see the significance of the difference. Surely, you could
>> > hand it a block of text with changes to mean that this block replaces
>> > the previous version of that block. It might take the parser more
>> > work to update the parse tree in this case, but if it's fast enough,
>> > that won't be the problem. Right?
>>
>> tree-sitter doesn't store the previous text, so there's nothing to
>> compare it to.
>
> There was nothing about comparison in my text. You tell TS that
> editing replaced a block of text between A and B with block between A
> and C, without revealing the fine-grained changes inside that block.
> This must work, because editing could indeed do just that.
I see; treat the whole block as one change. Yes, that would work, but it
would probably be less optimal than sending a list of smaller changes;
depends on the details.
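As a sketch of the "one change" approach, several recorded edits can be folded into a single covering edit before being reported to the parser. The struct and function names here are hypothetical; the three fields are modeled on the start, old-end and new-end byte offsets an edit report needs, and each later edit is assumed to be expressed in the coordinates of the text as it stood after the earlier ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical edit record, all offsets in bytes.  */
struct edit
{
  int64_t start;    /* first changed byte */
  int64_t old_end;  /* end of the replaced text, before the edit */
  int64_t new_end;  /* end of the replacement text, after the edit */
};

static int64_t
min64 (int64_t a, int64_t b)
{
  return a < b ? a : b;
}

static int64_t
max64 (int64_t a, int64_t b)
{
  return a > b ? a : b;
}

/* Fold NEXT into ACC.  ACC.start and ACC.old_end are in original
   coordinates; ACC.new_end and all of NEXT are in the coordinates of
   the text after ACC was applied.  The result covers both edits.  */
struct edit
coalesce_edits (struct edit acc, struct edit next)
{
  assert (acc.start <= acc.old_end && acc.start <= acc.new_end);
  assert (next.start <= next.old_end && next.start <= next.new_end);

  /* How much ACC grew (or shrank) the text.  */
  int64_t delta = acc.new_end - acc.old_end;

  struct edit merged;
  merged.start = min64 (acc.start, next.start);
  /* If NEXT reaches past ACC, map its old end back to original
     coordinates; otherwise ACC's old end already covers it.  */
  merged.old_end = max64 (acc.old_end, next.old_end - delta);
  /* The new end moves by however much NEXT changed the length.  */
  merged.new_end = (max64 (acc.new_end, next.old_end)
                    + (next.new_end - next.old_end));
  return merged;
}
```

For example, inserting 5 bytes at offset 10 and then deleting bytes 20..23 of the resulting text folds into one edit saying that original bytes 10..18 were replaced by bytes 10..20. The parser then has to re-examine the whole covered span, which is the loss of precision mentioned above.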
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 19:43 ` Eli Zaretskii
@ 2021-07-24 2:57 ` Stephen Leake
2021-07-24 3:39 ` Óscar Fuentes
2021-07-24 7:06 ` Eli Zaretskii
2021-07-24 3:55 ` Clément Pit-Claudel
1 sibling, 2 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-24 2:57 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Wed, 21 Jul 2021 12:54:16 -0400
>>
>> On 7/21/21 12:29 PM, Stephen Leake wrote:
>> > Yes, for both tree-sitter and wisi. wisi can take even longer if lots of
>> > error correction is required (I have a time-out set at 5 seconds). But
>> > that happens when the file is first opened; I doubt any user would start
>> > typing that fast. I know I typically take a while to just look at the
>> > text, and then navigate to the point of interest.
>>
>> I'm not sure. We've had significant complaints in Flycheck for freezing Emacs for <1s
>
> How much "less"? Close to 1 sec is indeed annoying, but 20 msec or so
> should be bearable.
>
> You seem to assume up front that TS (re)-parsing will take 1 sec, but
> AFAIK there's no reason to assume such bad performance.
This is for the initial parse, on a large file. No matter how fast the
parser is, I can give you a file that takes one second to parse, and
some user will have such a file (the work always expands to consume all
the resources available).
I just got incremental parse working well enough to measure it; in the
largest Ada file I have (10,000 lines from Eurocontrol):
initial parse: 1.539319 seconds
re-indent two lines: 0.038999 seconds
39 milliseconds for re-indent is just slow enough to be noticeable; I still
have algorithms to convert to be as incremental as possible.
The initial parse includes sending the full file text to the external
process over a pipe. Parsing that same large file with the command-line
parser (no emacs involved; file is memory-mapped) takes only 0.190
seconds, so there is lots of room for optimization - moving to a module
with direct access to the emacs buffer should do a lot.
In a very small file:
initial 0.000632 seconds
re-indent 0.000942 seconds
Easily fast enough to keep up with the user.
I don't have a direct comparison of tree-sitter and wisi parsing the
same file; I'll have to see if I can set that up.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 2:57 ` Stephen Leake
@ 2021-07-24 3:39 ` Óscar Fuentes
2021-07-24 7:34 ` Eli Zaretskii
2021-07-25 16:49 ` Stephen Leake
2021-07-24 7:06 ` Eli Zaretskii
1 sibling, 2 replies; 370+ messages in thread
From: Óscar Fuentes @ 2021-07-24 3:39 UTC (permalink / raw)
To: emacs-devel
Stephen Leake <stephen_leake@stephe-leake.org> writes:
> 39 milliseconds for re-indent is just slow enough to be noticeable; I still
> have algorithms to convert to be as incremental as possible.
[snip]
> In a very small file:
>
> initial 0.000632 seconds
> re-indent 0.000942 seconds
>
> Easily fast enough to keep up with the user.
Doing work every time the user changes the file is not always a good
thing. Nowadays the user doesn't just expect automatic indentation, he
wants code formatting too, which means splitting, fusing and inserting
lines, plus moving chunks of code left and right. Doing that every time
a character is added or deleted can be visually confusing due to chunks
of text changing positions as you type, so the systems I know are
triggered by certain events (like the insertion of characters that mark
the end of statements). Then they analyze the code and, if it is well
formed, apply the reformatting. Something similar could be said about
fontification and other tasks.
In my experience, delays of 0.1 seconds are perfectly acceptable with
this method.
So I'll insist on not obsessing too much about performance. Implement
something simple, see if it is usable. If not, invest effort on
optimizations until it is good enough.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-21 19:43 ` Eli Zaretskii
2021-07-24 2:57 ` Stephen Leake
@ 2021-07-24 3:55 ` Clément Pit-Claudel
1 sibling, 0 replies; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-24 3:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 7/21/21 3:43 PM, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> I'm not sure. We've had significant complaint in Flycheck for freezing Emacs for <1s
>
> How much "less"? Close to 1 sec is indeed annoying, but 20 msec or so
> should be bearable.
Indeed, for us the freeze happens only when the buffer is first opened, so 20ms is fine; the cases we had complaints about were close to 1s, maybe .8s (and in some cases significantly more, too).
> You seem to assume up front that TS (re)-parsing will take 1 sec, but
> AFAIK there's no reason to assume such bad performance.
I expect/hope re-parsing will be much faster. For the initial parse, I was going from numbers that were given earlier in this thread.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 20:01 ` Perry E. Metzger
@ 2021-07-24 5:52 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 5:52 UTC (permalink / raw)
To: Perry E. Metzger; +Cc: emacs-devel
> Date: Fri, 23 Jul 2021 16:01:14 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
>
> On 7/23/21 15:10, Eli Zaretskii wrote:
>
> >> Abort, it seems:
> >> static inline void *ts_malloc_default(size_t size) {
> >> void *result = malloc(size);
> >> if (size > 0 && !result) {
> >> fprintf(stderr, "tree-sitter failed to allocate %zu bytes", size);
> >> exit(1);
> >> }
> >> return result;
> >> }
> > We must replace this function, if only because the MS-Windows build of
> > Emacs uses a custom malloc implementation. Does TS allow the client
> > to use its own malloc?
>
> Certainly more graceful allocation error behavior would be necessary in
> an Emacs context even on Unix-like operating systems. An unexpected hard
> exit could result in loss of data for the user.
Sure, which is why this must be replaced.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 20:22 ` Yuan Fu
@ 2021-07-24 6:00 ` Eli Zaretskii
2021-07-25 18:01 ` Stephen Leake
2021-07-24 15:04 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 6:00 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 23 Jul 2021 16:22:59 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> > We must replace this function, if only because the MS-Windows build of
> > Emacs uses a custom malloc implementation. Does TS allow the client
> > to use its own malloc?
>
> Yes, in that case, we need to embed tree-sitter into Emacs, instead of using it as a dynamic library, I think.
>
> // Allow clients to override allocation functions
> #ifndef ts_malloc
> #define ts_malloc ts_malloc_default
> #endif
> #ifndef ts_calloc
> #define ts_calloc ts_calloc_default
> #endif
> #ifndef ts_realloc
> #define ts_realloc ts_realloc_default
> #endif
> #ifndef ts_free
> #define ts_free ts_free_default
> #endif
>
> How do we handle such thing in Emacs?
We use xmalloc, which calls memory_full when allocation fails, which
releases some spare memory we have for this purpose, and tells the
user to save the session and exit.
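For illustration only, here is a minimal C sketch of that override pattern. It assumes tree-sitter's allocation header keeps the `#ifndef ts_malloc` guards quoted above. The names `sketch_xmalloc`, `sketch_memory_full` and `memory_full_calls` are hypothetical stand-ins, not Emacs functions; as far as I know the real memory_full does not return to its caller, whereas this stand-in only counts the failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for memory_full: it only records and reports
   the failure, it never calls exit.  */
int memory_full_calls = 0;

void
sketch_memory_full (size_t nbytes)
{
  memory_full_calls++;
  fprintf (stderr, "memory full while allocating %zu bytes\n", nbytes);
}

/* Hypothetical stand-in for xmalloc: same shape as ts_malloc_default,
   but failure is routed to the handler above instead of exit (1).  */
void *
sketch_xmalloc (size_t size)
{
  void *result = malloc (size);
  if (size > 0 && !result)
    sketch_memory_full (size);
  return result;
}

/* Defining the macro first means the library's guard below is skipped,
   so ts_malloc_default is never selected.  */
#define ts_malloc sketch_xmalloc

#ifndef ts_malloc
#define ts_malloc ts_malloc_default
#endif

/* A successful allocation goes through the override and must not
   trigger the failure handler.  Returns 1 on success.  */
int
sketch_self_check (void)
{
  int calls_before = memory_full_calls;
  void *p = ts_malloc (64);
  assert (p != NULL);
  free (p);
  return memory_full_calls == calls_before;
}
```

The same trick would apply to ts_calloc, ts_realloc and ts_free; whether the macros can be set from outside without building tree-sitter's sources together with Emacs is exactly the open question in this subthread.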
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 20:28 ` Stefan Monnier
@ 2021-07-24 6:02 ` Eli Zaretskii
2021-07-24 14:19 ` Stefan Monnier
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 6:02 UTC (permalink / raw)
To: Stefan Monnier; +Cc: casouri, cpitclaudel, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: casouri@gmail.com, cpitclaudel@gmail.com, emacs-devel@gnu.org
> Date: Fri, 23 Jul 2021 16:28:16 -0400
>
> >> > But that's how the current font-lock and indentation work: they never
> >> > look beyond the narrowing limits.
> >> Not quite: that's true for indentation, but for font-lock we have
> >> `font-lock-dont-widen` (i.e. by default, font-lock widens temporarily
> >> while it does its job).
> > jit-lock never requests fontifications outside of the accessible
> > portion, because redisplay doesn't look there.
>
> But font-lock may look (and fontify) beyond the narrowing, and
> when it calls `syntax-ppss` it will usually parse from 1 rather than
> from `point-min`.
Yes, and that's why I said that callers should call 'widen' if they
need to do so.
The question is what should TS reading do _by_default_.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 2:00 ` Stephen Leake
@ 2021-07-24 6:51 ` Eli Zaretskii
2021-07-25 16:16 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 6:51 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, monnier, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Fri, 23 Jul 2021 19:00:12 -0700
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> > I fail to see the significance of the difference. Surely, you could
> >> > hand it a block of text with changes to mean that this block replaces
> >> > the previous version of that block. It might take the parser more
> >> > work to update the parse tree in this case, but if it's fast enough,
> >> > that won't be the problem. Right?
> >>
> >> tree-sitter doesn't store the previous text, so there's nothing to
> >> compare it to.
> >
> > There was nothing about comparison in my text. You tell TS that
> > editing replaced a block of text between A and B with block between A
> > and C, without revealing the fine-grained changes inside that block.
> > This must work, because editing could indeed do just that.
>
> I see; treat the whole block as one change. Yes, that would work, but it
> would probably be less optimal than sending a list of smaller changes;
> depends on the details.
Since TS is very fast, I think this sub-optimality will not cause any
tangible performance issues in Emacs. And from our POV it is a good
optimization because it will minimize (and to some extent optimize)
the traffic between Emacs and TS.
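A rough sketch of what "one block instead of many small changes" could mean in code, assuming each change is described by the same byte triple that ts_record_change takes in the patch (start, old end, new end). `struct edit` and `coalesce_edit` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One change: bytes [start, old_end) were replaced, and the
   replacement now occupies [start, new_end).  */
struct edit
{
  ptrdiff_t start;
  ptrdiff_t old_end;
  ptrdiff_t new_end;
};

/* Fold NEXT into ACC and return a single covering change.
   ACC.start and ACC.old_end are positions in the text before any
   change; ACC.new_end and all of NEXT are positions in the text as it
   is right before NEXT is applied.  */
struct edit
coalesce_edit (struct edit acc, struct edit next)
{
  assert (acc.start <= acc.old_end && acc.start <= acc.new_end);
  assert (next.start <= next.old_end && next.start <= next.new_end);

  ptrdiff_t delta = next.new_end - next.old_end;
  struct edit out = acc;

  /* Text before ACC.start is untouched so far, so a smaller start
     needs no translation.  */
  if (next.start < acc.start)
    out.start = next.start;

  /* If NEXT reaches past the block covered so far, the overhang is
     still original text; grow the old end by that overhang.  */
  if (next.old_end > acc.new_end)
    out.old_end = acc.old_end + (next.old_end - acc.new_end);

  /* Everything at or after the covered end shifts by DELTA.  */
  ptrdiff_t covered_end
    = next.old_end > acc.new_end ? next.old_end : acc.new_end;
  out.new_end = covered_end + delta;

  return out;
}
```

For example, replacing bytes [2, 4) with three bytes and then inserting two bytes at position 7 folds into the single change (2, 6, 9): the parser is told that the old block [2, 6) became [2, 9), without seeing the finer-grained steps.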
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 2:57 ` Stephen Leake
2021-07-24 3:39 ` Óscar Fuentes
@ 2021-07-24 7:06 ` Eli Zaretskii
2021-07-25 17:48 ` Stephen Leake
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 7:06 UTC (permalink / raw)
To: Stephen Leake; +Cc: cpitclaudel, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
> Date: Fri, 23 Jul 2021 19:57:32 -0700
>
> > How much "less"? Close to 1 sec is indeed annoying, but 20 msec or so
> > should be bearable.
> >
> > You seem to assume up front that TS (re)-parsing will take 1 sec, but
> > AFAIK there's no reason to assume such bad performance.
>
> This is for the initial parse, on a large file. No matter how fast the
> parser is, I can give you a file that takes one second to parse, and
> some user will have such a file (the work always expands to consume all
> the resources available).
That problem is already with us: if I visit xdisp.c in an unoptimized
build of Emacs 28, I wait almost 4 sec for the first window-full to be
displayed. (It's more like 0.5 sec in an optimized build of Emacs
27.2.) So the real question is how much using TS will _improve_ the
situation.
> I just got incremental parse working well enough to measure it; in the
> largest Ada file I have (10,000 lines from Eurocontrol):
>
> initial parse: 1.539319 seconds
> re-indent two lines: 0.038999 seconds
>
> 39 milliseconds for re-indent is just slow enough to be noticeable; I still
> have algorithms to convert to be as incremental as possible.
For comparison, how much does re-indentation of 2 lines take in Emacs
without a parser?
39 msec might be noticeable, but it isn't annoying; anything below 50
msec isn't. Try "C-x TAB" in Emacs on a 10-line block of text, and you
get more than that. So if you consider that time a problem, it is
here already as well.
> The initial parse includes sending the full file text to the external
> process over a pipe.
So the above results are with wisi. We need timings with TS to see
the results that really matter for this discussion.
> I don't have a direct comparison of tree-sitter and wisi parsing the
> same file; I'll have to see if I can set that up.
Please do. Otherwise we are comparing apples with oranges. They are
all fruit, but still...
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 3:39 ` Óscar Fuentes
@ 2021-07-24 7:34 ` Eli Zaretskii
2021-07-25 16:49 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 7:34 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Sat, 24 Jul 2021 05:39:09 +0200
>
> So I'll insist on not obsessing too much about performance. Implement
> something simple, see if it is usable. If not, invest effort on
> optimizations until it is good enough.
Agreed.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 13:47 ` Yuan Fu
2021-07-22 14:11 ` Óscar Fuentes
2021-07-22 17:00 ` Eli Zaretskii
@ 2021-07-24 9:33 ` Stephen Leake
2021-07-24 22:54 ` Dmitry Gutov
2 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-24 9:33 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, emacs-devel, Clément Pit-Claudel,
Stefan Monnier
Yuan Fu <casouri@gmail.com> writes:
>>
>>> + // TODO BUF_ZV_BYTE?
>>
>> Do you want to discuss this? I'd prefer to have it the other way
>> around: use BUF_ZV_BYTE by default. The callers could widen the
>> buffer if they needed to access outside of the narrowing.
>
> Yes, I meant to discuss this. The problem with respecting narrowing is
> that, a user can freely narrow and widen arbitrarily, and Emacs needs
> to translate them into insertion & deletion of the buffer text for
> tree-sitter, every time a user narrows or widens the buffer.
I don't think that's the right thing to do. tree-sitter should always
have a tree that represents the entire buffer; if the user narrows,
edits will only affect the narrowed region, but tree-sitter won't notice
that, and won't care.
In particular, that means buffer positions reported by tree-sitter will
match emacs buffer positions.
> Plus, if tree-sitter respects narrowing, it could happen where a user
> narrows the buffer, the font-locking changes and is not correct
> anymore. Maybe that’s not the user want.
Exactly. The indent will be wrong, too, if narrowing excludes a
containing block.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 19:29 ` Óscar Fuentes
2021-07-23 5:21 ` Eli Zaretskii
@ 2021-07-24 9:38 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-24 9:38 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
Óscar Fuentes <ofv@wanadoo.es> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>> (b) having a feature that fails before you
>> reach the max size of a buffer Emacs supports is a problem, because it
>> will cause hard-to-deal-with problems.
>
> We can put reasonable limits on when to use TS once we have some
> experience with it. What matters right now is if TS would be usable for
> the typical use case, and I guess the answer is positive. Also, it is
> not as if we had other options to consider.
wisi supports > 4G, not that I've actually tried it. And incremental
parse is now working well enough to benchmark, in my devel version.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-22 17:00 ` Eli Zaretskii
2021-07-22 17:47 ` Yuan Fu
2021-07-23 14:07 ` Stefan Monnier
@ 2021-07-24 9:42 ` Stephen Leake
2021-07-24 11:22 ` Eli Zaretskii
2 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-24 9:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>>
>> Yes, I meant to discuss this. The problem with respecting narrowing
>> is that, a user can freely narrow and widen arbitrarily, and Emacs
>> needs to translate them into insertion & deletion of the buffer text
>> for tree-sitter, every time a user narrows or widens the buffer.
>> Plus, if tree-sitter respects narrowing, it could happen where a
>> user narrows the buffer, the font-locking changes and is not correct
>> anymore. Maybe that’s not the user want. Also, if someone narrows
>> and widens often, maybe narrow to a function for better focus,
>> tree-sitter needs to constantly re-parse most of the buffer. These
>> are not significant disadvantages, but what do we get from
>> respecting narrowing that justifies code complexity and these small
>> annoyances?
>
> But that's how the current font-lock and indentation work: they never
> look beyond the narrowing limits.
And that's broken, unless the narrowing is for multi-major-mode.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 9:42 ` Stephen Leake
@ 2021-07-24 11:22 ` Eli Zaretskii
2021-07-25 18:21 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 11:22 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, emacs-devel, cpitclaudel, monnier
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Yuan Fu <casouri@gmail.com>, cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 24 Jul 2021 02:42:24 -0700
>
> > But that's how the current font-lock and indentation work: they never
> > look beyond the narrowing limits.
>
> And that's broken
??? Of course, it isn't: it's how Emacs has worked since v21.1.
> unless the narrowing is for multi-major-mode.
And what would you do in that case, if you allow TS to look beyond the
restriction?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-17 17:54 ` Eli Zaretskii
@ 2021-07-24 14:08 ` Stefan Monnier
2021-07-24 14:32 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-07-24 14:08 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel
>> If we copy the buffer's content to a freshly malloc area before passing
>> that to TS, then there should be no problem running TS in a separate
>> concurrent thread, indeed.
> Making a copy of the buffer is a non-starter from where I stand. It
> doesn't scale, for starters. I don't see any reason to go to such a
> complex design at this early stage.
I see absolutely no problem with scaling in making a copy: the extra
memory and CPU time taken by the copy will be a constant factor which
I don't expect to go much beyond 10%, which doesn't threaten scaling and
seems perfectly acceptable in return for being able to perform the
parse concurrently.
I'm not sure we'll want to do that, but I see no reason to consider it
a non-starter.
[ BTW, it's not clear to me if an update needs to be able to read the
whole buffer or if it only needs access to the "update
description". ]
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 6:02 ` Eli Zaretskii
@ 2021-07-24 14:19 ` Stefan Monnier
0 siblings, 0 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-24 14:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, cpitclaudel, emacs-devel
>> But font-lock may look (and fontify) beyond the narrowing, and
>> when it calls `syntax-ppss` it will usually parse from 1 rather than
>> from `point-min`.
> Yes, and that's why I said that callers should call 'widen' if they
> need to do so.
> The question is what should TS reading do _by_default_.
Ah, then we're in violent agreement. The low-level interface with TS
should access the text within the narrowed region. And the code that
calls TS will usually want to widen beforehand.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 14:08 ` Stefan Monnier
@ 2021-07-24 14:32 ` Eli Zaretskii
2021-07-24 15:10 ` Stefan Monnier
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 14:32 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cpitclaudel, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: cpitclaudel@gmail.com, emacs-devel@gnu.org
> Date: Sat, 24 Jul 2021 10:08:58 -0400
>
> >> If we copy the buffer's content to a freshly malloc area before passing
> >> that to TS, then there should be no problem running TS in a separate
> >> concurrent thread, indeed.
> > Making a copy of the buffer is a non-starter from where I stand. It
> > doesn't scale, for starters. I don't see any reason to go to such a
> > complex design at this early stage.
>
> I see absolutely no problem with scaling in making a copy: the extra
> memory and CPU time taken by the copy will be a constant factor which
> I don't expect to go much beyond 10%
10% of what? It will be 100% of all the buffers that need parsing.
> I'm not sure we'll want to do that, but I see no reason to consider it
> a non-starter.
It's a bad start, okay?
Anyway, it looks like nothing like that will be necessary,
fortunately.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-23 20:22 ` Yuan Fu
2021-07-24 6:00 ` Eli Zaretskii
@ 2021-07-24 15:04 ` Yuan Fu
2021-07-24 15:48 ` Eli Zaretskii
2021-07-24 16:14 ` Eli Zaretskii
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-24 15:04 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Stefan Monnier, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2070 bytes --]
I wrote a simple interface between font-lock and tree-sitter, and it works pretty well: using tree-sitter for fontification, xdisp.c opens a lot faster, and scrolling through the buffer is also perceivably faster. My simple interface works like this: tree-sitter allows you to “pattern match” nodes in the parse tree with a DSL and assign names to the matched nodes, i.e., given a pattern, you get back a list of (NAME . MATCHED-NODE). If we use font-lock faces as names for those nodes, we get back a list of (FACE . MATCHED-NODE) from tree-sitter, and Emacs can simply look at the beginning and end of each node and apply FACE to that range. For flexibility, FACE can also be a function, in which case the function is called with the beginning and end of the range and the node. This interface is basically what emacs-tree-sitter does (I don’t know if they allow a capture name to be a function, though).
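To make the capture handling concrete, here is a small standalone C model of it (the real code is Lisp, in tree-sitter-fontify-region-function in the patch below). Everything in it is a simplification for illustration: `enum face`, `struct capture`, `paint_range`, `paint_system_lib` and `apply_captures` are hypothetical names, faces are stored one per byte in a plain array, and a node is reduced to its [beg, end) range.

```c
#include <assert.h>
#include <stddef.h>

enum face
{
  FACE_NONE,
  FACE_STRING,
  FACE_COMMENT,
  FACE_PREPROCESSOR
};

/* Handler called instead of applying a face directly.  */
typedef void (*capture_fn) (enum face *faces, size_t beg, size_t end);

/* One (NAME . NODE) pair: NAME is either a face or a handler.  */
struct capture
{
  enum face face;      /* used when HANDLER is NULL */
  capture_fn handler;  /* optional */
  size_t beg;
  size_t end;
};

/* Apply FACE to every position in [beg, end).  */
void
paint_range (enum face *faces, size_t beg, size_t end, enum face face)
{
  for (size_t i = beg; i < end; i++)
    faces[i] = face;
}

/* Modeled on ts-c-fontify-system-lib: the two delimiters get one
   face, the text between them another.  */
void
paint_system_lib (enum face *faces, size_t beg, size_t end)
{
  if (end - beg < 2)
    return;
  faces[beg] = FACE_PREPROCESSOR;
  faces[end - 1] = FACE_PREPROCESSOR;
  paint_range (faces, beg + 1, end - 1, FACE_STRING);
}

/* Walk the capture list in order; later captures overwrite earlier
   ones where ranges overlap.  */
void
apply_captures (enum face *faces, size_t len,
                const struct capture *caps, size_t ncaps)
{
  for (size_t i = 0; i < ncaps; i++)
    {
      assert (caps[i].beg <= caps[i].end && caps[i].end <= len);
      if (caps[i].handler)
        caps[i].handler (faces, caps[i].beg, caps[i].end);
      else
        paint_range (faces, caps[i].beg, caps[i].end, caps[i].face);
    }
}
```

The point of the function-valued case is that a single capture can produce more than one face, which a plain (FACE . NODE) pair cannot express.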
I have an example major-mode for C that uses tree-sitter for font-locking at the end of tree-sitter.el.
Main functions to look at: tree-sitter-query-capture in tree_sitter.c, and tree-sitter-fontify-region-function in tree-sitter.el.
On the font-lock front, tree-sitter-fontify-region-function replaces font-lock-default-fontify-region, and tree-sitter-font-lock-settings replaces font-lock-defaults and font-lock-keywords. I should support font-lock-maximum-decoration but haven’t come up with a good way to do that. Maybe I should somehow reuse font-lock-defaults, and make it configurable for tree-sitter font-locking? Apart from font-lock-maximum-decoration, what else should tree-sitter share with font-lock?
BTW, what is the best way to signal a lisp error from C? I tried xsignal2, signal_error, error and friends but they seem to crash Emacs. Maybe I wasn’t using them correctly.
IIUC if we want tree-sitter to use our malloc, we need to build it with Emacs. Where should I put the source of tree-sitter?
What’s the difference between make_string and make_pure_c_string? I’ve seen this “pure” thing elsewhere; what does “pure” mean?
Yuan
[-- Attachment #2: ts.4.patch --]
[-- Type: application/octet-stream, Size: 36844 bytes --]
From d28e10e5905d244d92b71b74566c0bed80d5ed2b Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Sat, 24 Jul 2021 10:39:15 -0400
Subject: [PATCH] checkpoint 4
- Add font-locking
- Remove change-recording from replace_range_2, add to casify_region
---
lisp/emacs-lisp/cl-preloaded.el | 2 +
lisp/tree-sitter.el | 276 ++++++++++++++++++++++++++
src/casefiddle.c | 12 ++
src/insdel.c | 11 +-
src/tree_sitter.c | 332 +++++++++++++++++++++++++++++---
src/tree_sitter.h | 4 +-
test/src/tree-sitter-tests.el | 58 +++++-
7 files changed, 655 insertions(+), 40 deletions(-)
create mode 100644 lisp/tree-sitter.el
diff --git a/lisp/emacs-lisp/cl-preloaded.el b/lisp/emacs-lisp/cl-preloaded.el
index 7365e23186..2dccdff91a 100644
--- a/lisp/emacs-lisp/cl-preloaded.el
+++ b/lisp/emacs-lisp/cl-preloaded.el
@@ -68,6 +68,8 @@ cl--typeof-types
(font-spec atom) (font-entity atom) (font-object atom)
(vector array sequence atom)
(user-ptr atom)
+ (tree-sitter-parser atom)
+ (tree-sitter-node atom)
;; Plus, really hand made:
(null symbol list sequence atom))
"Alist of supertypes.
diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
new file mode 100644
index 0000000000..a6ecb09386
--- /dev/null
+++ b/lisp/tree-sitter.el
@@ -0,0 +1,276 @@
+;;; tree-sitter.el --- tree-sitter utilities -*- lexical-binding: t -*-
+
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;; Code:
+
+;;; Node & parser accessors
+
+(defun tree-sitter-node-buffer (node)
+ "Return the buffer that NODE belongs to."
+ (tree-sitter-parser-buffer
+ (tree-sitter-node-parser node)))
+
+;;; Parser API supplement
+
+(defun tree-sitter-get-parser (name)
+ "Find the first parser with name NAME in `tree-sitter-parser-list'.
+Return nil if we can't find any."
+ (catch 'found
+ (dolist (parser tree-sitter-parser-list)
+ (when (equal name (tree-sitter-parser-name parser))
+ (throw 'found parser)))))
+
+(defun tree-sitter-get-parser-create (name language)
+ "Find the first parser with name NAME in `tree-sitter-parser-list'.
+If none exists, create one and return it. LANGUAGE is passed to
+`tree-sitter-create-parser' when creating the parser."
+ (or (tree-sitter-get-parser name)
+ (tree-sitter-create-parser (current-buffer) language name)))
+
+;;; Node API supplement
+
+(defun tree-sitter-node-beginning (node)
+ "Return the start position of NODE."
+ (byte-to-position (tree-sitter-node-start-byte node)))
+
+(defun tree-sitter-node-end (node)
+ "Return the end position of NODE."
+ (byte-to-position (tree-sitter-node-end-byte node)))
+
+(defun tree-sitter-node-in-range (beg end &optional parser-name named)
+ "Return the smallest node covering BEG to END.
+Find node in current buffer. Return nil if none is found. If NAMED
+is non-nil, only look for named nodes. NAMED defaults to nil. By
+default, use the first parser in `tree-sitter-parser-list'; but
+if PARSER-NAME is non-nil, it specifies the name of the parser that
+should be used."
+ (when-let ((root (tree-sitter-parser-root-node
+ (if parser-name
+ (tree-sitter-get-parser parser-name)
+ (car tree-sitter-parser-list)))))
+ (tree-sitter-node-descendant-for-byte-range
+ root (position-bytes beg) (position-bytes end) named)))
+
+(defun tree-sitter-filter-child (node pred &optional named)
+ "Return children of NODE that satisfy PRED.
+PRED is a function that takes one argument, the child node. If
+NAMED is non-nil, only search named nodes. NAMED defaults to nil."
+ (let ((child (tree-sitter-node-child node 0 named))
+ result)
+ (while child
+ (when (funcall pred child)
+ (push child result))
+ (setq child (tree-sitter-node-next-sibling child named)))
+ result))
+
+(defun tree-sitter-node-content (node)
+ "Return the buffer content corresponding to NODE."
+ (with-current-buffer (tree-sitter-node-buffer node)
+ (buffer-substring-no-properties
+ (tree-sitter-node-beginning node)
+ (tree-sitter-node-end node))))
+
+;;; Font-lock
+
+(defvar-local tree-sitter-font-lock-settings nil
+ "A list of settings for tree-sitter-based font-locking.
+
+Each setting controls one parser (often for a different language).
+A setting is a list of the form (NAME LANGUAGE PATTERN). NAME is
+the name given to the parser, by convention it is
+\"font-lock-<language>\", where <language> is the language that
+the parser uses. LANGUAGE is the language object returned by
+tree-sitter language dynamic modules.
+
+PATTERN is a tree-sitter query pattern. (See manual for how to
+write query patterns.) This pattern should capture nodes with
+either face names or function names. If captured with a face
+name, the node's corresponding text in the buffer is fontified
+with that face; if captured with a function name, the function is
+called with three arguments, BEG END NODE, where BEG and END
+mark the span of the corresponding text, and NODE is the node
+itself.")
+
+(defun tree-sitter-fontify-region-function (beg end &optional verbose)
+ "Fontify the region between BEG and END.
+If VERBOSE is non-nil, print status messages.
+\(See `font-lock-fontify-region-function'.)"
+ (dolist (elm tree-sitter-font-lock-settings)
+ (let ((parser-name (car elm))
+ (language (nth 1 elm))
+ (match-pattern (nth 2 elm)))
+ (tree-sitter-get-parser-create parser-name language)
+ (when-let ((node (tree-sitter-node-in-range beg end parser-name)))
+ (let ((captures (tree-sitter-query-capture
+ node match-pattern
+ ;; specifying the range is important. More
+ ;; often than not, NODE will be the root
+ ;; node, and if we don't specify the range,
+ ;; we are basically querying the whole file.
+ (position-bytes beg) (position-bytes end))))
+ (with-silent-modifications
+ (while captures
+ (let* ((face (caar captures))
+ (node (cdar captures))
+ (beg (tree-sitter-node-beginning node))
+ (end (tree-sitter-node-end node)))
+ (cond ((facep face)
+ (put-text-property beg end 'face face))
+ ((functionp face)
+ (funcall face beg end node)))
+
+ (if verbose
+ (message "Fontifying text from %d to %d with %s"
+ beg end face)))
+ (setq captures (cdr captures))))
+ `(jit-lock-bounds ,(tree-sitter-node-beginning node)
+ . ,(tree-sitter-node-end node)))))))
+
+
+(define-derived-mode json-mode js-mode "JSON"
+ "Major mode for JSON documents."
+ (setq-local font-lock-fontify-region-function
+ #'tree-sitter-fontify-region-function)
+ (setq-local tree-sitter-font-lock-settings
+ `(("font-lock-json"
+ ,(tree-sitter-json)
+ "(string) @font-lock-string-face
+(true) @font-lock-constant-face
+(false) @font-lock-constant-face
+(null) @font-lock-constant-face"))))
+
+(defun ts-c-fontify-system-lib (beg end _)
+ (put-text-property beg (1+ beg) 'face 'font-lock-preprocessor-face)
+ (put-text-property (1- end) end 'face 'font-lock-preprocessor-face)
+ (put-text-property (1+ beg) (1- end)
+ 'face 'font-lock-string-face))
+
+(define-derived-mode ts-c-mode prog-mode "TS C"
+ "C mode with tree-sitter support."
+ (setq-local font-lock-fontify-region-function
+ #'tree-sitter-fontify-region-function)
+ (setq-local tree-sitter-font-lock-settings
+ `(("font-lock-c"
+ ,(tree-sitter-c)
+ "(null) @font-lock-constant-face
+(true) @font-lock-constant-face
+(false) @font-lock-constant-face
+
+(comment) @font-lock-comment-face
+
+(system_lib_string) @ts-c-fontify-system-lib
+
+(unary_expression
+ operator: _ @font-lock-negation-char-face)
+
+(string_literal) @font-lock-string-face
+(char_literal) @font-lock-string-face
+
+
+
+(function_definition
+ declarator: (identifier) @font-lock-function-name-face)
+
+(declaration
+ declarator: (identifier) @font-lock-function-name-face)
+
+(function_declarator
+ declarator: (identifier) @font-lock-function-name-face)
+
+
+
+(init_declarator
+ declarator: (identifier) @font-lock-variable-name-face)
+
+(parameter_declaration
+ declarator: (identifier) @font-lock-variable-name-face)
+
+(preproc_def
+ name: (identifier) @font-lock-variable-name-face)
+
+(enumerator
+ name: (identifier) @font-lock-variable-name-face)
+
+(field_identifier) @font-lock-variable-name-face
+
+(parameter_list
+ (parameter_declaration
+ (identifier) @font-lock-variable-name-face))
+
+(pointer_declarator
+ declarator: (identifier) @font-lock-variable-name-face)
+
+(array_declarator
+ declarator: (identifier) @font-lock-variable-name-face)
+
+(preproc_function_def
+ name: (identifier) @font-lock-variable-name-face
+ parameters: (preproc_params
+ (identifier) @font-lock-variable-name-face))
+
+
+
+(type_identifier) @font-lock-type-face
+(primitive_type) @font-lock-type-face
+
+\"auto\" @font-lock-keyword-face
+\"break\" @font-lock-keyword-face
+\"case\" @font-lock-keyword-face
+\"const\" @font-lock-keyword-face
+\"continue\" @font-lock-keyword-face
+\"default\" @font-lock-keyword-face
+\"do\" @font-lock-keyword-face
+\"else\" @font-lock-keyword-face
+\"enum\" @font-lock-keyword-face
+\"extern\" @font-lock-keyword-face
+\"for\" @font-lock-keyword-face
+\"goto\" @font-lock-keyword-face
+\"if\" @font-lock-keyword-face
+\"register\" @font-lock-keyword-face
+\"return\" @font-lock-keyword-face
+\"sizeof\" @font-lock-keyword-face
+\"static\" @font-lock-keyword-face
+\"struct\" @font-lock-keyword-face
+\"switch\" @font-lock-keyword-face
+\"typedef\" @font-lock-keyword-face
+\"union\" @font-lock-keyword-face
+\"volatile\" @font-lock-keyword-face
+\"while\" @font-lock-keyword-face
+
+\"long\" @font-lock-type-face
+\"short\" @font-lock-type-face
+\"signed\" @font-lock-type-face
+\"unsigned\" @font-lock-type-face
+
+\"#include\" @font-lock-preprocessor-face
+\"#define\" @font-lock-preprocessor-face
+\"#ifdef\" @font-lock-preprocessor-face
+\"#ifndef\" @font-lock-preprocessor-face
+\"#endif\" @font-lock-preprocessor-face
+\"#else\" @font-lock-preprocessor-face
+\"#elif\" @font-lock-preprocessor-face"))))
+
+(add-to-list 'auto-mode-alist '("\\.json\\'" . json-mode))
+(add-to-list 'auto-mode-alist '("\\.tsc\\'" . ts-c-mode))
+
+(provide 'tree-sitter)
+
+;;; tree-sitter.el ends here
diff --git a/src/casefiddle.c b/src/casefiddle.c
index a7a2541490..42cd2fdd28 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -30,6 +30,10 @@ Copyright (C) 1985, 1994, 1997-1999, 2001-2021 Free Software Foundation,
#include "composite.h"
#include "keymap.h"
+#ifdef HAVE_TREE_SITTER
+#include "tree_sitter.h"
+#endif
+
enum case_action {CASE_UP, CASE_DOWN, CASE_CAPITALIZE, CASE_CAPITALIZE_UP};
/* State for casing individual characters. */
@@ -495,6 +499,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
modify_text (start, end);
prepare_casing_context (&ctx, flag, true);
+#ifdef HAVE_TREE_SITTER
+ ptrdiff_t start_byte = CHAR_TO_BYTE (start);
+ ptrdiff_t old_end_byte = CHAR_TO_BYTE (end);
+#endif
+
ptrdiff_t orig_end = end;
record_delete (start, make_buffer_string (start, end, true), false);
if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
@@ -513,6 +522,9 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
{
signal_after_change (start, end - start - added, end - start);
update_compositions (start, end, CHECK_ALL);
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (start_byte, old_end_byte, CHAR_TO_BYTE (end));
+#endif
}
return orig_end + added;
diff --git a/src/insdel.c b/src/insdel.c
index b313c50cda..3dfc281b49 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -1592,7 +1592,11 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
If MARKERS, relocate markers.
Unlike most functions at this level, never call
- prepare_to_modify_buffer and never call signal_after_change. */
+ prepare_to_modify_buffer and never call signal_after_change,
+ because this function is called in a loop, one character at a time.
+ The caller of 'replace_range_2' calls these hooks once for the
+ entire region. Besides signal_after_change, any caller of this
+ function should also call ts_record_change. */
void
replace_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
@@ -1705,11 +1709,6 @@ replace_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
modiff_incr (&MODIFF);
CHARS_MODIFF = MODIFF;
-
-#ifdef HAVE_TREE_SITTER
- ts_record_change (from_byte, to_byte, from_byte + insbytes);
-#endif
-
}
\f
/* Delete characters in current buffer
diff --git a/src/tree_sitter.c b/src/tree_sitter.c
index a6a8912c84..e9f8ddc7e3 100644
--- a/src/tree_sitter.c
+++ b/src/tree_sitter.c
@@ -35,6 +35,8 @@ Copyright (C) 2021 Free Software Foundation, Inc.
/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
#include <tree_sitter/parser.h>
+/*** Functions related to parser and node object. */
+
DEFUN ("tree-sitter-parser-p",
Ftree_sitter_parser_p, Stree_sitter_parser_p, 1, 1, 0,
doc: /* Return t if OBJECT is a tree-sitter parser. */)
@@ -57,6 +59,8 @@ DEFUN ("tree-sitter-node-p",
return Qnil;
}
+/*** Parsing functions */
+
/* Update each parser's tree after the user made an edit. This
function does not parse the buffer and only updates the tree. (So it
should be very fast.) */
@@ -77,7 +81,6 @@ ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
XTS_PARSER (lisp_parser)->need_reparse = true;
parser_list = Fcdr (parser_list);
}
-
}
/* Parse the buffer. We don't parse until we have to. When we have
@@ -91,9 +94,19 @@ ts_ensure_parsed (Lisp_Object parser)
TSTree *tree = XTS_PARSER(parser)->tree;
TSInput input = XTS_PARSER (parser)->input;
TSTree *new_tree = ts_parser_parse(ts_parser, tree, input);
+ /* This should be very rare: it only happens when 1) language is not
+ set (impossible in Emacs because the user has to supply a
+ language to create a parser), 2) parse canceled due to timeout
+ (impossible because we don't set a timeout), 3) parse canceled
+ due to cancellation flag (impossible because we don't set the
+ flag). (See comments for ts_parser_parse in
+ tree_sitter/api.h.) */
+ if (new_tree == NULL)
+ signal_error ("Parse failed", parser);
ts_tree_delete (tree);
XTS_PARSER (parser)->tree = new_tree;
XTS_PARSER (parser)->need_reparse = false;
+ TSNode node = ts_tree_root_node (new_tree);
}
/* This is the read function provided to tree-sitter to read from a
@@ -103,9 +116,6 @@ ts_ensure_parsed (Lisp_Object parser)
ts_read_buffer (void *buffer, uint32_t byte_index,
TSPoint position, uint32_t *bytes_read)
{
- if (!BUFFER_LIVE_P ((struct buffer *) buffer))
- error ("BUFFER is not live");
-
ptrdiff_t byte_pos = byte_index + 1;
/* Read one character. Tree-sitter wants us to set bytes_read to 0
@@ -114,8 +124,17 @@ ts_read_buffer (void *buffer, uint32_t byte_index,
string. */
char *beg;
int len;
+ /* This function could run from a user command, so it is better to
+ do nothing instead of raising an error. (It was a pain in the a**
+ to read mega-if-conditions in Emacs source, so I write the two
+ branches separately, hoping the compiler can merge them.) */
+ if (!BUFFER_LIVE_P ((struct buffer *) buffer))
+ {
+ beg = "";
+ len = 0;
+ }
// TODO BUF_ZV_BYTE?
- if (byte_pos >= BUF_Z_BYTE ((struct buffer *) buffer))
+ else if (byte_pos >= BUF_Z_BYTE ((struct buffer *) buffer))
{
beg = "";
len = 0;
@@ -123,19 +142,23 @@ ts_read_buffer (void *buffer, uint32_t byte_index,
else
{
beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
- len = next_char_len(byte_pos);
+ len = BYTES_BY_CHAR_HEAD ((int) beg);
}
*bytes_read = (uint32_t) len;
return beg;
}
+/*** Creators and accessors for parser and node */
+
/* Wrap the parser in a Lisp_Object to be used in the Lisp machine. */
Lisp_Object
-make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree)
+make_ts_parser (struct buffer *buffer, TSParser *parser,
+ TSTree *tree, Lisp_Object name)
{
struct Lisp_TS_Parser *lisp_parser
- = ALLOCATE_PLAIN_PSEUDOVECTOR (struct Lisp_TS_Parser, PVEC_TS_PARSER);
+ = ALLOCATE_PSEUDOVECTOR (struct Lisp_TS_Parser, name, PVEC_TS_PARSER);
+ lisp_parser->name = name;
lisp_parser->buffer = buffer;
lisp_parser->parser = parser;
lisp_parser->tree = tree;
@@ -156,17 +179,35 @@ make_ts_node (Lisp_Object parser, TSNode node)
return make_lisp_ptr (lisp_node, Lisp_Vectorlike);
}
+DEFUN ("tree-sitter-node-parser",
+ Ftree_sitter_node_parser, Stree_sitter_node_parser,
+ 1, 1, 0,
+ doc: /* Return the parser to which NODE belongs. */)
+ (Lisp_Object node)
+{
+ CHECK_TS_NODE (node);
+ return XTS_NODE (node)->parser;
+}
DEFUN ("tree-sitter-create-parser",
Ftree_sitter_create_parser, Stree_sitter_create_parser,
- 2, 2, 0,
+ 2, 3, 0,
doc: /* Create and return a parser in BUFFER for LANGUAGE.
+
The parser is automatically added to BUFFER's
`tree-sitter-parser-list'. LANGUAGE should be the language provided
-by a tree-sitter language dynamic module. */)
- (Lisp_Object buffer, Lisp_Object language)
+by a tree-sitter language dynamic module.
+
+NAME (a string) is the name assigned to the parser, like the name for
+a process. Unlike process names, no care is taken to make each
+parser's name unique. By default, no name is assigned to the parser;
+the only consequence of that is you can't use
+`tree-sitter-get-parser' to find the parser by its name. */)
+ (Lisp_Object buffer, Lisp_Object language, Lisp_Object name)
{
CHECK_BUFFER(buffer);
+ if (!NILP (name))
+ CHECK_STRING (name);
/* LANGUAGE is a USER_PTR that contains the pointer to a TSLanguage
struct. */
@@ -175,9 +216,8 @@ DEFUN ("tree-sitter-create-parser",
ts_parser_set_language (parser, lang);
Lisp_Object lisp_parser
- = make_ts_parser (XBUFFER(buffer), parser, NULL);
+ = make_ts_parser (XBUFFER(buffer), parser, NULL, name);
- // FIXME: Is this the correct way to set a buffer-local variable?
struct buffer *old_buffer = current_buffer;
set_buffer_internal (XBUFFER (buffer));
@@ -188,6 +228,30 @@ DEFUN ("tree-sitter-create-parser",
return lisp_parser;
}
+DEFUN ("tree-sitter-parser-buffer",
+ Ftree_sitter_parser_buffer, Stree_sitter_parser_buffer,
+ 1, 1, 0,
+ doc: /* Return the buffer of PARSER. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ Lisp_Object buf;
+ XSETBUFFER (buf, XTS_PARSER (parser)->buffer);
+ return buf;
+}
+
+DEFUN ("tree-sitter-parser-name",
+ Ftree_sitter_parser_name, Stree_sitter_parser_name,
+ 1, 1, 0,
+ doc: /* Return parser's name. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ return XTS_PARSER (parser)->name;
+}
+
+/*** Parser API */
+
DEFUN ("tree-sitter-parser-root-node",
Ftree_sitter_parser_root_node, Stree_sitter_parser_root_node,
1, 1, 0,
@@ -200,7 +264,8 @@ DEFUN ("tree-sitter-parser-root-node",
return make_ts_node (parser, root_node);
}
-DEFUN ("tree-sitter-parse", Ftree_sitter_parse, Stree_sitter_parse,
+DEFUN ("tree-sitter-parse-string",
+ Ftree_sitter_parse_string, Stree_sitter_parse_string,
2, 2, 0,
doc: /* Parse STRING and return the root node.
LANGUAGE should be the language provided by a tree-sitter language
@@ -219,23 +284,20 @@ DEFUN ("tree-sitter-parse", Ftree_sitter_parse, Stree_sitter_parse,
SSDATA (string),
strlen (SSDATA (string)));
- /* See comment for ts_parser_parse in tree_sitter/api.h
- for possible reasons for a failure. */
+ /* See comment in ts_ensure_parsed for possible reasons for a
+ failure. */
if (tree == NULL)
signal_error ("Failed to parse STRING", string);
TSNode root_node = ts_tree_root_node (tree);
- Lisp_Object lisp_parser = make_ts_parser (NULL, parser, tree);
+ Lisp_Object lisp_parser = make_ts_parser (NULL, parser, tree, Qnil);
Lisp_Object lisp_node = make_ts_node (lisp_parser, root_node);
return lisp_node;
}
-/* Below this point are uninteresting mechanical translations of
- tree-sitter API. */
-
-/* Node functions. */
+/*** Node API */
DEFUN ("tree-sitter-node-type",
Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
@@ -245,9 +307,31 @@ DEFUN ("tree-sitter-node-type",
CHECK_TS_NODE (node);
TSNode ts_node = XTS_NODE (node)->node;
const char *type = ts_node_type(ts_node);
+ // TODO: Maybe return a string instead.
return intern_c_string (type);
}
+DEFUN ("tree-sitter-node-start-byte",
+ Ftree_sitter_node_start_byte, Stree_sitter_node_start_byte, 1, 1, 0,
+ doc: /* Return the NODE's start byte position. */)
+ (Lisp_Object node)
+{
+ CHECK_TS_NODE (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ uint32_t start_byte = ts_node_start_byte(ts_node);
+ return make_fixnum(start_byte + 1);
+}
+
+DEFUN ("tree-sitter-node-end-byte",
+ Ftree_sitter_node_end_byte, Stree_sitter_node_end_byte, 1, 1, 0,
+ doc: /* Return the NODE's end byte position. */)
+ (Lisp_Object node)
+{
+ CHECK_TS_NODE (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ uint32_t end_byte = ts_node_end_byte(ts_node);
+ return make_fixnum(end_byte + 1);
+}
DEFUN ("tree-sitter-node-string",
Ftree_sitter_node_string, Stree_sitter_node_string, 1, 1, 0,
@@ -303,9 +387,9 @@ DEFUN ("tree-sitter-node-check",
Ftree_sitter_node_check, Stree_sitter_node_check, 2, 2, 0,
doc: /* Return non-nil if NODE is in condition COND, nil otherwise.
-COND could be 'named, 'missing, 'extra, 'has-error. Named nodes
-correspond to named rules in the grammar, whereas "anonymous" nodes
-correspond to string literals in the grammar.
+COND could be 'named, 'missing, 'extra, 'has-changes, 'has-error.
+Named nodes correspond to named rules in the grammar, whereas
+"anonymous" nodes correspond to string literals in the grammar.
Missing nodes are inserted by the parser in order to recover from
certain kinds of syntax errors, i.e., should be there but not there.
@@ -313,6 +397,9 @@ DEFUN ("tree-sitter-node-check",
Extra nodes represent things like comments, which are not required the
grammar, but can appear anywhere.
+A node "has changes" if the buffer has changed since the node was
+created. (Don't forget the "s" at the end of 'has-changes.)
+
A node "has error" if itself is a syntax error or contains any syntax
errors. */)
(Lisp_Object node, Lisp_Object cond)
@@ -329,7 +416,10 @@ DEFUN ("tree-sitter-node-check",
result = ts_node_is_extra (ts_node);
else if (EQ (cond, Qhas_error))
result = ts_node_has_error (ts_node);
+ else if (EQ (cond, Qhas_changes))
+ result = ts_node_has_changes (ts_node);
else
+ // TODO: Is this a good error message?
signal_error ("Expecting one of four symbols, see docstring", cond);
return result ? Qt : Qnil;
}
@@ -432,8 +522,177 @@ DEFUN ("tree-sitter-node-prev-sibling",
return make_ts_node(XTS_NODE (node)->parser, sibling);
}
+DEFUN ("tree-sitter-node-first-child-for-byte",
+ Ftree_sitter_node_first_child_for_byte,
+ Stree_sitter_node_first_child_for_byte, 2, 3, 0,
+ doc: /* Return the first child of NODE on POS.
+Specifically, return the first child that extends beyond POS. POS is
+a byte position in the buffer counting from 1. Return nil if there
+isn't any. If NAMED is non-nil, look for named child only. NAMED
+defaults to nil. Note that this function returns an immediate child,
+not the smallest (grand)child. */)
+ (Lisp_Object node, Lisp_Object pos, Lisp_Object named)
+{
+ CHECK_INTEGER (pos);
+
+ struct buffer *buf = (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t byte_pos = XFIXNUM (pos);
+
+ if (byte_pos < BUF_BEGV_BYTE (buf) || byte_pos > BUF_ZV_BYTE (buf))
+ xsignal1 (Qargs_out_of_range, pos);
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_first_child_for_byte (ts_node, byte_pos - 1);
+ else
+ child = ts_node_first_named_child_for_byte (ts_node, byte_pos - 1);
+
+ if (ts_node_is_null(child))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-descendant-for-byte-range",
+ Ftree_sitter_node_descendant_for_byte_range,
+ Stree_sitter_node_descendant_for_byte_range, 3, 4, 0,
+ doc: /* Return the smallest node that covers BEG to END.
+The returned node is a descendant of NODE. BEG and END are byte
+positions counting from 1. Return nil if there isn't any. If NAMED
+is non-nil, look for named descendants only. NAMED defaults to nil. */)
+ (Lisp_Object node, Lisp_Object beg, Lisp_Object end, Lisp_Object named)
+{
+ CHECK_INTEGER (beg);
+ CHECK_INTEGER (end);
+
+ struct buffer *buf = (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t byte_beg = XFIXNUM (beg);
+ ptrdiff_t byte_end = XFIXNUM (end);
+
+ /* Checks for BUFFER_BEG <= BEG <= END <= BUFFER_END. */
+ if (!(BUF_BEGV_BYTE (buf) <= byte_beg
+ && byte_beg <= byte_end
+ && byte_end <= BUF_ZV_BYTE (buf)))
+ xsignal2 (Qargs_out_of_range, beg, end);
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_descendant_for_byte_range
+ (ts_node, byte_beg - 1 , byte_end - 1);
+ else
+ child = ts_node_named_descendant_for_byte_range
+ (ts_node, byte_beg - 1, byte_end - 1);
+
+ if (ts_node_is_null(child))
+ return Qnil;
+
+ return make_ts_node(XTS_NODE (node)->parser, child);
+}
+
/* Query functions */
+Lisp_Object ts_query_error_to_string (TSQueryError error)
+{
+ char *error_name;
+ switch (error)
+ {
+ case TSQueryErrorNone:
+ error_name = "none";
+ break;
+ case TSQueryErrorSyntax:
+ error_name = "syntax";
+ break;
+ case TSQueryErrorNodeType:
+ error_name = "node type";
+ break;
+ case TSQueryErrorField:
+ error_name = "field";
+ break;
+ case TSQueryErrorCapture:
+ error_name = "capture";
+ break;
+ case TSQueryErrorStructure:
+ error_name = "structure";
+ break;
+ }
+ return make_pure_c_string (error_name, strlen(error_name));
+}
+
+DEFUN ("tree-sitter-query-capture",
+ Ftree_sitter_query_capture,
+ Stree_sitter_query_capture, 2, 4, 0,
+ doc: /* Query NODE with PATTERN.
+
+Returns a list of (CAPTURE_NAME . NODE). CAPTURE_NAME is the name
+assigned to the node in PATTERN. NODE is the captured node.
+
+PATTERN is a string containing one or more matching patterns. See
+manual for further explanation for how to write a match pattern.
+
+BEG and END, if _both_ non-nil, specify the range in which the query
+is executed.
+
+Return nil if the query failed. */)
+ (Lisp_Object node, Lisp_Object pattern,
+ Lisp_Object beg, Lisp_Object end)
+{
+ CHECK_TS_NODE (node);
+ CHECK_STRING (pattern);
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ Lisp_Object lisp_parser = XTS_NODE (node)->parser;
+ const TSLanguage *lang = ts_parser_language
+ (XTS_PARSER (lisp_parser)->parser);
+ char *source = SSDATA (pattern);
+
+ uint32_t error_offset;
+ TSQueryError error_type;
+ TSQuery *query = ts_query_new (lang, source, strlen (source),
+ &error_offset, &error_type);
+ TSQueryCursor *cursor = ts_query_cursor_new ();
+
+ if (query == NULL)
+ {
+ // FIXME: Signal an error?
+ return Qnil;
+ }
+ if (!NILP (beg) && !NILP (end))
+ {
+ EMACS_INT beg_byte = XFIXNUM (beg);
+ EMACS_INT end_byte = XFIXNUM (end);
+ ts_query_cursor_set_byte_range
+ (cursor, (uint32_t) beg_byte - 1, (uint32_t) end_byte - 1);
+ }
+
+ ts_query_cursor_exec (cursor, query, ts_node);
+ TSQueryMatch match;
+ TSQueryCapture capture;
+ Lisp_Object result = Qnil;
+ Lisp_Object entry;
+ Lisp_Object captured_node;
+ const char *capture_name;
+ uint32_t capture_name_len;
+ while (ts_query_cursor_next_match (cursor, &match))
+ {
+ const TSQueryCapture *captures = match.captures;
+ for (int idx=0; idx < match.capture_count; idx++)
+ {
+ capture = captures[idx];
+ captured_node = make_ts_node(lisp_parser, capture.node);
+ capture_name = ts_query_capture_name_for_id
+ (query, capture.index, &capture_name_len);
+ entry = Fcons (intern_c_string (capture_name),
+ captured_node);
+ result = Fcons (entry, result);
+ }
+ }
+ ts_query_delete (query);
+ ts_query_cursor_delete (cursor);
+ return Freverse (result);
+}
+
/* Initialize the tree-sitter routines. */
void
syms_of_tree_sitter (void)
@@ -443,11 +702,18 @@ syms_of_tree_sitter (void)
DEFSYM (Qnamed, "named");
DEFSYM (Qmissing, "missing");
DEFSYM (Qextra, "extra");
+ DEFSYM (Qhas_changes, "has-changes");
DEFSYM (Qhas_error, "has-error");
+ DEFSYM (Qtree_sitter_query_error, "tree-sitter-query-error");
+ Fput (Qtree_sitter_query_error, Qerror_conditions,
+ pure_list (Qtree_sitter_query_error, Qerror));
+ Fput (Qtree_sitter_query_error, Qerror_message,
+ build_pure_c_string ("Error with query pattern"));
+
DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
- DEFVAR_LISP ("ts-parser-list", Vtree_sitter_parser_list,
- doc: /* A list of tree-sitter parsers.
+ DEFVAR_LISP ("tree-sitter-parser-list", Vtree_sitter_parser_list,
+ doc: /* A list of tree-sitter parsers.
// TODO: more doc.
If you removed a parser from this list, do not put it back in. */);
Vtree_sitter_parser_list = Qnil;
@@ -455,11 +721,19 @@ syms_of_tree_sitter (void)
defsubr (&Stree_sitter_parser_p);
defsubr (&Stree_sitter_node_p);
+
+ defsubr (&Stree_sitter_node_parser);
+
defsubr (&Stree_sitter_create_parser);
+ defsubr (&Stree_sitter_parser_buffer);
+ defsubr (&Stree_sitter_parser_name);
+
defsubr (&Stree_sitter_parser_root_node);
- defsubr (&Stree_sitter_parse);
+ defsubr (&Stree_sitter_parse_string);
defsubr (&Stree_sitter_node_type);
+ defsubr (&Stree_sitter_node_start_byte);
+ defsubr (&Stree_sitter_node_end_byte);
defsubr (&Stree_sitter_node_string);
defsubr (&Stree_sitter_node_parent);
defsubr (&Stree_sitter_node_child);
@@ -469,4 +743,8 @@ syms_of_tree_sitter (void)
defsubr (&Stree_sitter_node_child_by_field_name);
defsubr (&Stree_sitter_node_next_sibling);
defsubr (&Stree_sitter_node_prev_sibling);
+ defsubr (&Stree_sitter_node_first_child_for_byte);
+ defsubr (&Stree_sitter_node_descendant_for_byte_range);
+
+ defsubr (&Stree_sitter_query_capture);
}
diff --git a/src/tree_sitter.h b/src/tree_sitter.h
index a7e2a2d670..e9b4a71326 100644
--- a/src/tree_sitter.h
+++ b/src/tree_sitter.h
@@ -33,6 +33,7 @@ #define EMACS_TREE_SITTER_H
struct Lisp_TS_Parser
{
union vectorlike_header header;
+ Lisp_Object name;
struct buffer *buffer;
TSParser *parser;
TSTree *tree;
@@ -95,7 +96,8 @@ ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
ptrdiff_t new_end_byte);
Lisp_Object
-make_ts_parser (struct buffer *buffer, TSParser *parser, TSTree *tree);
+make_ts_parser (struct buffer *buffer, TSParser *parser,
+ TSTree *tree, Lisp_Object name);
Lisp_Object
make_ts_node (Lisp_Object parser, TSNode node);
diff --git a/test/src/tree-sitter-tests.el b/test/src/tree-sitter-tests.el
index cb1c464d3a..c61ad678d2 100644
--- a/test/src/tree-sitter-tests.el
+++ b/test/src/tree-sitter-tests.el
@@ -21,6 +21,7 @@
(require 'ert)
(require 'tree-sitter-json)
+(require 'tree-sitter)
(ert-deftest tree-sitter-basic-parsing ()
"Test basic parsing routines."
@@ -52,12 +53,13 @@ tree-sitter-basic-parsing
(ert-deftest tree-sitter-node-api ()
"Tests for node API."
(with-temp-buffer
- (insert "[1,2,{\"name\": \"Bob\"},3]")
(let (parser root-node doc-node object-node pair-node)
- (setq parser (tree-sitter-create-parser
- (current-buffer) (tree-sitter-json)))
- (setq root-node (tree-sitter-parser-root-node
- parser))
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-create-parser
+ (current-buffer) (tree-sitter-json)))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
;; `tree-sitter-node-type'.
(should (eq 'document (tree-sitter-node-type root-node)))
;; `tree-sitter-node-check'.
@@ -100,7 +102,51 @@ tree-sitter-node-api
(should (equal "(\",\")"
(tree-sitter-node-string
(tree-sitter-node-prev-sibling object-node))))
- )))
+ ;; `tree-sitter-node-first-child-for-byte'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-first-child-for-byte
+ doc-node 3 t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-first-child-for-byte
+ doc-node 3))))
+ ;; `tree-sitter-node-descendant-for-byte-range'.
+ (should (equal "(\"{\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-descendant-for-byte-range
+ root-node 6 7))))
+ (should (equal "(object (pair key: (string (string_content)) value: (string (string_content))))"
+ (tree-sitter-node-string
+ (tree-sitter-node-descendant-for-byte-range
+ root-node 6 7 t)))))))
+
+(ert-deftest tree-sitter-query-api ()
+ "Tests for query API."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-create-parser
+ (current-buffer) (tree-sitter-json)))
+ (setq root-node (tree-sitter-parser-root-node
+ parser))
+ (setq pattern "(string) @string
+(pair key: (_) @keyword)
+(number) @number"))
+
+ (should
+ (equal
+ '((number . "1") (number . "2")
+ (keyword . "\"name\"")
+ (string . "\"name\"")
+ (string . "\"Bob\"")
+ (number . "3"))
+ (mapcar (lambda (entry)
+ (cons (car entry)
+ (tree-sitter-node-content
+ (cdr entry))))
+ (tree-sitter-query-capture root-node pattern)))))))
(provide 'tree-sitter-tests)
;;; tree-sitter-tests.el ends here
--
2.24.3 (Apple Git-128)
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 14:32 ` Eli Zaretskii
@ 2021-07-24 15:10 ` Stefan Monnier
2021-07-24 15:51 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-07-24 15:10 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel
>> I see absolutely no problem with scaling in making a copy: the extra
>> memory and CPU time taken by the copy will be a constant factor which
>> I don't expect to go much beyond 10%
> 10% of what? It will be 100% of all the buffers that need parsing.
10% of the memory used by that buffer, since TS's data structure eats up
about 10x the size of the buffer's text.
Given the memory needs of TS we may decide to have
a `tree-sitter-maximum-size` config to disable TS on overly large
buffers (just like font-lock has such a setting, since when used
without jit-lock, font-lock also can easily end up using more memory
than the buffer's text).
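The arithmetic behind that estimate, and the kind of size guard suggested above, can be sketched as follows. This is only an illustration: the factor of 10 is the rough figure quoted in this thread, and `TS_TREE_FACTOR`, `copy_overhead_percent`, `estimated_total_bytes`, and `ts_buffer_size_ok` are hypothetical names, not existing Emacs or tree-sitter code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed ratio from the discussion: the parse tree takes roughly
   10x the size of the buffer text.  */
enum { TS_TREE_FACTOR = 10 };

/* Extra memory needed for one copy of the text, expressed as a
   percentage of the parse tree's estimated size.  */
static int
copy_overhead_percent (void)
{
  return 100 / TS_TREE_FACTOR;
}

/* Estimated total: the buffer text, one copy of it, and the tree.  */
static size_t
estimated_total_bytes (size_t text_bytes)
{
  return text_bytes * (1 + 1 + TS_TREE_FACTOR);
}

/* Hypothetical guard in the spirit of a `tree-sitter-maximum-size'
   option: only parse buffers whose text fits within MAX_BYTES.  */
static bool
ts_buffer_size_ok (size_t text_bytes, size_t max_bytes)
{
  return text_bytes <= max_bytes;
}
```

Under these assumptions the copy is a fixed fraction of what the parser already allocates, so a single size limit would bound both costs.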
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 15:04 ` Yuan Fu
@ 2021-07-24 15:48 ` Eli Zaretskii
2021-07-24 17:14 ` Yuan Fu
2021-07-26 14:38 ` Perry E. Metzger
2021-07-24 16:14 ` Eli Zaretskii
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 15:48 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 24 Jul 2021 11:04:35 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> I wrote a simple interface between font-lock and tree-sitter, and it works pretty well: using tree-sitter for fontification, xdisp.c opens a lot faster, and scrolling through the buffer is also perceivably faster. My simple interface works like this: tree-sitter allows you to “pattern match” nodes in the parse tree with a DSL, and assign names to the matched nodes, e.g., given a pattern, you get back a list of (NAME . MATCHED-NODE). And if we use font-lock faces as names for those nodes, we get back a list of (FACE . MATCHED-NODE) from tree-sitter, and Emacs can simply look at the beginning and end of the node, and apply FACE to that range. For flexibility, FACE can also be a function, in which case the function is called with the node. This interface is basically what emacs-tree-sitter does (I don’t know if they allow a capture name to be a function, though.)
>
> I have an example major-mode for C that uses tree-sitter for font-locking at the end of tree-sitter.el.
>
> Main functions to look at: tree-sitter-query-capture in tree_sitter.c, and tree-sitter-fontify-region-function in tree-sitter.el.
Thanks!
> BTW, what is the best way to signal a lisp error from C? I tried xsignal2, signal_error, error and friends but they seem to crash Emacs. Maybe I wasn’t using them correctly.
xsignal2 should work, as should xsignal. Please show the code which
crashed.
> IIUC if we want tree-sitter to use our malloc, we need to build it with Emacs, where should I put the source of tree-sitter?
tree-sitter itself should be a library we link against. If you meant
the tree-sitter support code, then it should go on a separate file in
src/. Or did I misunderstand your question?
> What’s the difference between make_string and make_pure_c_string? I’ve seen this “pure” thing elsewhere, what does “pure” mean?
I suggest reading the node "Pure Storage" in the ELisp manual. It
explains that.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 15:10 ` Stefan Monnier
@ 2021-07-24 15:51 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 15:51 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cpitclaudel, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: cpitclaudel@gmail.com, emacs-devel@gnu.org
> Date: Sat, 24 Jul 2021 11:10:26 -0400
>
> >> I see absolutely no problem with scaling in making a copy: the extra
> >> memory and CPU time taken by the copy will be a constant factor which
> >> I don't expect to go much beyond 10%
> > 10% of what? It will be 100% of all the buffers that need parsing.
>
> 10% of the memory used by that buffer, since TS's data structure eats up
> about 10x the size of the buffer's text.
That's still a lot of wasted storage, let alone time.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 15:04 ` Yuan Fu
2021-07-24 15:48 ` Eli Zaretskii
@ 2021-07-24 16:14 ` Eli Zaretskii
2021-07-24 17:32 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 16:14 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 24 Jul 2021 11:04:35 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> +(define-derived-mode ts-c-mode prog-mode "TS C"
> + "C mode with tree-sitter support."
> + (setq-local font-lock-fontify-region-function
> + #'tree-sitter-fontify-region-function)
> + (setq-local tree-sitter-font-lock-settings
> + `(("font-lock-c"
> + ,(tree-sitter-c)
> + "(null) @font-lock-constant-face
> +(true) @font-lock-constant-face
> +(false) @font-lock-constant-face
> +
> +(comment) @font-lock-comment-face
> +
> +(system_lib_string) @ts-c-fontify-system-lib
> +
> +(unary_expression
> + operator: _ @font-lock-negation-char-face)
> +
> +(string_literal) @font-lock-string-face
> +(char_literal) @font-lock-string-face
Where does this repertoire of possible syntax categories come from?
Is this from some list that TS exposes or documents? If so, what
happens when the repertoire is modified?
> beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
> - len = next_char_len(byte_pos);
> + len = BYTES_BY_CHAR_HEAD ((int) beg);
The last line is wrong: you need the byte itself. So it should be:
len = BYTES_BY_CHAR_HEAD (*beg);
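To illustrate why the byte itself is needed: the length of a multibyte sequence is encoded in the high bits of its first byte, so the macro must see the value stored at the pointer, not the pointer cast to an integer. The following is a minimal standalone sketch that assumes plain UTF-8 (Emacs's internal encoding extends it); `utf8_len_from_lead` and `char_len_at` are hypothetical stand-ins, not the real BYTES_BY_CHAR_HEAD.

```c
#include <stddef.h>

/* Hypothetical stand-in for BYTES_BY_CHAR_HEAD: return the length in
   bytes of the UTF-8 sequence whose first byte is LEAD.  */
static int
utf8_len_from_lead (unsigned char lead)
{
  if (lead < 0x80)
    return 1;                   /* 0xxxxxxx: ASCII */
  if ((lead & 0xE0) == 0xC0)
    return 2;                   /* 110xxxxx */
  if ((lead & 0xF0) == 0xE0)
    return 3;                   /* 1110xxxx */
  if ((lead & 0xF8) == 0xF0)
    return 4;                   /* 11110xxx */
  return 1;                     /* continuation or invalid byte */
}

/* Length of the character that starts at P.  Note the dereference:
   the lead byte is *P, whereas (int) P would be an address.  */
static int
char_len_at (const char *p)
{
  return utf8_len_from_lead ((unsigned char) *p);
}
```

Casting the pointer instead would feed the low bits of an address into the test, giving a length unrelated to the text.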
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 15:48 ` Eli Zaretskii
@ 2021-07-24 17:14 ` Yuan Fu
2021-07-24 17:20 ` Eli Zaretskii
2021-07-26 14:38 ` Perry E. Metzger
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-24 17:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stefan Monnier, emacs-devel
>
>> IIUC if we want tree-sitter to use our malloc, we need to build it with Emacs, where should I put the source of tree-sitter?
>
> tree-sitter itself should be a library we link against. If you meant
> the tree-sitter support code, then it should go on a separate file in
> src/. Or did I misunderstand your question?
If we link against libtree-sitter, how do we change its malloc behavior? Tree-sitter has this kind of thing:
#ifndef ts_malloc
#define ts_malloc ts_malloc_default
#endif
So I assume we need to define ts_malloc to, say, xmalloc when compiling libtree-sitter. And if we only link to it, we can’t redefine ts_malloc.
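The compile-time nature of such an override can be sketched in a single file. This is only an illustration of the `#ifndef` pattern: `demo_malloc`, `demo_malloc_default`, `demo_make_buffer`, and `my_malloc` are hypothetical names standing in for the library's macro, its default allocator, some library code, and a client allocator such as xmalloc.

```c
#include <stdlib.h>

/* Hypothetical client allocator (standing in for something like
   xmalloc); it counts calls so the override can be observed.  */
static size_t my_alloc_calls;

static void *
my_malloc (size_t size)
{
  my_alloc_calls++;
  return malloc (size);
}

/* Client-side override.  It only has an effect because it is seen
   before the library code below is compiled.  */
#define demo_malloc my_malloc

/* --- What the library's allocation header would contain. --- */
void *
demo_malloc_default (size_t size)
{
  return malloc (size);
}

#ifndef demo_malloc
#define demo_malloc demo_malloc_default
#endif

/* --- Library code: the macro is expanded here, at compile time. --- */
static char *
demo_make_buffer (size_t size)
{
  return demo_malloc (size);
}
```

Because the macro is expanded when the library's own sources are compiled, a prebuilt library already has the default baked in; the definition would have to be supplied when building the library itself (for instance on the compiler command line).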
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 17:14 ` Yuan Fu
@ 2021-07-24 17:20 ` Eli Zaretskii
2021-07-24 17:40 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 17:20 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 24 Jul 2021 13:14:50 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org
>
> IIUC if we want tree-sitter to use our malloc, we need to build it with Emacs, where should I put the
> source of tree-sitter?
>
> tree-sitter itself should be a library we link against. If you meant
> the tree-sitter support code, then it should go on a separate file in
> src/. Or did I misunderstand your question?
>
> If we link against libtree-sitter, how do we change its malloc behavior? Tree-sitter has these kind of things:
>
> #ifndef ts_malloc
> #define ts_malloc ts_malloc_default
> #endif
>
> So I assume we need to define ts_malloc to, say, xmalloc when compiling libtree-sitter. And if we only link to
> it, we can’t redefine ts_malloc.
How does TS propose the client projects to do that? Are you saying
that the only way to replace its malloc is to recompile tree-sitter??
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 16:14 ` Eli Zaretskii
@ 2021-07-24 17:32 ` Yuan Fu
2021-07-24 17:42 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-24 17:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, monnier, emacs-devel
> On Jul 24, 2021, at 12:14 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 24 Jul 2021 11:04:35 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>>
>> +(define-derived-mode ts-c-mode prog-mode "TS C"
>> + "C mode with tree-sitter support."
>> + (setq-local font-lock-fontify-region-function
>> + #'tree-sitter-fontify-region-function)
>> + (setq-local tree-sitter-font-lock-settings
>> + `(("font-lock-c"
>> + ,(tree-sitter-c)
>> + "(null) @font-lock-constant-face
>> +(true) @font-lock-constant-face
>> +(false) @font-lock-constant-face
>> +
>> +(comment) @font-lock-comment-face
>> +
>> +(system_lib_string) @ts-c-fontify-system-lib
>> +
>> +(unary_expression
>> + operator: _ @font-lock-negation-char-face)
>> +
>> +(string_literal) @font-lock-string-face
>> +(char_literal) @font-lock-string-face
>
> Where does this repertoire of possible syntax categories come from?
> Is this from some list that TS exposes or documents? If so, what
> happens when the repertoire is modified?
These “syntax categories” are defined by each language's grammar definition for tree-sitter, so they vary from language to language, and tree-sitter does not document them. If these “syntax categories” change, we need to change our code along with them, but I doubt that will happen often. They are hard to document because a non-trivial grammar definition often defines hundreds of them; the grammar definition for C has 1000 LOC.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 17:20 ` Eli Zaretskii
@ 2021-07-24 17:40 ` Yuan Fu
2021-07-24 17:46 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-24 17:40 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Clément Pit-Claudel, Stefan Monnier, emacs-devel
> On Jul 24, 2021, at 1:20 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 24 Jul 2021 13:14:50 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> cpitclaudel@gmail.com,
>> emacs-devel@gnu.org
>>
>> IIUC if we want tree-sitter to use our malloc, we need to build it with Emacs, where should I put the
>> source of tree-sitter?
>>
>> tree-sitter itself should be a library we link against. If you meant
>> the tree-sitter support code, then it should go on a separate file in
>> src/. Or did I misunderstand your question?
>>
>> If we link against libtree-sitter, how do we change its malloc behavior? Tree-sitter has these kind of things:
>>
>> #ifndef ts_malloc
>> #define ts_malloc ts_malloc_default
>> #endif
>>
>> So I assume we need to define ts_malloc to, say, xmalloc when compiling libtree-sitter. And if we only link to
>> it, we can’t redefine ts_malloc.
>
> How does TS propose the client projects to do that? Are you saying
> that the only way to replace its malloc is to recompile tree-sitter??
Here are the relevant lines from alloc.h in tree-sitter:
// Allow clients to override allocation functions
#ifndef ts_malloc
#define ts_malloc ts_malloc_default
#endif
#ifndef ts_calloc
#define ts_calloc ts_calloc_default
#endif
#ifndef ts_realloc
#define ts_realloc ts_realloc_default
#endif
#ifndef ts_free
#define ts_free ts_free_default
#endif
I’m not a C expert; does this allow us to replace its malloc at runtime?
Related discussion on the issue tracker: https://github.com/tree-sitter/tree-sitter/issues/739
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 17:32 ` Yuan Fu
@ 2021-07-24 17:42 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 17:42 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 24 Jul 2021 13:32:18 -0400
> Cc: monnier@iro.umontreal.ca,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org
>
> >> +(define-derived-mode ts-c-mode prog-mode "TS C"
> >> + "C mode with tree-sitter support."
> >> + (setq-local font-lock-fontify-region-function
> >> + #'tree-sitter-fontify-region-function)
> >> + (setq-local tree-sitter-font-lock-settings
> >> + `(("font-lock-c"
> >> + ,(tree-sitter-c)
> >> + "(null) @font-lock-constant-face
> >> +(true) @font-lock-constant-face
> >> +(false) @font-lock-constant-face
> >> +
> >> +(comment) @font-lock-comment-face
> >> +
> >> +(system_lib_string) @ts-c-fontify-system-lib
> >> +
> >> +(unary_expression
> >> + operator: _ @font-lock-negation-char-face)
> >> +
> >> +(string_literal) @font-lock-string-face
> >> +(char_literal) @font-lock-string-face
> >
> > Where does this repertoire of possible syntax categories come from?
> > Is this from some list that TS exposes or documents? If so, what
> > happens when the repertoire is modified?
>
> These “syntax categories” are defined by individual language grammar definition for tree-sitter, so it could change from language to language. And tree-sitter does not document them. If these “syntax categories” change, then we need to change our code with them. But I doubt that it will happen often. They are hard to document, because a non-trivial grammar definition often defines hundreds of them; the grammar definition for C has 1000 LOC.
Isn't there a better way of updating those than manually taking them out
of the TS grammar? Maybe write a short program linked against TS that
would spill them in some format that's convenient to use? Manual
updates are a serious maintenance burden.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 17:40 ` Yuan Fu
@ 2021-07-24 17:46 ` Eli Zaretskii
2021-07-24 18:06 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 17:46 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 24 Jul 2021 13:40:28 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> How does TS propose the client projects to do that? Are you saying
> that the only way to replace its malloc is to recompile tree-sitter??
>
> Here is the relevant lines in alloc.h in tree-sitter:
>
> // Allow clients to override allocation functions
>
> #ifndef ts_malloc
> #define ts_malloc ts_malloc_default
> #endif
> #ifndef ts_calloc
> #define ts_calloc ts_calloc_default
> #endif
> #ifndef ts_realloc
> #define ts_realloc ts_realloc_default
> #endif
> #ifndef ts_free
> #define ts_free ts_free_default
> #endif
>
> I’m not a C expert, does this allow us to replace its malloc in runtime?
No, not AFAIU. It only allows making such changes when TS is
compiled.
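A minimal sketch of why the override only takes effect at compile time. The names client_malloc, client_malloc_calls and library_make_buffer are made up for illustration, and ts_malloc_default here is a simplified stand-in, not tree-sitter's actual code. The point is that ts_malloc is a macro: it is expanded when the library's own source is compiled, so a program that only links against an already-built library never gets a chance to redefine it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client allocator, standing in for something like xmalloc.  */
static size_t client_malloc_calls = 0;

static void *client_malloc(size_t size)
{
  client_malloc_calls++;
  return malloc(size);
}

/* The override has to be visible BEFORE the library source is compiled,
   for example via -Dts_malloc=client_malloc on the library's compile line.  */
#define ts_malloc client_malloc

/* What the quoted header does: fall back to a default only if the
   macro is not already defined.  Simplified stand-in for the default.  */
static inline void *ts_malloc_default(size_t size)
{
  return malloc(size);
}

#ifndef ts_malloc
#define ts_malloc ts_malloc_default
#endif

/* "Library" code: the macro is expanded right here, at compile time.
   Once this is compiled into a shared library, the choice is fixed.  */
static void *library_make_buffer(size_t size)
{
  void *p = ts_malloc(size);
  assert(p != NULL);
  return p;
}
```

If the #define line were removed, library_make_buffer would silently use the default allocator, which is the situation when linking against a prebuilt library.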
We should ask the TS developers to provide a way of specifying custom
memory allocation/release functions as part of TS initialization. It
is a feature many packages provide.
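For illustration, such a feature usually looks something like the sketch below. Everything here is hypothetical (lib_set_allocator, lib_alloc, lib_release, counting_malloc, counting_free); it is not an API that tree-sitter is known to provide in this thread, just the general shape of a runtime allocator hook.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical library side: all allocation goes through function
   pointers that default to the C library.  */
typedef void *(*lib_malloc_fn)(size_t);
typedef void (*lib_free_fn)(void *);

static lib_malloc_fn lib_malloc_hook = malloc;
static lib_free_fn lib_free_hook = free;

/* Hypothetical initialization call.  A NULL argument keeps the
   current hook.  */
void lib_set_allocator(lib_malloc_fn new_malloc, lib_free_fn new_free)
{
  if (new_malloc)
    lib_malloc_hook = new_malloc;
  if (new_free)
    lib_free_hook = new_free;
}

void *lib_alloc(size_t size)
{
  assert(size > 0);
  return lib_malloc_hook(size);
}

void lib_release(void *p)
{
  lib_free_hook(p);
}

/* Hypothetical client side: allocators that count live blocks.  */
int live_blocks = 0;

void *counting_malloc(size_t size)
{
  void *p = malloc(size);
  if (p)
    live_blocks++;
  return p;
}

void counting_free(void *p)
{
  if (p)
    live_blocks--;
  free(p);
}
```

Because the hooks are ordinary data, the client can install them after linking against a prebuilt library, which the macro approach cannot do.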
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 17:46 ` Eli Zaretskii
@ 2021-07-24 18:06 ` Yuan Fu
2021-07-24 18:21 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-24 18:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stefan Monnier, emacs-devel
> We should ask the TS developers to provide a way of specifying custom
> memory allocation/release function as part of TS initialization. It
> is a feature many packages provide.
I commented on tree-sitter’s 1.0 checklist.
> Isn't there a better way of updating those than manually take them out
> of the TS grammar? Maybe write a short program linked against TS that
> would spill them in some format that's convenient to use? Manual
> updates are a serious maintenance burden.
What would this convenient format look like, in your mind? The grammar definition is already the “source”; I don’t see a way to magically make it easier to work with. What does “manual updates” refer to? If you mean updating patterns like
(init_declarator
declarator: (identifier) @font-lock-variable-name-face)
(parameter_declaration
declarator: (identifier) @font-lock-variable-name-face)
when a language’s grammar changes, I don’t think we need to update them often, or ever. And it is no harder than updating font-lock-keywords when a language adds a new fancy syntax.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 18:06 ` Yuan Fu
@ 2021-07-24 18:21 ` Eli Zaretskii
2021-07-24 18:55 ` Stefan Monnier
2021-07-25 18:44 ` Stephen Leake
0 siblings, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-24 18:21 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 24 Jul 2021 14:06:52 -0400
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org
>
> > We should ask the TS developers to provide a way of specifying custom
> > memory allocation/release function as part of TS initialization. It
> > is a feature many packages provide.
>
> I commented on tree-sitter’s 1.0 checklist.
Thanks.
> > Isn't there a better way of updating those than manually take them out
> > of the TS grammar? Maybe write a short program linked against TS that
> > would spill them in some format that's convenient to use? Manual
> > updates are a serious maintenance burden.
>
> How does this convenient format looks like, in your mind? The grammar definition is already the “source”, I don’t see a way to magically make it easier to work with. What does “manual updates” refer to? If you mean updating patterns like
>
> (init_declarator
> declarator: (identifier) @font-lock-variable-name-face)
>
> (parameter_declaration
> declarator: (identifier) @font-lock-variable-name-face)
>
> when a language’s grammar changes, I don’t think we need to update them often, or ever. And It is not harder than updating font-lock-keywords when a language adds a new fancy syntax.
It isn't an immediate problem, so we can delay it for later.
However, I do worry about the ability to update this in some
non-manual way. Take for example the way we update our character
databases when Unicode adds more characters/scripts: we use the data
files distributed by the Unicode Consortium and process them with
scripts in admin/unidata to produce intermediate files in a format
convenient for processing by Emacs, then we process those intermediate
files as part of building Emacs. Unicode files change maybe once or twice
a year, but still, doing all those changes manually would be a burden.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 18:21 ` Eli Zaretskii
@ 2021-07-24 18:55 ` Stefan Monnier
2021-07-25 18:44 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-07-24 18:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, cpitclaudel, emacs-devel
> However, I do worry about the ability to update this in some
> non-manual way.
It has to be manual to the extent that it's not something that is
inherent to the BNF grammar. The rules could accompany the grammar
(and TS could give access to them), in which case presumably all editors
using TS would end up fontifying in the same way, which would
make a fair bit of sense.
But in any case, this seems to be a preoccupation that goes much beyond
the actual immediate integration of tree-sitter into Emacs, and concerns
instead the evolution of tree-sitter itself.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 9:33 ` Stephen Leake
@ 2021-07-24 22:54 ` Dmitry Gutov
0 siblings, 0 replies; 370+ messages in thread
From: Dmitry Gutov @ 2021-07-24 22:54 UTC (permalink / raw)
To: Stephen Leake, Yuan Fu
Cc: Eli Zaretskii, Clément Pit-Claudel, Stefan Monnier,
emacs-devel
On 24.07.2021 12:33, Stephen Leake wrote:
>> Plus, if tree-sitter respects narrowing, it could happen where a user
>> narrows the buffer, the font-locking changes and is not correct
>> anymore. Maybe that’s not the user want.
> Exactly. The indent will be wrong, too, if narrowing excludes a
> containing block.
The important pieces of code now (in recent Emacs versions) undo
narrowing when doing fundamental operations like parsing the buffer (with
syntax-ppss), applying font-lock rules or doing indentation, unless
instructed otherwise by the major mode or the multiple-major-mode
framework.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 6:51 ` Eli Zaretskii
@ 2021-07-25 16:16 ` Stephen Leake
0 siblings, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-25 16:16 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: casouri@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Fri, 23 Jul 2021 19:00:12 -0700
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> >> > I fail to see the significance of the difference. Surely, you could
>> >> > hand it a block of text with changes to mean that this block replaces
>> >> > the previous version of that block. It might take the parser more
>> >> > work to update the parse tree in this case, but if it's fast enough,
>> >> > that won't be the problem. Right?
>> >>
>> >> tree-sitter doesn't store the previous text, so there's nothing to
>> >> compare it to.
>> >
>> > There was nothing about comparison in my text. You tell TS that
>> > editing replaced a block of text between A and B with block between A
>> > and C, without revealing the fine-grained changes inside that block.
>> > This must work, because editing could indeed do just that.
>>
>> I see; treat the whole block as one change. Yes, that would work, but it
>> would probably be less optimal than sending a list of smaller changes;
>> depends on the details.
>
> Since TS is very fast, I think this sub-optimality will not cause any
> tangible performance issues in Emacs. And from our POV it is a good
> optimization because it will minimize (and to some extent optimize)
> the traffic between Emacs and TS.
"optimal" refers to more than speed; error recovery is also important.
The more of the previous tree you keep, the better the error recovery.
After we get some good metrics/benchmarks for actual Emacs use (ie, how
good is the indentation?), we can explore this.
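To make the "one big change" option concrete, here is a rough sketch of folding a sequence of small edits into a single covering edit from A to B (old end) and C (new end). The Edit struct and edit_coalesce are hypothetical names, positions are plain offsets, and each later edit is assumed to be expressed in the coordinates of the text after the earlier ones were applied.

```c
#include <assert.h>
#include <stddef.h>

/* One edit: the text in [start, old_end) was replaced by the text
   in [start, new_end).  */
typedef struct
{
  size_t start;
  size_t old_end;
  size_t new_end;
} Edit;

/* Fold NEXT into ACC.  ACC is in original coordinates on the "old"
   side; NEXT is in the coordinates of the text after ACC was applied.
   The result covers both edits and whatever lies between them.  */
Edit edit_coalesce(Edit acc, Edit next)
{
  assert(acc.start <= acc.old_end && acc.start <= acc.new_end);
  assert(next.start <= next.old_end && next.start <= next.new_end);

  Edit out;
  out.start = acc.start < next.start ? acc.start : next.start;

  /* Any part of NEXT's old range that lies beyond ACC's new text is
     untouched original text after acc.old_end, so it maps 1:1.  */
  out.old_end = acc.old_end
    + (next.old_end > acc.new_end ? next.old_end - acc.new_end : 0);

  /* End of the covered range before NEXT, then shifted by NEXT.  */
  size_t covered_end = acc.new_end > next.old_end ? acc.new_end : next.old_end;
  out.new_end = covered_end - next.old_end + next.new_end;
  return out;
}
```

For example, inserting 3 characters at offset 10 and then deleting offsets 20..22 collapses to one edit replacing original 10..19 with new 10..20. The parser then has to re-examine everything in that range, which is the cost being weighed above against error recovery.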
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 3:39 ` Óscar Fuentes
2021-07-24 7:34 ` Eli Zaretskii
@ 2021-07-25 16:49 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-25 16:49 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
Óscar Fuentes <ofv@wanadoo.es> writes:
> Stephen Leake <stephen_leake@stephe-leake.org> writes:
>
>> 39 milliseconds for re-indent is just slow enough to be noticeable; I still
>> have algorithms to convert to be as incremental as possible.
>
> [snip]
>
>> In a very small file:
>>
>> initial 0.000632 seconds
>> re-indent 0.000942 seconds
>>
>> Easily fast enough to keep up with the user.
>
> Doing work every time the user changes the file is not always a good
> thing.
It very much depends on the user's preferences. Note that in standard
Emacs usage, font-lock runs after every character is typed.
With the current ada-mode release, which uses partial parse instead of
incremental parse, the parse process cannot keep up with user typing. So
I run with jit-lock-defer-time set to 1.5 seconds. However, many people
want the fontification to be much more responsive. With wisi
incremental parse, ada-mode can now do that.
Since I got incremental parse working in wisi, I've set
jit-lock-defer-time to the default nil; I like it, and will not go back.
I mostly tolerated the delay before because I knew how hard it would be
to fix :).
> Nowadays the user doesn't just expect automatic indentation, he wants
> code formatting too, which means splitting, fusing and inserting
> lines, plus moving chunks of code left and right. Doing that every
> time a character is added or deleted can be visually confusing due to
> chunks of text changing positions as you type, so the systems I know
> are triggered by certain events (like the insertion of characters that
> mark the end of statements).
Yes; different parser-based operations are triggered by different
events. That is true for wisi now; font-lock is triggered by the
standard Emacs mechanisms (ie, after every character is typed, the
window is scrolled, etc), indent is triggered by the standard Emacs
mechanisms (indent-region-function, indent-line-function; ie RET and
TAB), navigate (computing single-file cross-reference) is triggered by
forward-sexp or some similar "wisi-goto-*" function, reformatting is
triggered by "align" (in parallel with the standard Emacs align
mechanism) or a direct wisi-reformat-* function (there are some in a
context menu for Ada). All of these operations update the parse tree
only if the buffer has changed; if not, they use the existing tree.
The user can always customize things - wisi provides the framework.
> Then they analyze the code and, if it is well formed, apply the
> reformatting. Something similar could be said about fontification and
> other tasks.
Wisi does indentation even in the presence of syntax errors (ie, not
"well formed"). This helps when writing code; when entering an "if"
statement, you don't have to start with a complete template; you just
type the code. It does sometimes cause confusing results; fixing the
syntax always resolves that.
> So I'll insist on not obsessing too much about performance. Implement
> something simple, see if it is usable. If not, invest effort on
> optimizations until it is good enough.
Yes; premature optimization is the enemy of good enough. And good
benchmarks/metrics should be the guide of any optimization; allowing
font-lock to run after every character is typed is one such metric.
Indentation in the presence of syntax errors is another; that was the
primary complaint about ada-mode before I implemented error correction,
and is still a common complaint; incremental parse will improve that.
These two metrics were the trigger that started me implementing
incremental parse.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 7:06 ` Eli Zaretskii
@ 2021-07-25 17:48 ` Stephen Leake
0 siblings, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-25 17:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>> Date: Fri, 23 Jul 2021 19:57:32 -0700
>>
>> > How much "less"? Close to 1 sec is indeed annoying, but 20 msec or so
>> > should be bearable.
>> >
>> > You seem to assume up front that TS (re)-parsing will take 1 sec, but
>> > AFAIK there's no reason to assume such bad performance.
>>
>> This is for the initial parse, on a large file. No matter how fast the
>> parser is, I can give you a file that takes one second to parse, and
>> some user will have such a file (the work always expands to consume all
>> the resources available).
>
> That problem is already with us: if I visit xdisp.c in an unoptimized
> build of Emacs 28, I wait almost 4 sec for the first window-full to be
> displayed. (It's more like 0.5 sec in an optimized build of Emacs
> 27.2.) So the real question is how much using TS will _improve_ the
> situation.
Yes. But here other solutions, like parsing only part of the buffer,
offer much better improvement.
>> I just got incremental parse working well enough to measure it; in the
>> largest Ada file I have (10,000 lines from Eurocontrol):
>>
>> initial parse: 1.539319 seconds
>> re-indent two lines: 0.038999 seconds
>>
>> 39 milliseconds for re-indent is just slow enough to be noticeable; I still
>> have algorithms to convert to be as incremental as possible.
>
> For comparison, how much does re-indentation of 2 lines take in Emacs
> without a parser?
I don't think this is a meaningful question, or at least, I don't have
an answer.
For ada-mode, you'd have to go back to version 4.0, where the
indentation was ad-hoc elisp. It was fast enough to be not noticeable.
But I switched to a parser because that indentation algorithm was often
incorrect, and was very brittle in the face of new features in new Ada
language standard releases.
Other languages don't use a parser for indentation, so there's no way to
compare. Even the AdaCore editor Gnat Studio doesn't use their parser
for indentation in Ada; Emacs ada-mode is the only one I know of.
I guess you could say it's a trade of indentation quality vs speed.
Witness the recent thread about inconsistent fontification in C; a
parser would resolve that, but LSP via eglot is probably slower than the
current elisp. Indentation is similar, but the quality difference is
bigger, at least for Ada.
> 39 msec might be noticeable, but it isn't annoying; anything below 50
> msec isn't.
You are right; in that large Ada file, I don't notice the font-lock
delay after typing each character.
> Try "C-x TAB" in Emacs on 10-line block of text, and you get more than
> that.
Depends on the mode;
text-mode: 0.4 microseconds.
In xdisp.c, indenting it_char_has_category, 47.5 milliseconds.
In benchmark.el, indenting benchmark-call; 1.2 milliseconds.
The computation here is font-lock due to the text moving in the buffer;
in the ada-mode benchmark above, it is computing indent. Calling
indent-rigidly, then indent-region (which results in zero net buffer
change, so apparently no significant font-lock), I get:
xdisp.c: 17.1 ms
benchmark.el: 3.6 ms
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 6:00 ` Eli Zaretskii
@ 2021-07-25 18:01 ` Stephen Leake
2021-07-25 19:09 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-25 18:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 23 Jul 2021 16:22:59 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>>
>> > We must replace this function, if only because the MS-Windows build of
>> > Emacs uses a custom malloc implementation. Does TS allow the client
>> > to use its own malloc?
>>
>> Yes, in that case, we need to embed tree-sitter into Emacs, instead
>> of using it as a dynamic library, I think.
>>
>> // Allow clients to override allocation functions
>> #ifndef ts_malloc
>> #define ts_malloc ts_malloc_default
>> #endif
>> #ifndef ts_calloc
>> #define ts_calloc ts_calloc_default
>> #endif
>> #ifndef ts_realloc
>> #define ts_realloc ts_realloc_default
>> #endif
>> #ifndef ts_free
>> #define ts_free ts_free_default
>> #endif
>>
>> How do we handle such thing in Emacs?
>
> We use xmalloc, which calls memory_full when allocation fails, which
> releases some spare memory we have for this purpose, and tells the
> user to save the session and exit.
I'm thinking about how this applies to wisi, when migrating to a module.
Ada has a built-in allocator; it's probably possible to change that, but
I'd like to understand exactly why we need to do that.
The Ada allocator throws an exception on allocation fail; is it
sufficient to turn that exception into an elisp signal, and arrange for
elisp to call memory_full (or take some other action, like killing the
parser)?
Another possible reason to change the Ada allocator is if we want to
expose Ada memory pointers directly to elisp, as Yuan Fu wants to do for
tree-sitter (I don't plan to do this for wisi). Does that require that
the pointers be allocated by the same allocator? I'm not clear what that
would mean for the garbage collector; is it then expected to recover the
tree-sitter-allocated memory for the tree? or does it ignore those lisp
objects?
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 11:22 ` Eli Zaretskii
@ 2021-07-25 18:21 ` Stephen Leake
2021-07-25 19:03 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-25 18:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: Yuan Fu <casouri@gmail.com>, cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Sat, 24 Jul 2021 02:42:24 -0700
>>
>> > But that's how the current font-lock and indentation work: they never
>> > look beyond the narrowing limits.
>>
>> And that's broken
>
> ??? Of course, it isn't: it's how Emacs has worked since v21.1.
Ada (and other languages, but not all) requires the full file text to
properly compute font and indent; narrowing breaks that.
The fix for font-lock is font-lock-dont-widen; I implemented a similar
mechanism for indent of an ada-mode region in multi-major-mode.
In plain ada-mode, indent is currently broken in a narrowed buffer;
wisi-indent-region does not widen because it is language-agnostic, and I
have not gotten around to implementing a "widen for indent" hook because I
don't use narrowing very often, and no one has complained.
>> unless the narrowing is for multi-major-mode.
>
> And what would you do in that case, if you allow TS to look beyond the
> restriction?
In the multi-major-mode case, there is a separate parser for each
language, and each sub-mode region in the text would get its own parser
tree (ie, it acts like a separate file), and that parser tree is only
told about changes to those regions. So the parser will never try to
look outside the region; it doesn't need to know about narrowing.
I'll have to upgrade my Ada multi-major-mode implementation to do this
for incremental parse.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-24 18:21 ` Eli Zaretskii
2021-07-24 18:55 ` Stefan Monnier
@ 2021-07-25 18:44 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-25 18:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>>
>> when a language’s grammar changes, I don’t think we need to update
>> them often, or ever. And It is not harder than updating
>> font-lock-keywords when a language adds a new fancy syntax.
>
> It isn't an immediate problem, so we can delay it for later.
>
> However, I do worry about the ability to update this in some
> non-manual way. Take for example the way we update our character
> databases when Unicode adds more characters/scripts: we use the data
> files distributed by the Unicode Consortium
If the language concerned has some standard definition that is machine
readable, then we could get partway there.
But no language standard specifies fontification (or indent), so there
is no standard machine-readable description of these.
tree-sitter provides a de facto standard for specifying fontification (in
the highlight rules files); it would make sense for Emacs to be able to
read those files directly, along with linking to the corresponding
tree-sitter parser. We would have to provide a separate mapping from
tree-sitter notation to Emacs face names.
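Such a mapping could be as simple as a lookup table. The sketch below is only illustrative: the capture names on the left are examples of the kind of names highlight rules use, the table is not taken from any real highlight file, and face_for_capture is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row: a capture name from a highlight rule, and the Emacs face
   to use for it.  The rows are illustrative examples only.  */
typedef struct
{
  const char *capture;
  const char *face;
} CaptureFace;

static const CaptureFace capture_faces[] = {
  { "comment",  "font-lock-comment-face" },
  { "string",   "font-lock-string-face" },
  { "keyword",  "font-lock-keyword-face" },
  { "type",     "font-lock-type-face" },
  { "constant", "font-lock-constant-face" },
};

/* Return the face name for CAPTURE, or NULL if there is no mapping,
   in which case the caller would leave the text unfontified.  */
const char *face_for_capture(const char *capture)
{
  assert(capture != NULL);
  size_t n = sizeof capture_faces / sizeof capture_faces[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp(capture_faces[i].capture, capture) == 0)
      return capture_faces[i].face;
  return NULL;
}
```

In practice the table would more likely live in Lisp so that users and major modes can customize it; C is used here only to keep the sketch self-contained.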
wisi provides a mechanism to describe fontification and indentation in
the grammar source file; every time ISO releases a new Ada language
version, I have to compare them and incorporate the changes. I've
written code to partly automate this (the language reference manual
contains the grammar in a variant of EBNF in an appendix, which is
mostly machine-readable), but it's highly Ada specific, and still mostly
a manual process. Fortunately it only happens every 10 years for Ada :).
Many languages provide some EBNF description of the language, but it is
often in a form that is not suitable for whatever parser generator you
are using; it is usually optimized for human understanding. I made it a
requirement for the wisi parser generator to use the Ada reference
grammar as closely as possible, but I still have to modify the grammar
to get reasonable performance. You can find many different Java grammars
on the web, optimized for different parser generators (there are two
different but nominally equivalent grammars in the Java docs).
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-25 18:21 ` Stephen Leake
@ 2021-07-25 19:03 ` Eli Zaretskii
2021-07-26 16:40 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-25 19:03 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, emacs-devel, cpitclaudel, monnier
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: casouri@gmail.com, cpitclaudel@gmail.com, monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
> Date: Sun, 25 Jul 2021 11:21:27 -0700
>
> >> > But that's how the current font-lock and indentation work: they never
> >> > look beyond the narrowing limits.
> >>
> >> And that's broken
> >
> > ??? Of course, it isn't: it's how Emacs has worked since v21.1.
>
> Ada (and other languages, but not all) requires the full file text to
> properly compute font and indent; narrowing breaks that.
Not relevant: if a major mode's fontification code knows it needs to
do that, it will call 'widen'.
The issue was what the TS reader function should do. My firm opinion
is that it should not look beyond the restriction, because it isn't
its business to make those decisions. If the caller needs to widen,
it will.
> >> unless the narrowing is for multi-major-mode.
> >
> > And what would you do in that case, if you allow TS to look beyond the
> > restriction?
>
> In the multi-major-mode case, there is a separate parser for each
> language, and each sub-mode region in the text would get its own parser
> tree (ie, it acts like a separate file), and that parser tree is only
> told about changes to those regions. So the parser will never try to
> look outside the region; it doesn't need to know about narrowing.
Once again, we are talking about the function used by TS to read
buffer text. Not about the parser or its caller. Low-level code,
which knows nothing about the context, should never look beyond the
restriction.
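A minimal sketch of a reader that stays inside the restriction. ToyBuffer and toy_read are hypothetical and much simpler than a real reader callback; the fields begv and zv are named after Emacs's BEGV and ZV, and the assumption made here is that the parser's offsets count from the start of the accessible portion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat buffer with an accessible portion [begv, zv).  */
typedef struct
{
  const char *text;  /* whole buffer text */
  size_t size;       /* total size in bytes */
  size_t begv;       /* start of the accessible portion */
  size_t zv;         /* end of the accessible portion */
} ToyBuffer;

/* Return the chunk starting BYTE_INDEX bytes into the accessible
   portion and store its length in *BYTES_READ.  Zero bytes means end
   of input, so the parser never sees text outside [begv, zv).  */
const char *toy_read(const ToyBuffer *buf, size_t byte_index,
                     size_t *bytes_read)
{
  assert(buf->begv <= buf->zv && buf->zv <= buf->size);

  size_t pos = buf->begv + byte_index;
  if (pos >= buf->zv)
    {
      *bytes_read = 0;
      return "";
    }
  *bytes_read = buf->zv - pos;
  return buf->text + pos;
}
```

A caller that wants the whole buffer parsed widens first, so begv and zv already span everything by the time the reader runs; the reader itself makes no such decision.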
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-25 18:01 ` Stephen Leake
@ 2021-07-25 19:09 ` Eli Zaretskii
2021-07-26 5:10 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-25 19:09 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, emacs-devel, cpitclaudel, monnier
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Yuan Fu <casouri@gmail.com>, cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sun, 25 Jul 2021 11:01:22 -0700
>
> >> How do we handle such thing in Emacs?
> >
> > We use xmalloc, which calls memory_full when allocation fails, which
> > releases some spare memory we have for this purpose, and tells the
> > user to save the session and exit.
>
> I'm thinking about how this applies to wisi, when migrating to a module.
>
> Ada has a built-in allocator; it's probably possible to change that, but
> I'd like to understand exactly why we need to do that.
We need that to allow the user to save the session while he/she can.
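Roughly, the pattern is the one sketched below. This is a toy model, not Emacs's actual xmalloc or memory_full: all names are hypothetical, the allocator is passed as a parameter only so that failure can be simulated, and the real memory_full signals an error rather than returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Memory set aside at startup so there is something to give back
   when an allocation fails.  */
static void *spare_memory = NULL;
static bool memory_is_low = false;

void toy_init_reserve(size_t size)
{
  spare_memory = malloc(size);
}

/* Release the reserve and remember that the user should be told to
   save the session and exit.  */
void toy_memory_full(void)
{
  free(spare_memory);
  spare_memory = NULL;
  memory_is_low = true;
}

/* Allocate through ALLOC; on failure, trigger the low-memory path.  */
void *toy_xmalloc(void *(*alloc)(size_t), size_t size)
{
  void *p = alloc(size);
  if (p == NULL)
    toy_memory_full();
  return p;
}

/* Test helper: an allocator that always fails.  */
void *always_fail(size_t size)
{
  (void) size;
  return NULL;
}
```

A library that calls plain malloc bypasses this path entirely, so a failure inside it gives the user no chance to save anything.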
> The Ada allocator throws an exception on allocation fail; is it
> sufficient to turn that exception into an elisp signal, and arrange for
> elisp to call memory_full (or take some other action, like killing the
> parser)?
What is a "lisp signal" in this context?
> Another possible reason to change the Ada allocator is if we want to
> expose Ada memory pointers directly to elisp, as Yuan Fu wants to do for
> tree-sitter (I don't plan to do this for wisi). Does that require that
> the pointers be allocated by the same allocator?
Same allocator as what?
> I'm not clear what that would mean for the garbage collector; is it
> then expected to recover the tree-sitter-allocated memory for the
> tree? or does it ignore those lisp objects?
It depends on which Lisp object you wrap those pointers in. User-pointer
objects allow you to provide your own "finalizer" function.
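As a toy model of that idea (not the Emacs implementation): the object carries the foreign pointer plus a finalizer, and the collector calls the finalizer when the object becomes unreachable, so foreign memory is released by whoever allocated it. ToyUserPtr, toy_make_user_ptr, toy_collect and free_parse_tree are all hypothetical names; in the module API the corresponding entry point is make_user_ptr, which likewise takes a finalizer and a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*toy_finalizer)(void *);

/* Toy stand-in for a user-pointer object.  */
typedef struct
{
  void *ptr;                /* memory owned by the foreign library */
  toy_finalizer finalizer;  /* how to release it, or NULL */
} ToyUserPtr;

ToyUserPtr *toy_make_user_ptr(toy_finalizer fin, void *ptr)
{
  ToyUserPtr *obj = malloc(sizeof *obj);
  assert(obj != NULL);
  obj->ptr = ptr;
  obj->finalizer = fin;
  return obj;
}

/* What a collector would do once OBJ is unreachable: run the
   finalizer on the foreign pointer, then free the wrapper itself.  */
void toy_collect(ToyUserPtr *obj)
{
  if (obj->finalizer)
    obj->finalizer(obj->ptr);
  free(obj);
}

/* Hypothetical finalizer for a parse tree allocated elsewhere.  */
int trees_freed = 0;

void free_parse_tree(void *tree)
{
  free(tree);
  trees_freed++;
}
```

With this arrangement the collector never needs to know which allocator produced the pointer; it only needs the finalizer.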
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-25 19:09 ` Eli Zaretskii
@ 2021-07-26 5:10 ` Stephen Leake
2021-07-26 12:56 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-26 5:10 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Stephen Leake <stephen_leake@stephe-leake.org>
>> Cc: Yuan Fu <casouri@gmail.com>, cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>> Date: Sun, 25 Jul 2021 11:01:22 -0700
>>
>> >> How do we handle such thing in Emacs?
>> >
>> > We use xmalloc, which calls memory_full when allocation fails, which
>> > releases some spare memory we have for this purpose, and tells the
>> > user to save the session and exit.
>>
>> I'm thinking about how this applies to wisi, when migrating to a module.
>>
>> Ada has a built-in allocator; it's probably possible to change that, but
>> I'd like to understand exactly why we need to do that.
>
> We need that to allow the user to save the session while he/she can.
>
>> The Ada allocator throws an exception on allocation fail; is it
>> sufficient to turn that exception into an elisp signal, and arrange for
>> elisp to call memory_full (or take some other action, like killing the
>> parser)?
>
> What is a "lisp signal" in this context?
The module interface layer of wisi.el would do:
(signal 'error '("parser ran out of memory"))
>> Another possible reason to change the Ada allocator is if we want to
>> expose Ada memory pointers directly to elisp, as Yuan Fu wants to do for
>> tree-sitter (I don't plan to do this for wisi). Does that require that
>> the pointers be allocated by the same allocator?
>
> Same allocator as what?
As other lisp symbols.
>> I'm not clear what that would mean for the garbage collector; is it
>> then expected to recover the tree-sitter-allocated memory for the
>> tree? or does it ignore those lisp objects?
>
> It depends on which Lisp object you wrap those pointers in. User-pointer
> objects allow you to provide your own "finalizer" function.
Ok, that would work.
--
-- Stephe
* Re: How to add pseudo vector types
2021-07-26 5:10 ` Stephen Leake
@ 2021-07-26 12:56 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 12:56 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, emacs-devel, cpitclaudel, monnier
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: casouri@gmail.com, cpitclaudel@gmail.com, monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
> Date: Sun, 25 Jul 2021 22:10:12 -0700
>
> >> The Ada allocator throws an exception on allocation fail; is it
> >> sufficient to turn that exception into an elisp signal, and arrange for
> >> elisp to call memory_full (or take some other action, like killing the
> >> parser)?
> >
> > What is a "lisp signal" in this context?
>
> The module interface layer of wisi.el would do:
>
> (signal 'error '("parser ran out of memory"))
We don't have such an error (and handling an error when you've run out
of memory could backfire).
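For reference, the xmalloc/memory_full arrangement quoted earlier in this thread can be modelled in a few lines of C. This is only a toy sketch under stated assumptions, not the code in alloc.c: every toy_ name is made up, the allocator is passed in as a parameter so that failure can be simulated, and the toy version returns NULL where the real memory_full signals an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reserve: allocated at startup, released when memory
   runs out so there is room left to save the session.  */
void *toy_spare = NULL;
int toy_memory_full_calls = 0;

void
toy_init_spare (size_t size)
{
  toy_spare = malloc (size);
}

/* Toy counterpart of memory_full: give the reserve back and record
   that the user should be told to save and exit.  */
void
toy_memory_full (void)
{
  free (toy_spare);
  toy_spare = NULL;
  toy_memory_full_calls++;
}

/* Toy counterpart of xmalloc.  ALLOC stands in for malloc so that a
   failing allocator can be plugged in.  */
void *
toy_xmalloc (size_t size, void *(*alloc) (size_t))
{
  assert (alloc != NULL);
  void *p = alloc (size);
  if (p == NULL && size != 0)
    toy_memory_full ();
  return p;
}

/* An allocator that always fails, for exercising the failure path.  */
void *
toy_failing_alloc (size_t size)
{
  (void) size;
  return NULL;
}
```

The point of the reserve is that it is released before anything else runs, so whatever handles the condition has some memory to work with.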
> >> Another possible reason to change the Ada allocator is if we want to
> >> expose Ada memory pointers directly to elisp, as Yuan Fu wants to do for
> >> tree-sitter (I don't plan to do this for wisi). Does that require that
> >> the pointers be allocated by the same allocator?
> >
> > Same allocator as what?
>
> As other lisp symbols.
Not sure, perhaps you could free them in a finalizer instead. If you
want GC to free them, then yes.
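To make the finalizer option concrete, here is a toy model in C of a user-pointer object whose memory belongs to a foreign allocator. It is a sketch under stated assumptions, not the real implementation: the toy_ names are hypothetical, and plain malloc/free stand in for the Ada or tree-sitter allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a user-pointer object: a foreign pointer plus
   the function the collector calls when the object becomes garbage.  */
struct toy_user_ptr
{
  void *p;
  void (*finalizer) (void *);
};

/* Counts finalizer runs, only so the sketch can be checked.  */
int toy_finalized_count = 0;

/* Finalizer for memory owned by the foreign allocator; here that
   allocator is plain malloc/free.  */
void
toy_free_foreign (void *p)
{
  free (p);
  toy_finalized_count++;
}

/* What a collector would do when sweeping an unreachable user pointer
   in this model: it does not free P itself, it only invokes the
   finalizer, and clears P so the finalizer cannot run twice.  */
void
toy_sweep (struct toy_user_ptr *obj)
{
  assert (obj != NULL);
  if (obj->finalizer != NULL && obj->p != NULL)
    obj->finalizer (obj->p);
  obj->p = NULL;
}
```

With this arrangement the pointers do not have to come from the same allocator as Lisp objects, because the collector never frees them directly.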
* Re: How to add pseudo vector types
2021-07-24 15:48 ` Eli Zaretskii
2021-07-24 17:14 ` Yuan Fu
@ 2021-07-26 14:38 ` Perry E. Metzger
1 sibling, 0 replies; 370+ messages in thread
From: Perry E. Metzger @ 2021-07-26 14:38 UTC (permalink / raw)
To: emacs-devel
On 7/24/21 11:48, Eli Zaretskii wrote:
>> From: Yuan Fu <casouri@gmail.com>
>>
>>
>> IIUC if we want tree-sitter to use our malloc, we need to build it with Emacs, where should I put the source of tree-sitter?
> tree-sitter itself should be a library we link against. If you meant
> the tree-sitter support code, then it should go on a separate file in
> src/. Or did I misunderstand your question?
>
I suspect that the authors' expectations are that enough things need to
be tweaked that a given editor project like Emacs probably would want to
recompile Tree Sitter for use in their system. I'm not 100% sure about
that, but it seems to be what they're assuming.
Perry
* Re: How to add pseudo vector types
2021-07-25 19:03 ` Eli Zaretskii
@ 2021-07-26 16:40 ` Yuan Fu
2021-07-26 16:49 ` Eli Zaretskii
2021-07-26 23:40 ` Ergus
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-26 16:40 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, Stefan Monnier,
emacs-devel
>
>>>> unless the narrowing is for multi-major-mode.
>>>
>>> And what would you do in that case, if you allow TS to look beyond the
>>> restriction?
>>
>> In the multi-major-mode case, there is a separate parser for each
>> language, and each sub-mode region in the text would get its own parser
>> tree (ie, it acts like a separate file), and that parser tree is only
>> told about changes to those regions. So the parser will never try to
>> look outside the region; it doesn't need to know about narrowing.
>
> Once again, we are talking about the function used by TS to read
> buffer text. Not about the parser or its caller. Low-level code,
> which knows nothing about the context, should never look beyond the
> restriction.
It does no harm for tree-sitter to see the rest of the buffer: it doesn’t modify anything, all it does is read the text. OTOH, restricting tree-sitter to the bounds of the narrowing adds complexity for no benefit (as far as I can see). Maybe narrowing is the context that low level code should ignore, or at least tree-sitter should ignore. The only benefit that I can think of is “we firmly adhere to the ‘contract’ that no one can look beyond the narrowed region”, but is it a good contract? Is there really a contract in the first place? IMO, narrowing acts like masking tape over the rest of the buffer, so that user edits like re-replace wouldn’t spill out. Demanding that nothing in Emacs have access to the rest of the buffer is dogmatic (in the sense that it is too rigid and simply follows the doctrine blindly).
As for language definitions and font-locking, I just realized that tree-sitter language definitions provide highlighting patterns, and we only need to modify them minimally to use them in Emacs, so there isn’t much manual effort involved.
Also, does anyone have thoughts on how tree-sitter should integrate with font-lock beyond the current simple interface?
Yuan
* Re: How to add pseudo vector types
2021-07-26 16:40 ` Yuan Fu
@ 2021-07-26 16:49 ` Eli Zaretskii
2021-07-26 17:09 ` Yuan Fu
2021-07-26 18:32 ` chad
2021-07-26 23:40 ` Ergus
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 16:49 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 26 Jul 2021 12:40:31 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> > Once again, we are talking about the function used by TS to read
> > buffer text. Not about the parser or its caller. Low-level code,
> > which knows nothing about the context, should never look beyond the
> > restriction.
>
> It does no harm for tree-sitter to see the rest of the buffer: it doesn’t modify anything, all it does is read the text. OTOH, restricting tree-sitter to the bounds of the narrowing adds complexity for no benefit (as far as I can see).
Which complexity does it add? You just compare with BEGV_BYTE instead
of BEG_BYTE etc.
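A toy C sketch of what that comparison amounts to in a text-reading callback. The struct and the toy_ names are hypothetical stand-ins, not the real buffer macros or the real tree-sitter input interface, and positions are plain 0-based byte offsets for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Emacs buffer; not the real struct buffer.  */
struct toy_buffer
{
  const char *text;
  size_t beg_byte;   /* start of the whole text (role of BEG_BYTE) */
  size_t z_byte;     /* end of the whole text (role of Z_BYTE) */
  size_t begv_byte;  /* start of the accessible portion (BEGV_BYTE) */
  size_t zv_byte;    /* end of the accessible portion (ZV_BYTE) */
};

/* Return a pointer to the text at BYTE_POS and store in *LEN how many
   bytes may be read from there.  Positions outside the restriction
   yield a zero-length chunk, which the parser is assumed to treat as
   end of input.  */
const char *
toy_read (const struct toy_buffer *buf, size_t byte_pos, size_t *len)
{
  assert (buf->beg_byte <= buf->begv_byte);
  assert (buf->begv_byte <= buf->zv_byte);
  assert (buf->zv_byte <= buf->z_byte);

  /* The only difference from an unrestricted reader: the bounds are
     begv_byte/zv_byte rather than beg_byte/z_byte.  */
  if (byte_pos < buf->begv_byte || byte_pos >= buf->zv_byte)
    {
      *len = 0;
      return "";
    }
  *len = buf->zv_byte - byte_pos;
  return buf->text + byte_pos;
}
```

For example, with the text "hello world" narrowed to bytes 6..11, reading at 6 hands back the five bytes of "world", while reading at 0 or at 11 returns nothing.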
If we let TS look where it wants, we will lose the ability to restrict
it to a certain part of the buffer text. This is needed at least for
some specialized modes, and is generally desirable, as it gives Lisp
programs an easy way to impose such restrictions whenever they need.
> Maybe narrowing is the context that low level code should ignore
No other code in Emacs does, and for a good reason.
> The only benefit that I can think of is “we firmly adhere to the ‘contract’ that no one can look beyond the narrowed region”, but is it a good contract? Is there really a contract in the first place?
It served us very well until now, so yes, I think it's a good
contract.
> IMO, narrowing acts like masking tape over the rest of the buffer, so that user edits like re-replace wouldn’t spill out. Demanding that nothing in Emacs have access to the rest of the buffer is dogmatic (in the sense that it is too rigid and simply follows the doctrine blindly).
Again, this "dogma" is adhered to everywhere else in Emacs by
such low-level code.
* Re: How to add pseudo vector types
2021-07-26 16:49 ` Eli Zaretskii
@ 2021-07-26 17:09 ` Yuan Fu
2021-07-26 18:55 ` Eli Zaretskii
2021-07-27 6:13 ` Stephen Leake
2021-07-26 18:32 ` chad
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-26 17:09 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
>>
>>> Once again, we are talking about the function used by TS to read
>>> buffer text. Not about the parser or its caller. Low-level code,
>>> which knows nothing about the context, should never look beyond the
>>> restriction.
>>
>> It does no harm for tree-sitter to see the rest of the buffer: it doesn’t modify anything, all it does is read the text. OTOH, restricting tree-sitter to the bounds of the narrowing adds complexity for no benefit (as far as I can see).
>
> Which complexity does it add? You just compare with BEGV_BYTE instead
> of BEG_BYTE etc.
We need to “delete” the hidden text and “re-insert” it when we widen the buffer. I’ll try to make it a no-op as long as we remember to widen before calling tree-sitter to parse anything.
>
> If we let TS look where it wants, we will lose the ability to restrict
> it to a certain part of the buffer text. This is needed at least for
> some specialized modes, and is generally desirable, as it gives Lisp
> programs an easy way to impose such restrictions whenever they need.
Tree-sitter lets you set ranges for a parser to limit itself to, in order to support multi-language files.
>
>> Maybe narrowing is the context that low level code should ignore
>
> No other code in Emacs does, and for a good reason.
>
>> The only benefit that I can think of is “we firmly adhere to the ‘contract’ that no one can look beyond the narrowed region”, but is it a good contract? Is there really a contract in the first place?
>
> It served us very well until now, so yes, I think it's a good
> contract.
>
>> IMO, narrowing acts like masking tape over the rest of the buffer, so that user edits like re-replace wouldn’t spill out. Demanding that nothing in Emacs have access to the rest of the buffer is dogmatic (in the sense that it is too rigid and simply follows the doctrine blindly).
>
> Again, this "dogma" is adhered to everywhere else in Emacs by
> such low-level code.
Ok. I trust you to know better than I do.
Yuan
* Re: How to add pseudo vector types
2021-07-26 16:49 ` Eli Zaretskii
2021-07-26 17:09 ` Yuan Fu
@ 2021-07-26 18:32 ` chad
2021-07-26 18:44 ` Perry E. Metzger
2021-07-26 19:09 ` Eli Zaretskii
1 sibling, 2 replies; 370+ messages in thread
From: chad @ 2021-07-26 18:32 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, EMACS development team, Stephen Leake,
Clément Pit-Claudel, Stefan Monnier
On Mon, Jul 26, 2021 at 9:49 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > It does no harm for tree-sitter to see the rest of the buffer: it
> > doesn’t modify anything, all it does is read the text. OTOH, restricting
> > tree-sitter to the bounds of the narrowing adds complexity for no benefit
> > (as far as I can see).
>
> Which complexity does it add? You just compare with BEGV_BYTE instead
> of BEG_BYTE etc.
>
In order to exercise many of its over-all benefits, tree-sitter builds and
maintains a parse tree of the whole file, and takes care to modify that
tree in-place with minimized changes. If emacs+ts were to remove everything
outside the narrow and then re-add it each time something temporarily
narrowed a file (say, to enhance mental focus), then emacs+ts would be
(wastefully) throwing away some of the underlying assumptions that make ts
useful.
Emacs' internals use narrow/widen *and mostly honor them at most levels*
because they are emacs' abstraction for separating parts of a buffer from
other parts. Tree-sitter has a separate abstraction for doing this -- the
developer can have ts use different internal objects for different parts of
the file. This allows editors like Atom (a major influence on ts' original
feature set) to support what emacs would call multiple major modes. (Emacs
could still use some help here, c.f. Alan's proposal for "islands" from a
few years back.)
Using the ts multiple parsers support inside emacs+ts to "handle" narrowing
seems like a strong idea, but there are likely some complexities involved
in "switching" back and forth between the full-file parse and the narrowed
parse, plus making sure that the right parses are updated when the buffer
changes. With that in mind, it might be easier to start with an emacs+ts
prototype that always uses the full-file parse, and then add the
"sub-parses" later. In that sense, it seems like it's primarily a matter of
what level of itch people want to start scratching when.
emacs-dev islands discussion:
https://lists.gnu.org/archive/html/emacs-devel/2016-04/msg00585.html
Hope that helps,
~Chad
* Re: How to add pseudo vector types
2021-07-26 18:32 ` chad
@ 2021-07-26 18:44 ` Perry E. Metzger
2021-07-26 19:13 ` Eli Zaretskii
2021-07-26 19:09 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Perry E. Metzger @ 2021-07-26 18:44 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]
On 7/26/21 14:32, chad wrote:
>
> In order to exercise many of its over-all benefits, tree-sitter builds
> and maintains a parse tree of the whole file, and takes care to modify
> that tree in-place with minimized changes. If emacs+ts were to remove
> everything outside the narrow and then re-add it each time something
> temporarily narrowed a file (say, to enhance mental focus), then
> emacs+ts would be (wastefully) throwing away some of the
> underlying assumptions that make ts useful.
>
> Emacs' internals use narrow/widen *and mostly honor them at most
> levels* because they are emacs' abstraction for separating parts of a
> buffer from other parts. Tree-sitter has a separate abstraction for
> doing this -- the developer can have ts use different internal objects
> for different parts of the file. This allows editors like Atom (a
> major influence on ts' original feature set) to support what emacs
> would call multiple major modes. (Emacs could still use some help
> here, c.f. Alan's proposal for "islands" from a few years back.)
>
> Using the ts multiple parsers support inside emacs+ts to "handle"
> narrowing seems like a strong idea, but there are likely some
> complexities involved in "switching" back and forth between the
> full-file parse and the narrowed parse, plus making sure that the
> right parses are updated when the buffer changes. With that in mind,
> it might be easier to start with an emacs+ts prototype that always
> uses the full-file parse, and then add the "sub-parses" later. In
> that sense, it seems like it's primarily a matter of what level of
> itch people want to start scratching when.
>
> emacs-dev islands discussion:
> https://lists.gnu.org/archive/html/emacs-devel/2016-04/msg00585.html
>
I strongly agree with what's said here. I'll also note that some
languages will not parse correctly if you narrow to (say) only a block
or part of a function and not a whole file, and it would be unexpected
by users to narrow to a part of their code and suddenly have things like
tree sitter go haywire. If I narrow to a part of a function, I'd like my
indentation and highlighting to remain correct.
My suggestion is that we at least experiment with allowing Tree Sitter
to see the whole file or just the narrowed parts, and see which works
better in practice.
Perry
* Re: How to add pseudo vector types
2021-07-26 17:09 ` Yuan Fu
@ 2021-07-26 18:55 ` Eli Zaretskii
2021-07-26 19:06 ` Yuan Fu
2021-07-27 6:13 ` Stephen Leake
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 18:55 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 26 Jul 2021 13:09:13 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> > Which complexity does it add? You just compare with BEGV_BYTE instead
> > of BEG_BYTE etc.
>
> We need to “delete” the hidden text and “re-insert” it when we widen the buffer. I’ll try to make it a no-op as long as we remember to widen before calling tree-sitter to parse anything.
If some parser needs access to the whole buffer, its caller should
widen the buffer before calling the parser.
IOW, the control on which part of the buffer is visible to the parser
should be on the level of the caller of the parser, not at the level
of the function which accesses buffer text.
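A toy C sketch of that division of labour, assuming a made-up model of a buffer restriction (none of the toy_ names exist in Emacs): the parser entry point only ever sees the accessible portion, and it is the caller that decides to widen around the call and to restore the restriction afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a buffer's restriction; not Emacs's real API.  */
struct toy_restriction
{
  size_t beg, z;     /* bounds of the whole text */
  size_t begv, zv;   /* bounds of the accessible portion */
};

/* A parser entry point only ever looks at the accessible portion.  */
typedef size_t (*toy_parse_fn) (const struct toy_restriction *);

/* Example "parser": reports how many bytes it was allowed to see.  */
size_t
toy_visible_bytes (const struct toy_restriction *r)
{
  return r->zv - r->begv;
}

/* Caller-level policy: temporarily widen, run PARSE, then restore the
   previous restriction.  */
size_t
toy_parse_widened (struct toy_restriction *r, toy_parse_fn parse)
{
  size_t saved_begv = r->begv;
  size_t saved_zv = r->zv;

  assert (r->beg <= r->begv && r->begv <= r->zv && r->zv <= r->z);
  r->begv = r->beg;
  r->zv = r->z;
  size_t result = parse (r);
  r->begv = saved_begv;
  r->zv = saved_zv;
  return result;
}
```

A caller that wants the parser confined to the narrowed region simply calls the parse function directly instead of going through toy_parse_widened.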
> > If we let TS look where it wants, we will lose the ability to restrict
> > it to a certain part of the buffer text. This is needed at least for
> > some specialized modes, and is generally desirable, as it gives Lisp
> > programs an easy way to impose such restrictions whenever they need.
>
> Tree-sitter lets you set ranges for a parser to limit itself to, in order to support multi-language files.
That's okay, but why would we want to expose this to Lisp as the means
to restrict the accessible portion, when we already have such a means?
* Re: How to add pseudo vector types
2021-07-26 18:55 ` Eli Zaretskii
@ 2021-07-26 19:06 ` Yuan Fu
2021-07-26 19:19 ` Perry E. Metzger
2021-07-26 19:20 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-26 19:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
>
>>> If we let TS look where it wants, we will lose the ability to restrict
>>> it to a certain part of the buffer text. This is needed at least for
>>> some specialized modes, and is generally desirable, as it gives Lisp
>>> programs an easy way to impose such restrictions whenever they need.
>>
>> Tree-sitter lets you set ranges for a parser to limit itself to, in order to support multi-language files.
>
> That's okay, but why would we want to expose this to Lisp as the means
> to restrict the accessible portion, when we already have such a means?
Tree-sitter lets you set multiple discontinuous ranges, whereas narrowing can only narrow to a single continuous range. Multiple discontinuous ranges are much more useful for HTML+CSS+JS, or PHP + HTML.
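As a rough illustration of the difference, here is a toy C model of a set of byte ranges of the kind one would hand to a parser for, say, the JS parts of an HTML file. The struct and function names are hypothetical; tree-sitter's own range type also carries row/column points, and I am assuming the ranges have to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical half-open byte range [start, end).  */
struct toy_range
{
  size_t start;
  size_t end;
};

/* Return true if RANGES (N entries) are sorted and non-overlapping,
   so that together they can be read as one input.  */
bool
toy_ranges_valid (const struct toy_range *ranges, size_t n)
{
  size_t prev_end = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (ranges[i].start < prev_end || ranges[i].end < ranges[i].start)
        return false;
      prev_end = ranges[i].end;
    }
  return true;
}

/* Return true if byte position POS lies inside one of the ranges.
   Narrowing corresponds to the special case N == 1.  */
bool
toy_ranges_contain (const struct toy_range *ranges, size_t n, size_t pos)
{
  assert (ranges != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (pos >= ranges[i].start && pos < ranges[i].end)
      return true;
  return false;
}
```

With two script blocks at bytes 10..20 and 50..80, position 15 is visible to the JS parser and position 30 (HTML in between) is not, which a single narrowed region cannot express.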
Yuan
* Re: How to add pseudo vector types
2021-07-26 18:32 ` chad
2021-07-26 18:44 ` Perry E. Metzger
@ 2021-07-26 19:09 ` Eli Zaretskii
2021-07-26 19:48 ` chad
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 19:09 UTC (permalink / raw)
To: chad; +Cc: casouri, emacs-devel, stephen_leake, cpitclaudel, monnier
> From: chad <yandros@gmail.com>
> Date: Mon, 26 Jul 2021 11:32:23 -0700
> Cc: Yuan Fu <casouri@gmail.com>, Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>, Stefan Monnier <monnier@iro.umontreal.ca>,
> EMACS development team <emacs-devel@gnu.org>
>
> In order to exercise many of its over-all benefits, tree-sitter builds and maintains a parse tree of the whole
> file, and takes care to modify that tree in-place with minimized changes. If emacs+ts were to remove
> everything outside the narrow and then re-add it each time something temporarily narrowed a file (say, to
> enhance mental focus), then emacs+ts would be (wastefully) throwing away some of the underlying
> assumptions that make ts useful.
>
> Emacs' internals use narrow/widen *and mostly honor them at most levels* because they are emacs'
> abstraction for separating parts of a buffer from other parts. Tree-sitter has a separate abstraction for doing
> this -- the developer can have ts use different internal objects for different parts of the file. This allows editors
> like Atom (a major influence on ts' original feature set) to support what emacs would call multiple major
> modes. (Emacs could still use some help here, c.f. Alan's proposal for "islands" from a few years back.)
We are mis-communicating. The issue is not _whether_ to allow TS to
access most or all of the buffer text, the issue is _on_what_level_
should this be controlled. All I'm saying is that the right level is
NOT the function which accesses buffer text, the right level is
higher. At that higher level, if some parser (or even almost every
parser) needs to access the entire buffer, some code should call
'widen'.
By contrast, if the text-reading function always treats the buffer as
widened, we will never be able to invoke a TS parser on a portion of
the text, something that is needed by specialized features. It makes
no sense to require those features to start using TS-specific means of
restricting access to portions of the buffer to that effect, when a
simple restriction is good enough and is already being used.
> Using the ts multiple parsers support inside emacs+ts to "handle" narrowing seems like a strong idea, but
> there are likely some complexities involved in "switching" back and forth between the full-file parse and the
> narrowed parse, plus making sure that the right parses are updated when the buffer changes. With that in
> mind, it might be easier to start with an emacs+ts prototype that always uses the full-file parse, and then
> add the "sub-parses" later.
I disagree. The cost of having the text-reading function look only
inside the restriction is very small: a single call to 'widen' in the
caller. The cost of having that function ignore the restriction is
much higher.
* Re: How to add pseudo vector types
2021-07-26 18:44 ` Perry E. Metzger
@ 2021-07-26 19:13 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 19:13 UTC (permalink / raw)
To: Perry E. Metzger; +Cc: emacs-devel
> Date: Mon, 26 Jul 2021 14:44:24 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
>
> My suggestion is that we at least experiment with allowing Tree Sitter to see the whole file or just the
> narrowed parts, and see which works better in practice.
No one suggested anything to the contrary. The callers of TS which
need it to access the entire buffer should call 'widen', and that's
it.
This is a tempest in a teapot.
* Re: How to add pseudo vector types
2021-07-26 19:06 ` Yuan Fu
@ 2021-07-26 19:19 ` Perry E. Metzger
2021-07-26 19:31 ` Eli Zaretskii
2021-07-26 19:20 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Perry E. Metzger @ 2021-07-26 19:19 UTC (permalink / raw)
To: emacs-devel
On 7/26/21 15:06, Yuan Fu wrote:
> Tree-sitter lets you set multiple discontinuous ranges, whereas
> narrowing can only narrow to a single continuous range. Multiple
> discontinuous ranges are much more useful for HTML+CSS+JS, or PHP + HTML.
Other obvious uses: restructured text or markdown documentation amidst
code in another language, various sorts of literate programming, etc.
(This of course brings up that someday it might be nice to have Emacs
aware of such multi-modal text and able to switch how you're editing
even inside a single file, but that's a bigger topic.)
Perry
* Re: How to add pseudo vector types
2021-07-26 19:06 ` Yuan Fu
2021-07-26 19:19 ` Perry E. Metzger
@ 2021-07-26 19:20 ` Eli Zaretskii
2021-07-26 19:45 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 19:20 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 26 Jul 2021 15:06:14 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> >
> >>> If we let TS look where it wants, we will lose the ability to restrict
> >>> it to a certain part of the buffer text. This is needed at least for
> >>> some specialized modes, and is generally desirable, as it gives Lisp
> >>> programs an easy way to impose such restrictions whenever they need.
> >>
> >> Tree-sitter lets you set ranges for a parser to limit itself to, in order to support multi-language files.
> >
> > That's okay, but why would we want to expose this to Lisp as the means
> > to restrict the accessible portion, when we already have such a means?
>
> Tree-sitter lets you set multiple discontinuous ranges, whereas narrowing can only narrow to a single continuous range. Multiple discontinuous ranges are much more useful for HTML+CSS+JS, or PHP + HTML.
I understand. But forcing various Emacs features to use these ranges
where a simple restriction will do makes little sense.
Last time something like these discontinuous ranges was discussed as a
general feature in Emacs, we couldn't come up with an agreed-upon
design and implementation. So adding something like that to Emacs is
not an easy job.
* Re: How to add pseudo vector types
2021-07-26 19:19 ` Perry E. Metzger
@ 2021-07-26 19:31 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-26 19:31 UTC (permalink / raw)
To: Perry E. Metzger; +Cc: emacs-devel
> Date: Mon, 26 Jul 2021 15:19:20 -0400
> From: "Perry E. Metzger" <perry@piermont.com>
>
> Other obvious uses: restructured text or markdown documentation amidst
> code in another language, various sorts of literate programming, etc.
We should, of course, support these features. But their support
should be controlled by Lisp programs, not be hard-coded in some
low-level C code. The way to access discontinuous ranges of buffer
text as a single character sequence needs support in Emacs Lisp before
we can map it to the equivalent TS features.
> (This of course brings up that someday it might be nice to have Emacs
> aware of such multi-modal text and able to switch how you're editing
> even inside a single file, but that's a bigger topic.)
We have the beginning of this, but have a lot more turf to cover. And
currently, what we have uses restrictions.
* Re: How to add pseudo vector types
2021-07-26 19:20 ` Eli Zaretskii
@ 2021-07-26 19:45 ` Yuan Fu
2021-07-26 19:57 ` Dmitry Gutov
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-26 19:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
>>>>> If we let TS look where it wants, we will lose the ability to restrict
>>>>> it to a certain part of the buffer text. This is needed at least for
>>>>> some specialized modes, and is generally desirable, as it gives Lisp
>>>>> programs an easy way to impose such restrictions whenever they need.
>>>>
>>>> Tree-sitter lets you set ranges for a parser to limit itself to, in order to support multi-language files.
>>>
>>> That's okay, but why would we want to expose this to Lisp as the means
>>> to restrict the accessible portion, when we already have such a means?
>>
>> Tree-sitter lets you set multiple discontinuous ranges, whereas narrowing can only narrow to a single continuous range. Multiple discontinuous ranges are much more useful for HTML+CSS+JS, or PHP + HTML.
>
> I understand. But forcing various Emacs features to use these ranges
> where a simple restriction will do makes little sense.
>
> Last time something like these discontinuous ranges was discussed as a
> general feature in Emacs, we couldn't come up with an agreed-upon
> design and implementation. So adding something like that to Emacs is
> not an easy job.
We can provide both. Those who need the more powerful ranges can use them, and those who don’t can use narrowing.
Yuan
* Re: How to add pseudo vector types
2021-07-26 19:09 ` Eli Zaretskii
@ 2021-07-26 19:48 ` chad
2021-07-26 20:05 ` Óscar Fuentes
2021-07-27 13:59 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: chad @ 2021-07-26 19:48 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, EMACS development team, Stephen Leake,
Clément Pit-Claudel, Stefan Monnier
I think I understand your point, and I agree that it would be ill-advised
to remove the ability to change the "scope" in question from lisp's
control. What I'm trying to say (and I think Yuan Fu is also suggesting) is
that while emacs necessarily has *one* view of the buffer, narrowed or not,
tree-sitter might want to maintain multiple trees of that buffer, with the
default being the same as emacs' widened view, and narrowed views being
separate parse trees created as needed. I'm suggesting this as an
alternative to having emacs+ts effectively throw away most of the parse
tree every time the user narrows, then have to re-build it on each widen.
In other words, I think this might be a communication issue about the
default trade-off behavior: generally keep a fully-widened tree and
create/use narrowed trees as needed, versus generally keeping only a tree
that matches whatever a higher-level view of the buffer might give at any
one moment, rebuilding each time.
Hope that helps,
~Chad
* Re: How to add pseudo vector types
2021-07-26 19:45 ` Yuan Fu
@ 2021-07-26 19:57 ` Dmitry Gutov
0 siblings, 0 replies; 370+ messages in thread
From: Dmitry Gutov @ 2021-07-26 19:57 UTC (permalink / raw)
To: Yuan Fu, Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
On 26.07.2021 22:45, Yuan Fu wrote:
>> Last time something like these discontinuous ranges was discussed as a
>> general feature in Emacs, we couldn't come up with an agreed-upon
>> design and implementation. So adding something like that to Emacs is
>> not an easy job.
> We can provide both. Those who need the more powerful ranges can use them, and those who don’t can use narrowing.
If one wanted to continue where the previous discussions stopped, we
tentatively decided that the variable prog-indentation-context could
help. I.e. when some multiple-major-mode framework wanted to tell the
current major mode that there are more "ranges" of the same mode in the
buffer, it would bind prog-indentation-context to some particular value.
It's very much "to be discussed later", but the second element of
prog-indentation-context can be a list of those ranges, or, more likely,
a function that produces that list.
* Re: How to add pseudo vector types
2021-07-26 19:48 ` chad
@ 2021-07-26 20:05 ` Óscar Fuentes
2021-07-26 21:30 ` Clément Pit-Claudel
2021-07-27 14:02 ` Eli Zaretskii
2021-07-27 13:59 ` Eli Zaretskii
1 sibling, 2 replies; 370+ messages in thread
From: Óscar Fuentes @ 2021-07-26 20:05 UTC (permalink / raw)
To: emacs-devel
chad <yandros@gmail.com> writes:
> I think I understand your point, and I agree that it would be ill-advised
> to remove the ability to change the "scope" in question from lisp's
> control. What I'm trying to say (and I think Yuan Fu is also suggesting) is
> that while emacs necessarily has *one* view of the buffer, narrowed or not,
> tree-sitter might want to maintain multiple trees of that buffer, with the
> default being the same as emacs' widened view, and narrowed views being
> separate parse trees created as needed. I'm suggesting this as an
> alternative to having emacs+ts effectively throw away most of the parse
> tree every time the user narrows, then have to re-build it on each widen.
>
> In other words, I think this might be a communication issue about the
> default trade-off behavior: generally keep a fully-widened tree and
> create/use narrowed trees as needed, versus generally keeping only a tree
> that matches whatever a higher-level view of the buffer might give at any
> one moment, rebuilding each time.
IIUC this is not about the user doing M-x narrow-to-defun or somesuch,
we can agree that (in general) the right thing to do for TS is to keep
using the whole buffer.
I think Eli is trying to control what TS sees because doing so would
make possible some features and/or simplify implementing them. Think of
an Org file with some code blocks. It makes no sense to expose the whole
Org file to TS and, I guess, it would just complicate things for no
benefit. In that scenario, it might make sense to deal with the code
blocks as independent entities instead of parts of something else.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 20:05 ` Óscar Fuentes
@ 2021-07-26 21:30 ` Clément Pit-Claudel
2021-07-26 21:46 ` Óscar Fuentes
2021-07-27 14:02 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Clément Pit-Claudel @ 2021-07-26 21:30 UTC (permalink / raw)
To: emacs-devel
On 7/26/21 4:05 PM, Óscar Fuentes wrote:
> Think of an Org file with some code blocks. It makes no sense to
> expose the whole Org file to TS and, I guess, it would just
> complicate things for no benefit. On that scenario, it might make
> sense to deal with the code blocks as independent entities instead of
> parts of something else.
Isn't this example actually in favor of *not* narrowing before giving the buffer to TS? Consecutive org code blocks often build upon each other, so you'd want to give the whole buffer to TS and restrict its analysis to just the code blocks (multiple disjoint ranges), a result that you couldn't achieve with narrowing.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 21:30 ` Clément Pit-Claudel
@ 2021-07-26 21:46 ` Óscar Fuentes
0 siblings, 0 replies; 370+ messages in thread
From: Óscar Fuentes @ 2021-07-26 21:46 UTC (permalink / raw)
To: emacs-devel
Clément Pit-Claudel <cpitclaudel@gmail.com> writes:
> On 7/26/21 4:05 PM, Óscar Fuentes wrote:
>> Think of an Org file with some code blocks. It makes no sense to
>> expose the whole Org file to TS and, I guess, it would just
>> complicate things for no benefit. On that scenario, it might make
>> sense to deal with the code blocks as independent entities instead of
>> parts of something else.
>
> Isn't this example actually in favor of *not* narrowing before giving
> the buffer to TS? Consecutive org code blocks often build upon each
> other, so you'd want to give the whole buffer to TS and restrict its
> analysis to just the code blocks (multiple disjoint ranges), a results
> that you couldn't achieve with narrowing.
I don't know from where you got your "often" :-) Of course it is a
possibility, but I've seen plenty of Org files containing loosely
related code snippets (and those that build on each other tend to be
written in dynamic languages, which benefit a lot less from static code
analysis.)
As far as my personal experience goes, I very much prefer that each
block is treated independently, because my Org files contain code recipes
for specialized tasks, bug reproducers, multiple variations of
experiments, etc. They are not related at all.
So it is desirable to support both modes well.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 16:40 ` Yuan Fu
2021-07-26 16:49 ` Eli Zaretskii
@ 2021-07-26 23:40 ` Ergus
2021-07-27 14:49 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Ergus @ 2021-07-26 23:40 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Clément Pit-Claudel, Stephen Leake,
Stefan Monnier, emacs-devel
On Mon, Jul 26, 2021 at 12:40:31PM -0400, Yuan Fu wrote:
>>
>>>>> unless the narrowing is for multi-major-mode.
>>>>
>>>> And what would you do in that case, if you allow TS to look beyond the
>>>> restriction?
>>>
>>> In the multi-major-mode case, there is a separate parser for each
>>> language, and each sub-mode region in the text would get its own parser
>>> tree (ie, it acts like a separate file), and that parser tree is only
>>> told about changes to those regions. So the parser will never try to
>>> look outside the region; it doesn't need to know about narrowing.
>>
>> Once again, we are talking about the function used by TS to read
>> buffer text. Not about the parser or its caller. Low-level code,
>> which knows nothing about the context, should never look beyond the
>> restriction.
>
>It doesn’t harm for tree-sitter to see the rest of the buffer, it
>doesn’t modify anything, all it does it reading the text. OTOH,
>restricting tree-sitter to the bounds of narrows adds complexity for no
>benefit (as far as I can see). Maybe narrowing is the context that low
>level code should ignore, or at least tree-sitter should ignore. The
>only benefit that I can think of is “we firmly adhere to the ‘contract’
>that no one can look beyond the narrowed region”, but is it a good
>contract? Is there really a contract in the first place? IMO, narrowing
>acts like masking tapes over the rest of the buffer, so that user edits
>like re-replace wouldn’t spill out. Demanding everything in Emacs to
>not have access to the rest of the buffer is dogmatic (in the sense
>that it is too rigid and is simply following the doctrine blindly).
>
Hi Yuan:
From my absolute ignorance on tree_sitter and your changes. There is a
function ts_parser_set_included_ranges that is a way I used once to
reduce the parsing region and improve (notably) the performance in a
test api.
Can't narrow regions use that? I think it is the same idea but I am
probably wrong.
Limiting the region to parse to the modified region (that in emacs may
be known thanks to the gap and maybe the undo-tree) and using the output
tree from the previous parse as the `old_tree` parameter in
ts_parser_parse_string made tree_sitter incredibly fast in my case (and
useful to run it on every key press).
In my case using old_tree reduced the time by a factor of 10 in a big
source file, and limiting the parser to only the "changed" region made
it almost instant in more than 80% of the executions with small
modifications. (I repeat: it was a much simpler use case.)
>And about language definitions and font-locking, I just realized that
>tree-sitter language definitions provides highlighting patterns, and we
>only need to minimally modify them to use them for Emacs, so there
>aren’t much manual effort involved.
>
I think tree-sitter has many more language definitions than Emacs in
some languages, and probably we may want to properly support them. So
maybe: instead of just modifying what is on tree-sitter to make it
similar to what emacs currently has; we could just use the node's
syntactic information and then let emacs use it adding more faces if
needed... Does that make sense?
The idea is to have real syntactic information on the text itself,
because that may help in the future to implement indentation and
navigation commands using tree-sitter's information (commands like
up-list or forward-sexp would be the equivalent of
ts_tree_cursor_goto_parent or ts_tree_cursor_goto_next_sibling).
>Also, anyone have thoughts on how should tree-sitter intergrate with
>font-lock beyond the current simple interface?
>
No idea, but in my experience the most efficient way to traverse a
tree-sitter tree is with ts_tree_cursor but maybe for font-lock the best
is just to use ts_tree_get_changed_ranges.
>Yuan
Best,
Ergus
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 17:09 ` Yuan Fu
2021-07-26 18:55 ` Eli Zaretskii
@ 2021-07-27 6:13 ` Stephen Leake
2021-07-27 14:56 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-27 6:13 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, cpitclaudel, monnier
Yuan Fu <casouri@gmail.com> writes:
>>>
>>>> Once again, we are talking about the function used by TS to read
>>>> buffer text. Not about the parser or its caller. Low-level code,
>>>> which knows nothing about the context, should never look beyond the
>>>> restriction.
>>>
>>> It doesn’t harm for tree-sitter to see the rest of the buffer, it
>>> doesn’t modify anything, all it does it reading the text. OTOH,
>>> restricting tree-sitter to the bounds of narrows adds complexity
>>> for no benefit (as far as I can see).
>>
>> Which complexity does it add? You just compare with BEGV_BYTE instead
>> of BEG_BYTE etc.
>
> We need to “delete” the hidden text and “re-insert” when we widen the
> buffer. I’ll try to make it a no-op as long as we remember to widen
> before calling tree-sitter to parse anything.
First, the only thing TS deletes is tree nodes, not text; it does not
have a copy of the buffer.
Why do you think we need to delete the tree nodes corresponding to the
hidden text? They provide exactly the context needed to parse the
visible text properly.
This assumes the narrowing is temporary, not for a multi-major-mode.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 19:48 ` chad
2021-07-26 20:05 ` Óscar Fuentes
@ 2021-07-27 13:59 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-27 13:59 UTC (permalink / raw)
To: chad; +Cc: casouri, emacs-devel, stephen_leake, cpitclaudel, monnier
> From: chad <yandros@gmail.com>
> Date: Mon, 26 Jul 2021 12:48:00 -0700
> Cc: Yuan Fu <casouri@gmail.com>, Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>, Stefan Monnier <monnier@iro.umontreal.ca>,
> EMACS development team <emacs-devel@gnu.org>
>
> I think I understand your point, and I agree that it would be ill-advised to remove the ability to change the
> "scope" in question from lisp's control. What I'm trying to say (and I think Yuan Fu is also suggesting) is that
> while emacs necessarily has *one* view of the buffer, narrowed or not, tree-sitter might want to maintain
> multiple trees of that buffer, with the default being the same as emacs' widened view, and narrowed views
> being separate parse trees created as needed.
Lisp programs which use TS in a way that causes TS to store such
multiple views will have to widen the buffer at strategic places (when
TS needs access to buffer text).
> I'm suggesting this as an alternative to having emacs+ts
> effectively throw away most of the parse tree every time the user narrows, then have to re-build it on each
> widen.
No one says that when Emacs narrows a buffer for some reason, we need
to communicate that immediately to TS. If the restriction is
ephemeral, it will most probably be lifted by the time we need to
update TS with the editing changes.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 20:05 ` Óscar Fuentes
2021-07-26 21:30 ` Clément Pit-Claudel
@ 2021-07-27 14:02 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-27 14:02 UTC (permalink / raw)
To: Óscar Fuentes; +Cc: emacs-devel
> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Mon, 26 Jul 2021 22:05:58 +0200
>
> I think Eli is trying to control what TS sees because doing so would
> make possible some features and/or simplify implementing them.
Yes, exactly.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-26 23:40 ` Ergus
@ 2021-07-27 14:49 ` Yuan Fu
2021-07-27 16:50 ` Ergus
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-27 14:49 UTC (permalink / raw)
To: Ergus
Cc: Eli Zaretskii, emacs-devel, Stephen Leake,
Clément Pit-Claudel, Stefan Monnier
>
> From my absolute ignorance on tree_sitter and your changes. There is a
> function ts_parser_set_included_ranges that is a way I used once to
> reduce the parsing region and improve (notably) the performance in a
> test api.
>
> Can't narrow regions use that? I think it is the same idea but I am
> probably wrong.
We could use ts_parser_set_included_ranges to implement narrowing, but that would limit the usefulness of ts_parser_set_included_ranges: ts_parser_set_included_ranges allows us to set multiple discontinuous ranges, and narrowing only allows us to narrow to a single continuous range. Therefore I’d like to expose ts_parser_set_included_ranges in a separate function.
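As a rough illustration of the difference between one narrowed region and several disjoint ranges, here is a minimal C sketch. `struct byte_range` is a hypothetical stand-in for the byte fields of tree-sitter's TSRange, and both helpers are invented for this example; the ordering rule they check is what I understand ts_parser_set_included_ranges to require, so treat it as an assumption rather than a statement about the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the byte fields of tree-sitter's TSRange
   (the real struct also carries start and end points).  */
struct byte_range
{
  uint32_t start_byte;
  uint32_t end_byte;
};

/* Return true if RANGES are well-formed, ordered from earliest to
   latest, and do not overlap.  Assumption: this is the precondition
   for passing several ranges to the parser.  */
static bool
ranges_ordered_p (const struct byte_range *ranges, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (ranges[i].start_byte > ranges[i].end_byte)
        return false;
      if (i > 0 && ranges[i - 1].end_byte > ranges[i].start_byte)
        return false;
    }
  return true;
}

/* A narrowing is the special case of exactly one contiguous range.  */
static struct byte_range
narrowing_as_range (uint32_t begv_byte, uint32_t zv_byte)
{
  assert (begv_byte <= zv_byte);
  return (struct byte_range) { begv_byte, zv_byte };
}
```

The point of keeping the two separate is that a list of ranges can describe, say, every code block in an Org file at once, while narrowing can only ever produce the single-range case.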
>
> Limiting the region to parse to the modified region (that in emacs may
> be known thanks to the gap and maybe the undo-tree) and using the output
> tree from the previous parse as the `old_tree` parameter in
> ts_parser_parse_string made tree_sitter incredibly fast in my case (and
> useful to run it on every key press).
Interesting, the official documentation doesn’t mention that trick. It only tells me to re-parse with the old tree. If I limit the range to the modified region before re-parsing, do I get the tree for the entire buffer, or do I only get the tree of the limited range?
>
> In my case using old_tree reduced the time by a factor of 10 in a big
> source file; and limiting the parser to the "changed" region only made
> it almost instantly in more than 80% of the executions with small
> modifications. (I repeat; it was a much simpler use case)
>
>> And about language definitions and font-locking, I just realized that
>> tree-sitter language definitions provides highlighting patterns, and we
>> only need to minimally modify them to use them for Emacs, so there
>> aren’t much manual effort involved.
>>
> I think tree-sitter has many more language definitions than Emacs in
> some languages, and probably we may want to properly support them. So
> maybe: instead of just modifying what is on tree-sitter to make it
> similar to what emacs currently has; we could just use the node's
> syntactic information and then let emacs use it adding more faces if
> needed... Does it makes sense?
The current code does the latter, if I understand you correctly.
> The idea is to have real syntactic information on the text itself
> because that may help in the future to implement indentation and
> navigation commands using three-sitter's information (commands like
> up-list or forward-sexp) will be the equivalent to
> ts_tree_cursor_goto_parent or ts_tree_cursor_goto_next_sibling.
You mean adding syntactic information to the text as text properties? That’s an interesting idea, maybe that’s easier to use than using tree-sitter’s api.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-27 6:13 ` Stephen Leake
@ 2021-07-27 14:56 ` Yuan Fu
2021-07-28 3:40 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-27 14:56 UTC (permalink / raw)
To: Stephen Leake
Cc: Eli Zaretskii, Clément Pit-Claudel, monnier, emacs-devel
>>
>> We need to “delete” the hidden text and “re-insert” when we widen the
>> buffer. I’ll try to make it a no-op as long as we remember to widen
>> before calling tree-sitter to parse anything.
>
> First, the only thing TS deletes is tree nodes, not text; it does not
> have a copy of the buffer.
>
> Why do you think we need to delete the tree nodes corresponding to the
> hidden text? They provide exactly the context needed to parse the
> visible text properly.
I don’t think we need to, but I assume that tree-sitter will delete the corresponding nodes if we hide the text from it. For us, the text is there, just hidden; for tree-sitter, the text is deleted.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-27 14:49 ` Yuan Fu
@ 2021-07-27 16:50 ` Ergus
2021-07-27 16:59 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Ergus @ 2021-07-27 16:50 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Clément Pit-Claudel, Stephen Leake,
Stefan Monnier, emacs-devel
On Tue, Jul 27, 2021 at 10:49:44AM -0400, Yuan Fu wrote:
>
>We could use ts_parser_set_included_ranges to implement narrowing, but
>that would limit the usefulness of ts_parser_set_included_ranges:
>ts_parser_set_included_ranges allows us to set multiple discontinuous
>ranges, and narrowing only allows us to narrow to a single continuous
>range.
I agree here.
>Therefore I’d like to expose ts_parser_set_included_ranges in a
>separate function.
>
Exposing it on the Lisp side could make sense then.
>
>Interesting, the official documentation doesn’t mention that trick. It
>only tells me to re-parse with the old tree. If I limit the range to
>the modified region before re-parse, re-parse, do I get the tree for
>the entire buffer, or do I only get the tree of the limited range?
>
It worked for me; but it was a much simpler use case; maybe in the
general case it breaks. I think the only way to know is to try it.
Anyway, the official documentation suggests using
ts_tree_get_changed_ranges.
>
>The current code does the latter, if I understand you correctly.
>
>
>You mean adding syntactic information to the text as text properties?
>That’s an interesting idea, maybe that’s easier to use than using
>tree-sitter’s api.
>
I think that was Eli's initial idea when this topic came up. But
maybe I misunderstood it.
Theoretically, calling ts_tree_get_changed_ranges after a re-parse will
give the list of changes needed in the whole text, so updating
properties there may be simpler and cheaper (even when they are not in
the visible part of the buffer).
Also, any action that doesn't modify the text (scrolling, moving the
cursor, splitting or resizing windows) won't call tree-sitter at all,
and redisplay could handle almost everything easily from the start.
The only concern here may be that adding properties to the entire text
may be memory consuming. Or maybe that this could overlap part of the
font-lock functionality.
But Eli can probably give a more accurate critique of this idea.
>Yuan
Many thanks for doing this!
Ergus.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-27 16:50 ` Ergus
@ 2021-07-27 16:59 ` Eli Zaretskii
2021-07-28 3:45 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-27 16:59 UTC (permalink / raw)
To: Ergus; +Cc: casouri, emacs-devel, stephen_leake, cpitclaudel, monnier
> Date: Tue, 27 Jul 2021 18:50:40 +0200
> From: Ergus <spacibba@aol.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>
> On Tue, Jul 27, 2021 at 10:49:44AM -0400, Yuan Fu wrote:
>
> >You mean adding syntactic information to the text as text properties?
> >That’s an interesting idea, maybe that’s easier to use than using
> >tree-sitter’s api.
> >
> I think that was the initial Eli's idea when this topic came out. But
> maybe I understood it wrongly.
>
> Theoretically in a re-parse doing ts_tree_get_changed_ranges will give
> the list of changes needed in the whole text, so updating properties
> there may be simpler and cheap (even when they are not in the visible
> part of the buffer).
>
> Also, any action that doesn't modify the text (scrolling, moving the
> cursor, windows split/resize) won't call any tree-sitter and redisplay
> could handle almost everything easily on the beginning.
>
> The only concern here may be that adding properties to the entire text
> may be memory consuming. Or maybe that this could overlap part of the
> font-lock functionality.
>
> But probably Eli can make a more accurate critic of this idea..
Storing the syntactic information as text properties has definite
advantages: easy access, use of well-known Emacs Lisp features, etc.
I don't feel I know enough about this use of the properties to have a
definitive opinion, though. We should probably try that.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-27 14:56 ` Yuan Fu
@ 2021-07-28 3:40 ` Stephen Leake
2021-07-28 16:36 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-28 3:40 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Clément Pit-Claudel, monnier, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
>>>
>>> We need to “delete” the hidden text and “re-insert” when we widen the
>>> buffer. I’ll try to make it a no-op as long as we remember to widen
>>> before calling tree-sitter to parse anything.
>>
>> First, the only thing TS deletes is tree nodes, not text; it does not
>> have a copy of the buffer.
>>
>> Why do you think we need to delete the tree nodes corresponding to the
>> hidden text? They provide exactly the context needed to parse the
>> visible text properly.
>
> I don’t think we need to, but I assume that tree-sitter will delete
> the corresponding nodes if we hide the text from it.
No, tree-sitter only deletes nodes that cover changes.
So don't send a change that deletes the hidden text; just send changes
in the visible part of the text (that's the only place the user can make
changes). tree-sitter will only run the scanner on the change regions,
so it will only request text from the visible part of the buffer;
all the requests will succeed.
> For us, the text is there, just hidden; for tree-sitter, the text is
> deleted.
No, it simply won't notice that it can't access that part of the buffer,
because it will never try.
What, exactly, will the buffer-text fetch code do if tree-sitter
violates the narrowing (by some error in tree-sitter or user code)?
throw an exception? return a null string?
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-27 16:59 ` Eli Zaretskii
@ 2021-07-28 3:45 ` Stephen Leake
0 siblings, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-28 3:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Ergus, emacs-devel, cpitclaudel, casouri, monnier
Eli Zaretskii <eliz@gnu.org> writes:
> Storing the syntactic information as text properties has definite
> advantages: easy access, use of well-known Emacs Lisp features, etc.
> I don't feel I know enough about this use of the properties to have a
> definitive opinion, though. We should probably try that.
ada-mode does this now, via wisi. It marks each name that might be used
for cross-reference with a "name" property; the start and end of each
procedure and statement with "start/end" properties, so it is easy to
jump there; other similar things.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 3:40 ` Stephen Leake
@ 2021-07-28 16:36 ` Yuan Fu
2021-07-28 16:41 ` Eli Zaretskii
2021-07-28 16:43 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-28 16:36 UTC (permalink / raw)
To: Stephen Leake
Cc: Eli Zaretskii, Clément Pit-Claudel, monnier, emacs-devel
> On Jul 27, 2021, at 11:40 PM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>>>
>>>> We need to “delete” the hidden text and “re-insert” when we widen the
>>>> buffer. I’ll try to make it a no-op as long as we remember to widen
>>>> before calling tree-sitter to parse anything.
>>>
>>> First, the only thing TS deletes is tree nodes, not text; it does not
>>> have a copy of the buffer.
>>>
>>> Why do you think we need to delete the tree nodes corresponding to the
>>> hidden text? They provide exactly the context needed to parse the
>>> visible text properly.
>>
>> I don’t think we need to, but I assume that tree-sitter will delete
>> the corresponding nodes if we hide the text from it.
>
> No, tree-sitter only deletes nodes that cover changes.
>
> So don't send a change that deletes the hidden text; just send changes
> in the visible part of the text (that's the only place the user can make
> changes). tree-sitter will only run the scanner on the change regions,
> so it will only request text from the visible part of the buffer;
> all the requests will succeed.
Then we are not hiding the hidden text from tree-sitter. The implementation you described, IIUC, is essentially to do nothing special when the buffer is narrowed.
>
>> For us, the text is there, just hidden; for tree-sitter, the text is
>> deleted.
>
> No, it simply won't notice that it can't access that part of the buffer,
> because it will never try.
>
> What, exactly, will the buffer-text fetch code do if tree-sitter
> violates the narrowing (by some error in tree-sitter or user code)?
> throw an exception? return a null string?
In my current implementation, if tree-sitter accesses buffer content outside of the narrowed region, it reads whitespace; if it accesses buffer content outside of the buffer, it reads a null string. Neither case throws an error.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 16:36 ` Yuan Fu
@ 2021-07-28 16:41 ` Eli Zaretskii
2021-07-29 22:58 ` Stephen Leake
2021-07-28 16:43 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-28 16:41 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 28 Jul 2021 12:36:33 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>,
> emacs-devel <emacs-devel@gnu.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> monnier@iro.umontreal.ca
>
> > So don't send a change that deletes the hidden text; just send changes
> > in the visible part of the text (that's the only place the user can make
> > changes). tree-sitter will only run the scanner on the change regions,
> > so it will only request text from the visible part of the buffer;
> > all the requests will succeed.
>
> Then we are not hiding the hidden text from tree-sitter. The implementation you described, IIUC, is essentially do nothing special when the buffer is narrowed.
If the TS parser is called while the narrowing is in effect, it will
be unable to access text beyond BEGV..ZV. So in that case the
narrowing _will_ affect TS.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 16:36 ` Yuan Fu
2021-07-28 16:41 ` Eli Zaretskii
@ 2021-07-28 16:43 ` Eli Zaretskii
2021-07-28 17:47 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-28 16:43 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 28 Jul 2021 12:36:33 -0400
> Cc: Eli Zaretskii <eliz@gnu.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> monnier@iro.umontreal.ca, emacs-devel <emacs-devel@gnu.org>
>
> > What, exactly, will the buffer-text fetch code do if tree-sitter
> > violates the narrowing (by some error in tree-sitter or user code)?
> > throw an exception? return a null string?
>
> In my current implementation, if tree-sitter access buffer content outside of narrowed region, it reads whitespaces, if it access buffer content outside of the buffer, it reads null string. Neither case throws an error.
What does TS expect the reader function to return when it hits the
beginning or end of buffer text? I think we should behave the same
when it tries to go beyond the accessible portion. There should be no
difference between going beyond the restriction and going beyond EOB.
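A minimal sketch of a reader that treats the restriction exactly like end of buffer, under stated assumptions: `struct mock_buffer` and `mock_read` are hypothetical, the text is contiguous (no gap), and the callback is simplified relative to tree-sitter's real read function, which also receives a position argument. Offsets passed in are assumed to be relative to the accessible portion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an Emacs buffer: contiguous text plus the
   byte offsets of the accessible (narrowed) portion.  Real buffers
   have a gap and use BEGV_BYTE/ZV_BYTE; this sketch ignores both.  */
struct mock_buffer
{
  const char *text;   /* whole buffer text */
  uint32_t size;      /* total number of bytes in TEXT */
  uint32_t begv;      /* first accessible byte, 0-based */
  uint32_t zv;        /* one past the last accessible byte */
};

/* Simplified reader.  BYTE_INDEX is relative to the accessible
   portion, so the parser never learns about text outside it.  */
static const char *
mock_read (void *payload, uint32_t byte_index, uint32_t *bytes_read)
{
  struct mock_buffer *buf = payload;
  assert (buf->begv <= buf->zv && buf->zv <= buf->size);

  uint32_t visible = buf->zv - buf->begv;
  if (byte_index >= visible)
    {
      /* Past the restriction: report end of input, the same answer
         the reader gives at the real end of the buffer.  */
      *bytes_read = 0;
      return "";
    }
  *bytes_read = visible - byte_index;
  return buf->text + buf->begv + byte_index;
}
```

With this shape the widened case is simply begv = 0 and zv = size, so there is one code path for both situations.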
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 16:43 ` Eli Zaretskii
@ 2021-07-28 17:47 ` Yuan Fu
2021-07-28 17:54 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-28 17:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
> On Jul 28, 2021, at 12:43 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 28 Jul 2021 12:36:33 -0400
>> Cc: Eli Zaretskii <eliz@gnu.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> monnier@iro.umontreal.ca, emacs-devel <emacs-devel@gnu.org>
>>
>>> What, exactly, will the buffer-text fetch code do if tree-sitter
>>> violates the narrowing (by some error in tree-sitter or user code)?
>>> throw an exception? return a null string?
>>
>> In my current implementation, if tree-sitter access buffer content outside of narrowed region, it reads whitespaces, if it access buffer content outside of the buffer, it reads null string. Neither case throws an error.
>
> What does TS expect the reader function to return when it hits the
> beginning or end of buffer text? I think we should behave the same
> when it tries to go beyond the accessible portion. There should be no
> difference between going beyond the restriction and going beyond EOB.
>
It expects the read function to set *bytes_read to 0 when it reaches the end of the buffer. Tree-sitter never “hits the beginning of the buffer text” because it doesn’t read backward. I’m pretty sure tree-sitter expects to always be able to read from BOB.
Could you describe the desired effect on tree-sitter when the buffer is narrowed? If we just deny accessibility of the hidden region from tree-sitter, tree-sitter is still aware of the hidden text, because it has previously parsed the hidden text and stored the result in the parse tree.
My current implementation is to “replace” the hidden region with whitespace. When the buffer is narrowed and tree-sitter is asked to re-parse (by some user command), I tell tree-sitter that the hidden portion of the buffer has changed; then, during the re-parse, tree-sitter re-scans those parts and reads whitespace.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 17:47 ` Yuan Fu
@ 2021-07-28 17:54 ` Eli Zaretskii
2021-07-28 18:46 ` Yuan Fu
2021-07-29 23:01 ` Stephen Leake
0 siblings, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-28 17:54 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 28 Jul 2021 13:47:42 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> Could you describe the desired effect on tree-sitter when the buffer is narrowed?
The behavior should be the same as if the text before and after the
narrowed region didn't exist.
> If we just deny accessibility of the hidden region from tree-sitter, tree-sitter is still aware of the hidden text, because it has previously parsed the hidden text and stored the result in the parse tree.
The adherence to narrowing is for the use cases where TS is _always_
invoked on the same narrowed region. You seem to be thinking about
changes in the narrowing while TS is parsing, or between consecutive
re-parsing calls, but I see no interesting/important use cases which
would need to do that. And if there are some tricky cases which do
need this, the respective Lisp programs will have to deal with the
problem.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 17:54 ` Eli Zaretskii
@ 2021-07-28 18:46 ` Yuan Fu
2021-07-28 19:00 ` Eli Zaretskii
` (2 more replies)
2021-07-29 23:01 ` Stephen Leake
1 sibling, 3 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-28 18:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
> On Jul 28, 2021, at 1:54 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 28 Jul 2021 13:47:42 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>>
>> Could you describe the desired effect on tree-sitter when the buffer is narrowed?
>
> The behavior should be the same as if the text before and after the
> narrowed region didn't exist.
>
>> If we just deny accessibility of the hidden region from tree-sitter, tree-sitter is still aware of the hidden text, because it has previously parsed the hidden text and stored the result in the parse tree.
>
> The adherence to narrowing is for the use cases where TS is _always_
> invoked on the same narrowed region. You seem to be thinking about
> changes in the narrowing while TS is parsing, or between consecutive
> re-parsing calls, but I see no interesting/important use cases which
> would need to do that. And if there are some tricky cases which do
> need this, the respective Lisp programs will have to deal with the
> problem.
That makes sense. However, it brings up a problem. Consider a buffer like this: XXAAXX. Say lisp narrows to AA and creates a tree-sitter parser. Then lisp widens the buffer, and the user inserts B in front of AA. Now the buffer is XXBAAXX. Emacs has two options for conveying this change to the tree-sitter parser: 1) it does not, and tree-sitter still thinks the buffer is AA; essentially, the portion that tree-sitter sees is pushed forward by one character; 2) it tells tree-sitter that the user inserted a character at the beginning, and tree-sitter thinks the buffer is BAA. Which option is correct depends on how lisp later narrows: if lisp narrows to AA, then option 1 is correct; if lisp narrows to BAA, then option 2 is correct. But how do we know which option is correct before lisp narrows?
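The two options can be modelled as two policies for adjusting a half-open window over the widened buffer when text is inserted exactly at the window's start. The sketch below is illustrative only; struct window, enum policy and window_after_insert are hypothetical names, and nothing here calls tree-sitter.

```c
#include <assert.h>
#include <stddef.h>

/* The part of the buffer the parser sees, as a half-open range.  */
struct window
{
  size_t beg;
  size_t end;
};

/* What to do with text inserted exactly at the window's start.  */
enum policy
{
  EXCLUDE_INSERT,               /* Option 1: the window is pushed forward.  */
  INCLUDE_INSERT                /* Option 2: the window grows.  */
};

/* Return W adjusted for N bytes inserted at POS in the whole buffer.
   POLICY only matters when POS equals the window's start.  */
static struct window
window_after_insert (struct window w, size_t pos, size_t n,
                     enum policy policy)
{
  assert (w.beg <= w.end);
  if (pos < w.beg || (pos == w.beg && policy == EXCLUDE_INSERT))
    {
      /* The insertion lies before the window: both bounds shift.  */
      w.beg += n;
      w.end += n;
    }
  else if (pos < w.end || pos == w.beg)
    /* The insertion lies inside the window (or at its start with
       INCLUDE_INSERT): only the end moves.  */
    w.end += n;
  /* Otherwise the insertion is at or after the end: no change.  */
  return w;
}
```

For XXAAXX with the window on AA (bytes 2 to 4) and B inserted at byte 2, EXCLUDE_INSERT yields the window 3 to 5 (still AA), while INCLUDE_INSERT yields 2 to 5 (BAA). Both are self-consistent; the ambiguity is only about which one the Lisp program wants.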
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 18:46 ` Yuan Fu
@ 2021-07-28 19:00 ` Eli Zaretskii
2021-07-29 14:35 ` Yuan Fu
2021-07-29 23:06 ` How to add pseudo vector types Stephen Leake
2021-07-30 0:35 ` Richard Stallman
2 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-28 19:00 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 28 Jul 2021 14:46:03 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> > The adherence to narrowing is for the use cases where TS is _always_
> > invoked on the same narrowed region. You seem to be thinking about
> > changes in the narrowing while TS is parsing, or between consecutive
> > re-parsing calls, but I see no interesting/important use cases which
> > would need to do that. And if there are some tricky cases which do
> > need this, the respective Lisp programs will have to deal with the
> > problem.
>
> That makes sense. However, it brings up a problem. Consider a buffer like this: XXAAXX. Say lisp narrows to AA and creates a tree-sitter parser. Then lisp widens the buffer, and the user inserts B in front of AA. Now the buffer is XXBAAXX. Emacs has two options for conveying this change to the tree-sitter parser: 1) it does not, and tree-sitter still thinks the buffer is AA; essentially, the portion that tree-sitter sees is pushed forward by one character; 2) it tells tree-sitter that the user inserted a character at the beginning, and tree-sitter thinks the buffer is BAA. Which option is correct depends on how lisp later narrows: if lisp narrows to AA, then option 1 is correct; if lisp narrows to BAA, then option 2 is correct. But how do we know which option is correct before lisp narrows?
We don't need to know. The Lisp program which needs to handle this
situation will have to figure out what is right in that case, "right"
in the sense that it produces the desired results after communicating
the changes to TS.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 19:00 ` Eli Zaretskii
@ 2021-07-29 14:35 ` Yuan Fu
2021-07-29 15:28 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-29 14:35 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, Stefan Monnier,
emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1728 bytes --]
>>
>> That makes sense. However, it brings up a problem. Consider a buffer like this: XXAAXX. Say lisp narrows to AA and creates a tree-sitter parser. Then lisp widens the buffer, and the user inserts B in front of AA. Now the buffer is XXBAAXX. Emacs has two options for conveying this change to the tree-sitter parser: 1) it does not, and tree-sitter still thinks the buffer is AA; essentially, the portion that tree-sitter sees is pushed forward by one character; 2) it tells tree-sitter that the user inserted a character at the beginning, and tree-sitter thinks the buffer is BAA. Which option is correct depends on how lisp later narrows: if lisp narrows to AA, then option 1 is correct; if lisp narrows to BAA, then option 2 is correct. But how do we know which option is correct before lisp narrows?
>
> We don't need to know. The Lisp program which needs to handle this
> situation will have to figure out what is right in that case, "right"
> in the sense that it produces the desired results after communicating
> the changes to TS.
The difficulty is that what tree-sitter sees must be consistent. If Emacs updates tree-sitter with option 1 and lisp later chooses option 2, the content that tree-sitter sees is not consistent. Anyway, I found a way that avoids this issue: the bounds of tree-sitter’s visible region never change, and the next time lisp narrows to a different region, we update tree-sitter’s bounds to match those of the narrowing. Here is the latest patch. If the code is not entirely straightforward, I’m happy to add more comments to explain it.
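The catch-up step described above can be sketched without tree-sitter by computing the edits that move an old visible range onto a new one. The code below follows the same order of steps as ts_ensure_parsed in the attached patch, but it is a standalone illustration: struct edit and sync_visible_range are hypothetical names, and the edits are collected in an array instead of being passed to ts_tree_edit.

```c
#include <assert.h>
#include <stddef.h>

/* One edit in the parser's coordinates (offsets from the visible
   beginning): the bytes [start, old_end) become [start, new_end).  */
struct edit
{
  ptrdiff_t start;
  ptrdiff_t old_end;
  ptrdiff_t new_end;
};

/* Move the parser's visible range [*VIS_BEG, *VIS_END) to [BEGV, ZV).
   Store the edits the parser should be told about in OUT and return
   how many were stored (at most 2).  */
static int
sync_visible_range (ptrdiff_t *vis_beg, ptrdiff_t *vis_end,
                    ptrdiff_t begv, ptrdiff_t zv, struct edit out[2])
{
  assert (*vis_beg <= *vis_end && begv <= zv);
  int n = 0;
  ptrdiff_t beg = *vis_beg;
  ptrdiff_t end = *vis_end;

  if (beg > begv)
    {
      /* Extend at the front: the parser sees an insertion at 0.  */
      out[n++] = (struct edit) { 0, 0, beg - begv };
      beg = begv;
    }
  if (end < zv)
    {
      /* Extend at the back: the parser sees an insertion at the end.  */
      out[n++] = (struct edit) { end - beg, end - beg, zv - beg };
      end = zv;
    }
  else if (end > zv)
    {
      /* Shrink at the back: the parser sees a deletion at the end.  */
      out[n++] = (struct edit) { zv - beg, end - beg, zv - beg };
      end = zv;
    }
  if (beg < begv)
    {
      /* Shrink at the front: the parser sees a deletion at 0.  */
      out[n++] = (struct edit) { 0, begv - beg, 0 };
      beg = begv;
    }
  *vis_beg = beg;
  *vis_end = end;
  return n;
}
```

The front is extended before the back is adjusted, and shrunk only afterwards, so every offset handed to the parser is non-negative and expressed relative to the current visible beginning.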
I set up a Linux machine and tried to debug the crashing problem, but it didn’t crash there. It seems the crash only appears on my Mac...
Yuan
[-- Attachment #2: ts.5.patch --]
[-- Type: application/octet-stream, Size: 23721 bytes --]
From 62fc019a7f57119329d53b9b8a3e8b5c1e61b27f Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Wed, 28 Jul 2021 21:08:43 -0400
Subject: [PATCH] checkpoint 5
- Move define_error out of json.c
- Add narrowing support
---
lisp/tree-sitter.el | 11 +-
src/eval.c | 13 ++
src/json.c | 16 ---
src/lisp.h | 5 +
src/tree_sitter.c | 231 +++++++++++++++++++++++-----------
src/tree_sitter.h | 15 ++-
test/src/tree-sitter-tests.el | 53 ++++++++
7 files changed, 251 insertions(+), 93 deletions(-)
diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
index a6ecb09386..8a887bb406 100644
--- a/lisp/tree-sitter.el
+++ b/lisp/tree-sitter.el
@@ -102,12 +102,13 @@ tree-sitter-font-lock-settings
PATTERN is a tree-sitter query pattern. (See manual for how to
write query patterns.) This pattern should capture nodes with
-either face names or function names. If captured with a face
-name, the node's corresponding text in the buffer is fontified
-with that face; if captured with a function name, the function is
-called with three arguments, BEG END NODE, where BEG and END
+either face symbols or function symbols. If captured with a face
+symbol, the node's corresponding text in the buffer is fontified
+with that face; if captured with a function symbol, the function
+is called with three arguments, BEG END NODE, where BEG and END
marks the span of the corresponding text, and NODE is the node
-itself.")
+itself. If a symbol is both a face and a function, it is treated
+as a face.")
(defun tree-sitter-fontify-region-function (beg end &optional verbose)
"Fontify the region between BEG and END.
diff --git a/src/eval.c b/src/eval.c
index 18faa0b9b1..33c0763f38 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1956,6 +1956,19 @@ signal_error (const char *s, Lisp_Object arg)
xsignal (Qerror, Fcons (build_string (s), arg));
}
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
/* Use this for arithmetic overflow, e.g., when an integer result is
too large even for a bignum. */
void
diff --git a/src/json.c b/src/json.c
index 3f1d27ad7f..ff28143a3c 100644
--- a/src/json.c
+++ b/src/json.c
@@ -1098,22 +1098,6 @@ DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
return unbind_to (count, lisp);
}
-/* Simplified version of 'define-error' that works with pure
- objects. */
-
-static void
-define_error (Lisp_Object name, const char *message, Lisp_Object parent)
-{
- eassert (SYMBOLP (name));
- eassert (SYMBOLP (parent));
- Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
- eassert (CONSP (parent_conditions));
- eassert (!NILP (Fmemq (parent, parent_conditions)));
- eassert (NILP (Fmemq (name, parent_conditions)));
- Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
- Fput (name, Qerror_message, build_pure_c_string (message));
-}
-
void
syms_of_json (void)
{
diff --git a/src/lisp.h b/src/lisp.h
index e439447283..d30509b61a 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -5127,6 +5127,11 @@ maybe_gc (void)
maybe_garbage_collect ();
}
+/* Simplified version of 'define-error' that works with pure
+ objects. */
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent);
+
INLINE_HEADER_END
#endif /* EMACS_LISP_H */
diff --git a/src/tree_sitter.c b/src/tree_sitter.c
index e9f8ddc7e3..5e16df7758 100644
--- a/src/tree_sitter.c
+++ b/src/tree_sitter.c
@@ -19,17 +19,8 @@ Copyright (C) 2021 Free Software Foundation, Inc.
#include <config.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/param.h>
-#include <errno.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-
#include "lisp.h"
#include "buffer.h"
-#include "coding.h"
#include "tree_sitter.h"
/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
@@ -61,6 +52,16 @@ DEFUN ("tree-sitter-node-p",
/*** Parsing functions */
+static inline void
+ts_tree_edit_1 (TSTree *tree, ptrdiff_t start_byte,
+ ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
+{
+ TSPoint dummy_point = {0, 0};
+ TSInputEdit edit = {start_byte, old_end_byte, new_end_byte,
+ dummy_point, dummy_point, dummy_point};
+ ts_tree_edit (tree, &edit);
+}
+
/* Update each parser's tree after the user made an edit. This
function does not parse the buffer and only updates the tree. (So it
should be very fast.) */
@@ -68,18 +69,38 @@ DEFUN ("tree-sitter-node-p",
ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
ptrdiff_t new_end_byte)
{
+ eassert(start_byte <= old_end_byte);
+ eassert(start_byte <= new_end_byte);
+
Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
- TSPoint dummy_point = {0, 0};
- TSInputEdit edit = {start_byte, old_end_byte, new_end_byte,
- dummy_point, dummy_point, dummy_point};
+
while (!NILP (parser_list))
{
Lisp_Object lisp_parser = Fcar (parser_list);
TSTree *tree = XTS_PARSER (lisp_parser)->tree;
if (tree != NULL)
- ts_tree_edit (tree, &edit);
- XTS_PARSER (lisp_parser)->need_reparse = true;
- parser_list = Fcdr (parser_list);
+ {
+ /* We "clip" the change to between visible_beg and
+ visible_end. It is okay if visible_end ends up larger
+ than BUF_Z, tree-sitter only access buffer text during
+ re-parse, and we will adjust visible_beg/end before
+ re-parse. */
+ ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
+
+ ptrdiff_t visible_start =
+ max (visible_beg, start_byte) - visible_beg;
+ ptrdiff_t visible_old_end =
+ min (visible_end, old_end_byte) - visible_beg;
+ ptrdiff_t visible_new_end =
+ min (visible_end, new_end_byte) - visible_beg;
+
+ ts_tree_edit_1 (tree, visible_start, visible_old_end,
+ visible_new_end);
+ XTS_PARSER (lisp_parser)->need_reparse = true;
+ }
+
+ parser_list = Fcdr (parser_list);
}
}
@@ -93,16 +114,67 @@ ts_ensure_parsed (Lisp_Object parser)
TSParser *ts_parser = XTS_PARSER (parser)->parser;
TSTree *tree = XTS_PARSER(parser)->tree;
TSInput input = XTS_PARSER (parser)->input;
+ struct buffer *buffer = XTS_PARSER (parser)->buffer;
+
+ /* Before we parse, catch up with the narrowing situation. We
+ change visible_beg and visible_end to match BUF_BEGV_BYTE and
+ BUF_ZV_BYTE, and inform tree-sitter of the change. */
+ ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
+ /* Before re-parse, we want to move the visible range of tree-sitter
+ to match the narrowed range. For example:
+ Move ________|____|__
+ to |____|__________ */
+
+ /* 1. Make sure visible_beg <= BUF_BEGV_BYTE. */
+ if (visible_beg > BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the beginning. */
+ ts_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer));
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ /* 2. Make sure visible_end = BUF_ZV_BYTE. */
+ if (visible_end < BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the end. */
+ ts_tree_edit_1 (tree, visible_end - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ else if (visible_end > BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the end. */
+ ts_tree_edit_1 (tree, BUF_ZV_BYTE (buffer) - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ /* 3. Make sure visible_beg = BUF_BEGV_BYTE. */
+ if (visible_beg < BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the beginning. */
+ ts_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0);
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ XTS_PARSER (parser)->visible_beg = visible_beg;
+ XTS_PARSER (parser)->visible_end = visible_end;
+
TSTree *new_tree = ts_parser_parse(ts_parser, tree, input);
- /* This should be very rare: it only happens when 1) language is not
- set (impossible in Emacs because the user has to supply a
- language to create a parser), 2) parse canceled due to timeout
- (impossible because we don't set a timeout), 3) parse canceled
- due to cancellation flag (impossible because we don't set the
- flag). (See comments for ts_parser_parse in
+ /* This should be very rare (impossible, really): it only happens
+ when 1) language is not set (impossible in Emacs because the user
+ has to supply a language to create a parser), 2) parse canceled
+ due to timeout (impossible because we don't set a timeout), 3)
+ parse canceled due to cancellation flag (impossible because we
+ don't set the flag). (See comments for ts_parser_parse in
tree_sitter/api.h.) */
if (new_tree == NULL)
- signal_error ("Parse failed", parser);
+ {
+ Lisp_Object buf;
+ XSETBUFFER(buf, buffer);
+ xsignal1 (Qtree_sitter_parse_error, buf);
+ }
+
ts_tree_delete (tree);
XTS_PARSER (parser)->tree = new_tree;
XTS_PARSER (parser)->need_reparse = false;
@@ -110,13 +182,18 @@ ts_ensure_parsed (Lisp_Object parser)
}
/* This is the read function provided to tree-sitter to read from a
- buffer. It reads one character at a time and automatically skip
+ buffer. It reads one character at a time and automatically skips
the gap. */
const char*
-ts_read_buffer (void *buffer, uint32_t byte_index,
+ts_read_buffer (void *parser, uint32_t byte_index,
TSPoint position, uint32_t *bytes_read)
{
- ptrdiff_t byte_pos = byte_index + 1;
+ struct buffer *buffer = ((struct Lisp_TS_Parser *) parser)->buffer;
+ ptrdiff_t visible_beg = ((struct Lisp_TS_Parser *) parser)->visible_beg;
+ ptrdiff_t byte_pos = byte_index + visible_beg;
+ /* We will make sure visible_beg >= BUF_BEG_BYTE before re-parse (in
+ ts_ensure_parsed), so byte_pos will never be smaller than
+ BUF_BEG_BYTE (unless byte_index < 0). */
/* Read one character. Tree-sitter wants us to set bytes_read to 0
if it reads to the end of buffer. It doesn't say what it wants
@@ -126,26 +203,26 @@ ts_read_buffer (void *buffer, uint32_t byte_index,
int len;
/* This function could run from a user command, so it is better to
do nothing instead of raising an error. (It was a pain in the a**
- to read mega-if-conditions in Emacs source, so I write the two
- branches separately, hoping the compiler can merge them.) */
- if (!BUFFER_LIVE_P ((struct buffer *) buffer))
+ to decrypt mega-if-conditions in Emacs source, so I wrote the two
+ branches separately.) */
+ if (!BUFFER_LIVE_P (buffer))
{
beg = "";
len = 0;
}
- // TODO BUF_ZV_BYTE?
- else if (byte_pos >= BUF_Z_BYTE ((struct buffer *) buffer))
+ /* Reached visible end-of-buffer, tell tree-sitter to read no more. */
+ else if (byte_pos >= BUF_ZV_BYTE (buffer))
{
beg = "";
len = 0;
}
+ /* Normal case, read a character. */
else
{
beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
- len = BYTES_BY_CHAR_HEAD ((int) beg);
+ len = BYTES_BY_CHAR_HEAD ((int) *beg);
}
*bytes_read = (uint32_t) len;
-
return beg;
}
@@ -158,13 +235,16 @@ make_ts_parser (struct buffer *buffer, TSParser *parser,
{
struct Lisp_TS_Parser *lisp_parser
= ALLOCATE_PSEUDOVECTOR (struct Lisp_TS_Parser, name, PVEC_TS_PARSER);
+
lisp_parser->name = name;
lisp_parser->buffer = buffer;
lisp_parser->parser = parser;
lisp_parser->tree = tree;
- TSInput input = {buffer, ts_read_buffer, TSInputEncodingUTF8};
+ TSInput input = {lisp_parser, ts_read_buffer, TSInputEncodingUTF8};
lisp_parser->input = input;
lisp_parser->need_reparse = true;
+ lisp_parser->visible_beg = BUF_BEGV_BYTE (buffer);
+ lisp_parser->visible_end = BUF_ZV_BYTE (buffer);
return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
}
@@ -287,7 +367,7 @@ DEFUN ("tree-sitter-parse-string",
/* See comment in ts_ensure_parsed for possible reasons for a
failure. */
if (tree == NULL)
- signal_error ("Failed to parse STRING", string);
+ xsignal1 (Qtree_sitter_parse_error, string);
TSNode root_node = ts_tree_root_node (tree);
@@ -535,7 +615,9 @@ DEFUN ("tree-sitter-node-first-child-for-byte",
{
CHECK_INTEGER (pos);
- struct buffer *buf = (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ struct buffer *buf = XTS_PARSER (XTS_NODE (node)->parser)->buffer;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
ptrdiff_t byte_pos = XFIXNUM (pos);
if (byte_pos < BUF_BEGV_BYTE (buf) || byte_pos > BUF_ZV_BYTE (buf))
@@ -544,9 +626,10 @@ DEFUN ("tree-sitter-node-first-child-for-byte",
TSNode ts_node = XTS_NODE (node)->node;
TSNode child;
if (NILP (named))
- child = ts_node_first_child_for_byte (ts_node, byte_pos - 1);
+ child = ts_node_first_child_for_byte (ts_node, byte_pos - visible_beg);
else
- child = ts_node_first_named_child_for_byte (ts_node, byte_pos - 1);
+ child = ts_node_first_named_child_for_byte
+ (ts_node, byte_pos - visible_beg);
if (ts_node_is_null(child))
return Qnil;
@@ -566,7 +649,9 @@ DEFUN ("tree-sitter-node-descendant-for-byte-range",
CHECK_INTEGER (beg);
CHECK_INTEGER (end);
- struct buffer *buf = (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ struct buffer *buf = XTS_PARSER (XTS_NODE (node)->parser)->buffer;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
ptrdiff_t byte_beg = XFIXNUM (beg);
ptrdiff_t byte_end = XFIXNUM (end);
@@ -580,10 +665,10 @@ DEFUN ("tree-sitter-node-descendant-for-byte-range",
TSNode child;
if (NILP (named))
child = ts_node_descendant_for_byte_range
- (ts_node, byte_beg - 1 , byte_end - 1);
+ (ts_node, byte_beg - visible_beg , byte_end - visible_beg);
else
child = ts_node_named_descendant_for_byte_range
- (ts_node, byte_beg - 1, byte_end - 1);
+ (ts_node, byte_beg - visible_beg, byte_end - visible_beg);
if (ts_node_is_null(child))
return Qnil;
@@ -593,31 +678,24 @@ DEFUN ("tree-sitter-node-descendant-for-byte-range",
/* Query functions */
-Lisp_Object ts_query_error_to_string (TSQueryError error)
+char*
+ts_query_error_to_string (TSQueryError error)
{
- char *error_name;
switch (error)
{
case TSQueryErrorNone:
- error_name = "none";
- break;
+ return "none";
case TSQueryErrorSyntax:
- error_name = "syntax";
- break;
+ return "syntax";
case TSQueryErrorNodeType:
- error_name = "node type";
- break;
+ return "node type";
case TSQueryErrorField:
- error_name = "field";
- break;
+ return "field";
case TSQueryErrorCapture:
- error_name = "capture";
- break;
+ return "capture";
case TSQueryErrorStructure:
- error_name = "structure";
- break;
+ return "structure";
}
- return make_pure_c_string (error_name, strlen(error_name));
}
DEFUN ("tree-sitter-query-capture",
@@ -634,7 +712,7 @@ DEFUN ("tree-sitter-query-capture",
BEG and END, if _both_ non-nil, specifies the range in which the query
is executed.
-Return nil if the query failed. */)
+Raise a tree-sitter-query-error if PATTERN is malformed. */)
(Lisp_Object node, Lisp_Object pattern,
Lisp_Object beg, Lisp_Object end)
{
@@ -643,47 +721,56 @@ DEFUN ("tree-sitter-query-capture",
TSNode ts_node = XTS_NODE (node)->node;
Lisp_Object lisp_parser = XTS_NODE (node)->parser;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
const TSLanguage *lang = ts_parser_language
(XTS_PARSER (lisp_parser)->parser);
char *source = SSDATA (pattern);
+
uint32_t error_offset;
- uint32_t error_type;
+ TSQueryError error_type;
TSQuery *query = ts_query_new (lang, source, strlen (source),
&error_offset, &error_type);
TSQueryCursor *cursor = ts_query_cursor_new ();
if (query == NULL)
{
- // FIXME: Signal an error?
- return Qnil;
+ // FIXME: Still crashes, debug when I can get a gdb.
+ xsignal2 (Qtree_sitter_query_error,
+ make_fixnum (error_offset),
+ build_string (ts_query_error_to_string (error_type)));
}
if (!NILP (beg) && !NILP (end))
{
EMACS_INT beg_byte = XFIXNUM (beg);
EMACS_INT end_byte = XFIXNUM (end);
ts_query_cursor_set_byte_range
- (cursor, (uint32_t) beg_byte - 1, (uint32_t) end_byte - 1);
+ (cursor, (uint32_t) beg_byte - visible_beg,
+ (uint32_t) end_byte - visible_beg);
}
ts_query_cursor_exec (cursor, query, ts_node);
TSQueryMatch match;
- TSQueryCapture capture;
+
Lisp_Object result = Qnil;
- Lisp_Object entry;
- Lisp_Object captured_node;
- const char *capture_name;
- uint32_t capture_name_len;
while (ts_query_cursor_next_match (cursor, &match))
{
const TSQueryCapture *captures = match.captures;
for (int idx=0; idx < match.capture_count; idx++)
{
+ TSQueryCapture capture;
+ Lisp_Object captured_node;
+ const char *capture_name;
+ Lisp_Object entry;
+ uint32_t capture_name_len;
+
capture = captures[idx];
captured_node = make_ts_node(lisp_parser, capture.node);
capture_name = ts_query_capture_name_for_id
(query, capture.index, &capture_name_len);
- entry = Fcons (intern_c_string (capture_name),
+ entry = Fcons (intern_c_string_1
+ (capture_name, capture_name_len),
captured_node);
result = Fcons (entry, result);
}
@@ -705,11 +792,15 @@ syms_of_tree_sitter (void)
DEFSYM (Qhas_changes, "has-changes");
DEFSYM (Qhas_error, "has-error");
+ DEFSYM(Qtree_sitter_error, "tree-sitter-error");
DEFSYM (Qtree_sitter_query_error, "tree-sitter-query-error");
- Fput (Qtree_sitter_query_error, Qerror_conditions,
- pure_list (Qtree_sitter_query_error, Qerror));
- Fput (Qtree_sitter_query_error, Qerror_message,
- build_pure_c_string ("Error with query pattern"))
+ DEFSYM (Qtree_sitter_parse_error, "tree-sitter-parse-error");
+ define_error (Qtree_sitter_error, "Generic tree-sitter error", Qerror);
+ define_error (Qtree_sitter_query_error, "Query pattern is malformed",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_parse_error, "Parse failed",
+ Qtree_sitter_error);
+
DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
DEFVAR_LISP ("tree-sitter-parser-list", Vtree_sitter_parser_list,
diff --git a/src/tree_sitter.h b/src/tree_sitter.h
index e9b4a71326..7e0fec0ee9 100644
--- a/src/tree_sitter.h
+++ b/src/tree_sitter.h
@@ -20,8 +20,6 @@ Copyright (C) 2021 Free Software Foundation, Inc.
#ifndef EMACS_TREE_SITTER_H
#define EMACS_TREE_SITTER_H
-#include <sys/types.h>
-
#include "lisp.h"
#include <tree_sitter/api.h>
@@ -33,12 +31,25 @@ #define EMACS_TREE_SITTER_H
struct Lisp_TS_Parser
{
union vectorlike_header header;
+ /* A parser's name is just a convenient tag, see docstring for
+ 'tree-sitter-make-parser', and 'tree-sitter-get-parser'. */
Lisp_Object name;
struct buffer *buffer;
TSParser *parser;
TSTree *tree;
TSInput input;
+ /* Re-parsing an unchanged buffer is not free for tree-sitter, so we
+ only make it re-parse when need_reparse == true. That usually
+ means some change is made in the buffer. But others could set
+ this field to true to force tree-sitter to re-parse. */
bool need_reparse;
+ /* These two positions record the byte positions of the "visible
+ region" that tree-sitter sees. Unlike markers, these two
+ positions do not change as the user inserts and deletes text
+ around them. Before re-parse, we move these positions to match
+ BUF_BEGV_BYTE and BUF_ZV_BYTE. */
+ ptrdiff_t visible_beg;
+ ptrdiff_t visible_end;
};
/* A wrapper around a tree-sitter node. */
diff --git a/test/src/tree-sitter-tests.el b/test/src/tree-sitter-tests.el
index c61ad678d2..69104568de 100644
--- a/test/src/tree-sitter-tests.el
+++ b/test/src/tree-sitter-tests.el
@@ -148,5 +148,58 @@ tree-sitter-query-api
(cdr entry))))
(tree-sitter-query-capture root-node pattern)))))))
+(ert-deftest tree-sitter-narrow ()
+ "Tests if narrowing works."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "xxx[1,{\"name\": \"Bob\"},2,3]xxx")
+ (narrow-to-region (+ (point-min) 3) (- (point-max) 3))
+ (setq parser (tree-sitter-create-parser
+ (current-buffer) (tree-sitter-json)))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ ;; This test is from the basic test.
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))"))
+
+ (widen)
+ (goto-char (point-min))
+ (insert "ooo")
+ (should (equal "oooxxx[1,{\"name\": \"Bob\"},2,3]xxx"
+ (buffer-string)))
+ (delete-region 10 26)
+ (should (equal "oooxxx[1,2,3]xxx"
+ (buffer-string)))
+ (narrow-to-region (+ (point-min) 6) (- (point-max) 3))
+ ;; This test is also from the basic test.
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+ (widen)
+ (goto-char (point-max))
+ (insert "[1,2]")
+ (should (equal "oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (- (point-max) 5) (point-max))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number)))"))
+ (widen)
+ (goto-char (point-min))
+ (insert "[1]")
+ (should (equal "[1]oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (point-min) (+ (point-min) 3))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number)))")))))
+
(provide 'tree-sitter-tests)
;;; tree-sitter-tests.el ends here
--
2.24.3 (Apple Git-128)
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 14:35 ` Yuan Fu
@ 2021-07-29 15:28 ` Eli Zaretskii
2021-07-29 15:57 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-29 15:28 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 29 Jul 2021 10:35:10 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> > We don't need to know. The Lisp program which needs to handle this
> > situation will have to figure out what is right in that case, "right"
> > in the sense that it produces the desired results after communicating
> > the changes to TS.
>
> The difficulty is that what tree-sitter sees must be consistent. If Emacs updates tree-sitter with option 1 and lisp later chooses option 2, the content that tree-sitter sees is not consistent.
If that happens, it means the Lisp program which does that has a bug
that needs to be fixed.
> Anyway, I found a way that avoids this issue: the bounds of tree-sitter’s visible region never change, and the next time lisp narrows to a different region, we update tree-sitter’s bounds to match those of the narrowing. Here is the latest patch. If the code is not entirely straightforward, I’m happy to add more comments to explain it.
I'm not sure we should do this, because it means we second-guess what
the Lisp program calling TS intends to do. Why should we do that,
instead of leaving it to the Lisp program to DTRT? And what happens
if our guess is wrong?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 15:28 ` Eli Zaretskii
@ 2021-07-29 15:57 ` Yuan Fu
2021-07-29 16:21 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-29 15:57 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, monnier, emacs-devel
> On Jul 29, 2021, at 11:28 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 29 Jul 2021 10:35:10 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> emacs-devel@gnu.org
>>
>>> We don't need to know. The Lisp program which needs to handle this
>>> situation will have to figure out what is right in that case, "right"
>>> in the sense that it produces the desired results after communicating
>>> the changes to TS.
>>
>> The difficulty is that what tree-sitter sees must be consistent. If Emacs updates tree-sitter with option 1 and lisp later chooses option 2, the content that tree-sitter sees is not consistent.
>
> If that happens, it means the Lisp program which does that has a bug
> that needs to be fixed.
>
>> Anyway, I found a way that avoids this issue: the bounds of tree-sitter’s visible region never change, and the next time lisp narrows to a different region, we update tree-sitter’s bounds to match those of the narrowing. Here is the latest patch. If the code is not entirely straightforward, I’m happy to add more comments to explain it.
>
> I'm not sure we should do this, because it means we second-guess what
> the Lisp program calling TS intends to do. Why should we do that,
> instead of leaving it to the Lisp program to DTRT? And what happens
> if our guess is wrong?
I don’t think the current implementation guesses anything. Let me turn around and ask you what is TRT: if the buffer is xxxAAAxxx, and lisp narrows to AAA and creates a parser, parser sees AAA; now widen, user inserts BBB in front of AAA, what do we tell tree-sitter? Nothing changed, or BBB inserted at the beginning? To where should lisp narrow? BBBAAA, or AAA, or BBB?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 15:57 ` Yuan Fu
@ 2021-07-29 16:21 ` Eli Zaretskii
2021-07-29 16:59 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-29 16:21 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 29 Jul 2021 11:57:56 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> >> Anyway, I found a way that avoids this issue: the bounds of tree-sitter’s visible region never change, and the next time lisp narrows to a different region, we update tree-sitter’s bounds to match those of the narrowing. Here is the latest patch. If the code is not entirely straightforward, I’m happy to add more comments to explain it.
> >
> > I'm not sure we should do this, because it means we second-guess what
> > the Lisp program calling TS intends to do. Why should we do that,
> > instead of leaving it to the Lisp program to DTRT? And what happens
> > if our guess is wrong?
>
> I don’t think the current implementation guesses anything. Let me turn around and ask you what is TRT: if the buffer is xxxAAAxxx, and lisp narrows to AAA and creates a parser, parser sees AAA; now widen, user inserts BBB in front of AAA, what do we tell tree-sitter? Nothing changed, or BBB inserted at the beginning?
Neither. We should tell TS that instead of AAA there's now
xxxBBBAAAxxx, because the narrowing was removed.
> To where should lisp narrow? BBBAAA, or AAA, or BBB?
It's the question for the Lisp program, not for the low-level code
which we are discussing.
Anyway, you are once again bothered by a scenario that should not
happen at all: a Lisp program should not call TS first with, then
without narrowing (or the other way around). I don't see why such
situations should happen, and if they do, the Lisp programs which need
them will have to figure out what to do and how.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 16:21 ` Eli Zaretskii
@ 2021-07-29 16:59 ` Yuan Fu
2021-07-29 17:38 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-29 16:59 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, Stefan Monnier,
emacs-devel
> On Jul 29, 2021, at 12:21 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 29 Jul 2021 11:57:56 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>>
>>>> Anyway, I found a way that avoids this issue: the bounds of tree-sitter’s visible region never changes, and the next time when lisp narrows to a different region, we update tree-sitter’s bound to match that of the narrowing. Here is the latest patch. If the code is not entirely straightforward, I’m happy to add more comment to explain it.
>>>
>>> I'm not sure we should do this, because it means we second-guess what
>>> the Lisp program calling TS intends to do. Why should we do that,
>>> instead of leaving it to the Lisp program to DTRT? And what happens
>>> if our guess is wrong?
>>
>> I don’t think the current implementation guesses anything. Let me turn around and ask you what is TRT: if the buffer is xxxAAAxxx, and lisp narrows to AAA and creates a parser, parser sees AAA; now widen, user inserts BBB in front of AAA, what do we tell tree-sitter? Nothing changed, or BBB inserted at the beginning?
>
> Neither. We should tell TS that instead of AAA there's now
> xxxBBBAAAxxx, because the narrowing was removed.
This is the common usage that I imagined:
Narrow
Call tree-sitter (for fontification etc.)
Widen
User edits the buffer
Narrow
Call tree-sitter (for fontification etc.)
Widen
Ideally, tree-sitter only sees the narrowed region, because every time it is called the buffer is narrowed. However, tree-sitter doesn’t work that way: it needs to be updated whenever the user edits the buffer, and at that point the buffer is widened. If your goal is to give lisp control of what tree-sitter sees, we can’t just give tree-sitter the whole buffer whenever the user makes some change.
>
>> To where should lisp narrow? BBBAAA, or AAA, or BBB?
>
> It's the question for the Lisp program, not for the low-level code
> which we are discussing.
>
> Anyway, you are once again bothered by a scenario that should not
> happen at all: a Lisp program should not call TS first with, then
> without narrowing (or the other way around). I don't see why such
> situation should happen, and if they do, the Lisp programs which need
> them will have to figure out what to do and how.
Even if lisp always calls tree-sitter with narrowing, we still need to update tree-sitter when the buffer is widened. Does that make sense?
Yuan
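For readers following the thread, the arithmetic behind "updating tree-sitter while the buffer is widened" can be sketched in isolation. This is a minimal illustration, not the code from the patch: the struct clipped_edit and the helpers clamp and clip_edit are hypothetical names, and the sketch clamps every offset on both sides of the visible region, whereas ts_record_change in ts.5.patch uses max for the start and min for the two ends. It only covers the clipping step; it does not decide what should happen to the visible bounds when an edit lies wholly outside them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three byte offsets of a TSInputEdit,
   expressed relative to the start of the parser's visible region.  */
struct clipped_edit
{
  ptrdiff_t start;
  ptrdiff_t old_end;
  ptrdiff_t new_end;
};

/* Force POS into the closed interval [LO, HI].  */
static ptrdiff_t
clamp (ptrdiff_t pos, ptrdiff_t lo, ptrdiff_t hi)
{
  return pos < lo ? lo : (pos > hi ? hi : pos);
}

/* Clip a buffer edit to [VISIBLE_BEG, VISIBLE_END] and rebase it so
   that offset 0 is the first byte the parser can see.  All arguments
   are absolute byte positions in the buffer.  */
static struct clipped_edit
clip_edit (ptrdiff_t visible_beg, ptrdiff_t visible_end,
           ptrdiff_t start_byte, ptrdiff_t old_end_byte,
           ptrdiff_t new_end_byte)
{
  assert (visible_beg <= visible_end);
  assert (start_byte <= old_end_byte);
  assert (start_byte <= new_end_byte);

  struct clipped_edit e;
  e.start = clamp (start_byte, visible_beg, visible_end) - visible_beg;
  e.old_end = clamp (old_end_byte, visible_beg, visible_end) - visible_beg;
  e.new_end = clamp (new_end_byte, visible_beg, visible_end) - visible_beg;
  return e;
}
```

With a visible region of bytes 4 to 7, inserting three bytes at position 5 becomes an insertion at relative offset 1 whose new end is clipped to 3, while an insertion at position 1 (entirely before the region) collapses to an empty edit at offset 0.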
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 16:59 ` Yuan Fu
@ 2021-07-29 17:38 ` Eli Zaretskii
2021-07-29 17:55 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-29 17:38 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 29 Jul 2021 12:59:43 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> >> I don’t think the current implementation guesses anything. Let me turn around and ask you what is TRT: if the buffer is xxxAAAxxx, and lisp narrows to AAA and creates a parser, parser sees AAA; now widen, user inserts BBB in front of AAA, what do we tell tree-sitter? Nothing changed, or BBB inserted at the beginning?
> >
> > Neither. We should tell TS that instead of AAA there's now
> > xxxBBBAAAxxx, because the narrowing was removed.
>
> This is the common usage that I imagined:
>
> Narrow
> Calls tree-sitter (for fontification etc)
> Widen
>
> Users edit the buffer
>
> narrow
> Calls tree-sitter (for fontification etc)
> Widen
>
> Ideally, tree-sitter only sees the narrowed region because everytime it is called, the buffer is narrowed. However, tree-sitter doesn’t work that way, it needs to be updated when user edits the buffer, when the buffer is widened. If your goal is give lisp control of what tree-sitter sees, we can’t just give tree-sitter the whole buffer whenever the user makes some change.
In the above scenario, the Lisp program that narrows the buffer
should figure out how to do that correctly. The call to TS will then
express the changes in the narrowed region only.
> > Anyway, you are once again bothered by a scenario that should not
> > happen at all: a Lisp program should not call TS first with, then
> > without narrowing (or the other way around). I don't see why such
> > situation should happen, and if they do, the Lisp programs which need
> > them will have to figure out what to do and how.
>
> Even if lisp always call tree-sitter with narrowing, we still need to update tree-sitter when the buffer is widened.
No, I don't think so. Why would we need to? From the TS POV the text
outside the restriction doesn't exist because it never sees it.
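The claim that text outside the restriction does not exist from the parser's point of view can be illustrated with a read callback that simply reports end of input at the visible end. The sketch below is a simplified model under stated assumptions: struct narrowed_text and read_visible are hypothetical, the text is a flat array with 0-based offsets (no gap, no multibyte handling), and the TSPoint argument of tree-sitter's real read callback is left out to keep the example self-contained. Unlike ts_read_buffer in the patch, which hands back one character per call, this version returns the whole remaining visible chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a narrowed buffer: flat text
   (no gap) with 0-based byte offsets, begv <= zv <= size.  */
struct narrowed_text
{
  const char *text;
  size_t size;
  size_t begv;   /* first visible byte */
  size_t zv;     /* one past the last visible byte */
};

/* Return a pointer to the byte at BYTE_INDEX, counted from the start
   of the visible region, and store the number of readable bytes in
   *BYTES_READ.  Storing zero signals end of input, so the caller
   never learns that text exists beyond the narrowing.  */
static const char *
read_visible (const struct narrowed_text *nt, uint32_t byte_index,
              uint32_t *bytes_read)
{
  size_t pos = nt->begv + byte_index;
  if (pos >= nt->zv)
    {
      *bytes_read = 0;
      return "";
    }
  *bytes_read = (uint32_t) (nt->zv - pos);
  return nt->text + pos;
}
```

For the text xxxAAAxxx narrowed to AAA (begv 3, zv 6), reading at index 0 yields three bytes starting at the first A, and reading at index 3 yields zero bytes.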
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 17:38 ` Eli Zaretskii
@ 2021-07-29 17:55 ` Yuan Fu
2021-07-29 18:37 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-29 17:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, Stephen Leake, monnier, emacs-devel
>>
>> Ideally, tree-sitter only sees the narrowed region because everytime it is called, the buffer is narrowed. However, tree-sitter doesn’t work that way, it needs to be updated when user edits the buffer, when the buffer is widened. If your goal is give lisp control of what tree-sitter sees, we can’t just give tree-sitter the whole buffer whenever the user makes some change.
>
> In the above scenario, then the Lisp program that narrows the buffer
> should figure out how to do that correctly. The call to TS will then
> express the changes in the narrowed region only.
>
>>> Anyway, you are once again bothered by a scenario that should not
>>> happen at all: a Lisp program should not call TS first with, then
>>> without narrowing (or the other way around). I don't see why such
>>> situation should happen, and if they do, the Lisp programs which need
>>> them will have to figure out what to do and how.
>>
>> Even if lisp always call tree-sitter with narrowing, we still need to update tree-sitter when the buffer is widened.
>
> No, I don't think so. Why would we need to? From the TS POV the text
> outside the restriction doesn't exist because it never sees it.
Actually, that sounds like how it works in my code right now. After the last few exchanges, I still have the feeling that we are not on the same page. Could you have a look at the code in ts_ensure_parsed and ts_record_change, and see if it aligns with what you consider to be the right thing? If you have read them already and think you understand what they are doing, could you tell me how exactly these two functions should behave, in your opinion? Thanks.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 17:55 ` Yuan Fu
@ 2021-07-29 18:37 ` Eli Zaretskii
2021-07-29 18:57 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-29 18:37 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 29 Jul 2021 13:55:48 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> Actually, that sounds like how it works in my code right now. After the last few exchanges, I still have the feeling that we are not on the same page. Could you have a look at the code in ts_ensure_parsed and ts_record_change, and see if it aligns with what you consider to be the right thing? If you have read them already and think you understand what are they doing, could you tell me how exactly should these two functions behave, in your opinion? Thanks.
Where do I find the latest version of the code?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 18:37 ` Eli Zaretskii
@ 2021-07-29 18:57 ` Yuan Fu
2021-07-30 6:47 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-07-29 18:57 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, Stefan Monnier,
emacs-devel
[-- Attachment #1: Type: text/plain, Size: 925 bytes --]
> On Jul 29, 2021, at 2:37 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 29 Jul 2021 13:55:48 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>>
>> Actually, that sounds like how it works in my code right now. After the last few exchanges, I still have the feeling that we are not on the same page. Could you have a look at the code in ts_ensure_parsed and ts_record_change, and see if it aligns with what you consider to be the right thing? If you have read them already and think you understand what are they doing, could you tell me how exactly should these two functions behave, in your opinion? Thanks.
>
> Where do I find the latest version of the code?
A few messages back I attached a patch, ts.5.patch. Actually I can just attach it again, here.
Yuan
[-- Attachment #2: ts.5.patch --]
[-- Type: application/octet-stream, Size: 23721 bytes --]
From 62fc019a7f57119329d53b9b8a3e8b5c1e61b27f Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Wed, 28 Jul 2021 21:08:43 -0400
Subject: [PATCH] checkpoint 5
- Move define_error out of json.c
- Add narrowing support
---
lisp/tree-sitter.el | 11 +-
src/eval.c | 13 ++
src/json.c | 16 ---
src/lisp.h | 5 +
src/tree_sitter.c | 231 +++++++++++++++++++++++-----------
src/tree_sitter.h | 15 ++-
test/src/tree-sitter-tests.el | 53 ++++++++
7 files changed, 251 insertions(+), 93 deletions(-)
diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
index a6ecb09386..8a887bb406 100644
--- a/lisp/tree-sitter.el
+++ b/lisp/tree-sitter.el
@@ -102,12 +102,13 @@ tree-sitter-font-lock-settings
PATTERN is a tree-sitter query pattern. (See manual for how to
write query patterns.) This pattern should capture nodes with
-either face names or function names. If captured with a face
-name, the node's corresponding text in the buffer is fontified
-with that face; if captured with a function name, the function is
-called with three arguments, BEG END NODE, where BEG and END
+either face symbols or function symbols. If captured with a face
+symbol, the node's corresponding text in the buffer is fontified
+with that face; if captured with a function symbol, the function
+is called with three arguments, BEG END NODE, where BEG and END
marks the span of the corresponding text, and NODE is the node
-itself.")
+itself. If a symbol is both a face and a function, it is treated
+as a face.")
(defun tree-sitter-fontify-region-function (beg end &optional verbose)
"Fontify the region between BEG and END.
diff --git a/src/eval.c b/src/eval.c
index 18faa0b9b1..33c0763f38 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1956,6 +1956,19 @@ signal_error (const char *s, Lisp_Object arg)
xsignal (Qerror, Fcons (build_string (s), arg));
}
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
/* Use this for arithmetic overflow, e.g., when an integer result is
too large even for a bignum. */
void
diff --git a/src/json.c b/src/json.c
index 3f1d27ad7f..ff28143a3c 100644
--- a/src/json.c
+++ b/src/json.c
@@ -1098,22 +1098,6 @@ DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
return unbind_to (count, lisp);
}
-/* Simplified version of 'define-error' that works with pure
- objects. */
-
-static void
-define_error (Lisp_Object name, const char *message, Lisp_Object parent)
-{
- eassert (SYMBOLP (name));
- eassert (SYMBOLP (parent));
- Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
- eassert (CONSP (parent_conditions));
- eassert (!NILP (Fmemq (parent, parent_conditions)));
- eassert (NILP (Fmemq (name, parent_conditions)));
- Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
- Fput (name, Qerror_message, build_pure_c_string (message));
-}
-
void
syms_of_json (void)
{
diff --git a/src/lisp.h b/src/lisp.h
index e439447283..d30509b61a 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -5127,6 +5127,11 @@ maybe_gc (void)
maybe_garbage_collect ();
}
+/* Simplified version of 'define-error' that works with pure
+ objects. */
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent);
+
INLINE_HEADER_END
#endif /* EMACS_LISP_H */
diff --git a/src/tree_sitter.c b/src/tree_sitter.c
index e9f8ddc7e3..5e16df7758 100644
--- a/src/tree_sitter.c
+++ b/src/tree_sitter.c
@@ -19,17 +19,8 @@ Copyright (C) 2021 Free Software Foundation, Inc.
#include <config.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/param.h>
-#include <errno.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-
#include "lisp.h"
#include "buffer.h"
-#include "coding.h"
#include "tree_sitter.h"
/* parser.h defines a macro ADVANCE that conflicts with alloc.c. */
@@ -61,6 +52,16 @@ DEFUN ("tree-sitter-node-p",
/*** Parsing functions */
+static inline void
+ts_tree_edit_1 (TSTree *tree, ptrdiff_t start_byte,
+ ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
+{
+ TSPoint dummy_point = {0, 0};
+ TSInputEdit edit = {start_byte, old_end_byte, new_end_byte,
+ dummy_point, dummy_point, dummy_point};
+ ts_tree_edit (tree, &edit);
+}
+
/* Update each parser's tree after the user made an edit. This
function does not parse the buffer and only updates the tree. (So it
should be very fast.) */
@@ -68,18 +69,38 @@ DEFUN ("tree-sitter-node-p",
ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
ptrdiff_t new_end_byte)
{
+ eassert(start_byte <= old_end_byte);
+ eassert(start_byte <= new_end_byte);
+
Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
- TSPoint dummy_point = {0, 0};
- TSInputEdit edit = {start_byte, old_end_byte, new_end_byte,
- dummy_point, dummy_point, dummy_point};
+
while (!NILP (parser_list))
{
Lisp_Object lisp_parser = Fcar (parser_list);
TSTree *tree = XTS_PARSER (lisp_parser)->tree;
if (tree != NULL)
- ts_tree_edit (tree, &edit);
- XTS_PARSER (lisp_parser)->need_reparse = true;
- parser_list = Fcdr (parser_list);
+ {
+ /* We "clip" the change to between visible_beg and
+ visible_end. It is okay if visible_end ends up larger
+ than BUF_Z, tree-sitter only accesses buffer text during
+ re-parse, and we will adjust visible_beg/end before
+ re-parse. */
+ ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
+
+ ptrdiff_t visible_start =
+ max (visible_beg, start_byte) - visible_beg;
+ ptrdiff_t visible_old_end =
+ min (visible_end, old_end_byte) - visible_beg;
+ ptrdiff_t visible_new_end =
+ min (visible_end, new_end_byte) - visible_beg;
+
+ ts_tree_edit_1 (tree, visible_start, visible_old_end,
+ visible_new_end);
+ XTS_PARSER (lisp_parser)->need_reparse = true;
+ }
+ /* Advance even if TREE is NULL, or the loop never terminates. */
+ parser_list = Fcdr (parser_list);
}
}
@@ -93,16 +114,67 @@ ts_ensure_parsed (Lisp_Object parser)
TSParser *ts_parser = XTS_PARSER (parser)->parser;
TSTree *tree = XTS_PARSER(parser)->tree;
TSInput input = XTS_PARSER (parser)->input;
+ struct buffer *buffer = XTS_PARSER (parser)->buffer;
+
+ /* Before we parse, catch up with the narrowing situation. We
+ change visible_beg and visible_end to match BUF_BEGV_BYTE and
+ BUF_ZV_BYTE, and inform tree-sitter of the change. */
+ ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
+ /* Before re-parse, we want to move the visible range of tree-sitter
+ to match the narrowed range. For example:
+ Move ________|____|__
+ to |____|__________ */
+
+ /* 1. Make sure visible_beg <= BUF_BEGV_BYTE. */
+ if (visible_beg > BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the beginning. */
+ ts_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer));
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ /* 2. Make sure visible_end = BUF_ZV_BYTE. */
+ if (visible_end < BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the end. */
+ ts_tree_edit_1 (tree, visible_end - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ else if (visible_end > BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the end. */
+ ts_tree_edit_1 (tree, BUF_ZV_BYTE (buffer) - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ /* 3. Make sure visible_beg = BUF_BEGV_BYTE. */
+ if (visible_beg < BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the beginning. */
+ ts_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0);
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ XTS_PARSER (parser)->visible_beg = visible_beg;
+ XTS_PARSER (parser)->visible_end = visible_end;
+
TSTree *new_tree = ts_parser_parse(ts_parser, tree, input);
- /* This should be very rare: it only happens when 1) language is not
- set (impossible in Emacs because the user has to supply a
- language to create a parser), 2) parse canceled due to timeout
- (impossible because we don't set a timeout), 3) parse canceled
- due to cancellation flag (impossible because we don't set the
- flag). (See comments for ts_parser_parse in
+ /* This should be very rare (impossible, really): it only happens
+ when 1) language is not set (impossible in Emacs because the user
+ has to supply a language to create a parser), 2) parse canceled
+ due to timeout (impossible because we don't set a timeout), 3)
+ parse canceled due to cancellation flag (impossible because we
+ don't set the flag). (See comments for ts_parser_parse in
tree_sitter/api.h.) */
if (new_tree == NULL)
- signal_error ("Parse failed", parser);
+ {
+ Lisp_Object buf;
+ XSETBUFFER(buf, buffer);
+ xsignal1 (Qtree_sitter_parse_error, buf);
+ }
+
ts_tree_delete (tree);
XTS_PARSER (parser)->tree = new_tree;
XTS_PARSER (parser)->need_reparse = false;
@@ -110,13 +182,18 @@ ts_ensure_parsed (Lisp_Object parser)
}
/* This is the read function provided to tree-sitter to read from a
- buffer. It reads one character at a time and automatically skip
+ buffer. It reads one character at a time and automatically skips
the gap. */
const char*
-ts_read_buffer (void *buffer, uint32_t byte_index,
+ts_read_buffer (void *parser, uint32_t byte_index,
TSPoint position, uint32_t *bytes_read)
{
- ptrdiff_t byte_pos = byte_index + 1;
+ struct buffer *buffer = ((struct Lisp_TS_Parser *) parser)->buffer;
+ ptrdiff_t visible_beg = ((struct Lisp_TS_Parser *) parser)->visible_beg;
+ ptrdiff_t byte_pos = byte_index + visible_beg;
+ /* We will make sure visible_beg >= BUF_BEG_BYTE before re-parse (in
+ ts_ensure_parsed), so byte_pos will never be smaller than
+ BUF_BEG_BYTE (unless byte_index < 0). */
/* Read one character. Tree-sitter wants us to set bytes_read to 0
if it reads to the end of buffer. It doesn't say what it wants
@@ -126,26 +203,26 @@ ts_read_buffer (void *buffer, uint32_t byte_index,
int len;
/* This function could run from a user command, so it is better to
do nothing instead of raising an error. (It was a pain in the a**
- to read mega-if-conditions in Emacs source, so I write the two
- branches separately, hoping the compiler can merge them.) */
- if (!BUFFER_LIVE_P ((struct buffer *) buffer))
+ to decrypt mega-if-conditions in Emacs source, so I wrote the two
+ branches separately.) */
+ if (!BUFFER_LIVE_P (buffer))
{
beg = "";
len = 0;
}
- // TODO BUF_ZV_BYTE?
- else if (byte_pos >= BUF_Z_BYTE ((struct buffer *) buffer))
+ /* Reached visible end-of-buffer, tell tree-sitter to read no more. */
+ else if (byte_pos >= BUF_ZV_BYTE (buffer))
{
beg = "";
len = 0;
}
+ /* Normal case, read a character. */
else
{
beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
- len = BYTES_BY_CHAR_HEAD ((int) beg);
+ len = BYTES_BY_CHAR_HEAD ((int) *beg);
}
*bytes_read = (uint32_t) len;
-
return beg;
}
@@ -158,13 +235,16 @@ make_ts_parser (struct buffer *buffer, TSParser *parser,
{
struct Lisp_TS_Parser *lisp_parser
= ALLOCATE_PSEUDOVECTOR (struct Lisp_TS_Parser, name, PVEC_TS_PARSER);
+
lisp_parser->name = name;
lisp_parser->buffer = buffer;
lisp_parser->parser = parser;
lisp_parser->tree = tree;
- TSInput input = {buffer, ts_read_buffer, TSInputEncodingUTF8};
+ TSInput input = {lisp_parser, ts_read_buffer, TSInputEncodingUTF8};
lisp_parser->input = input;
lisp_parser->need_reparse = true;
+ lisp_parser->visible_beg = BUF_BEGV_BYTE (buffer);
+ lisp_parser->visible_end = BUF_ZV_BYTE (buffer);
return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
}
@@ -287,7 +367,7 @@ DEFUN ("tree-sitter-parse-string",
/* See comment in ts_ensure_parsed for possible reasons for a
failure. */
if (tree == NULL)
- signal_error ("Failed to parse STRING", string);
+ xsignal1 (Qtree_sitter_parse_error, string);
TSNode root_node = ts_tree_root_node (tree);
@@ -535,7 +615,9 @@ DEFUN ("tree-sitter-node-first-child-for-byte",
{
CHECK_INTEGER (pos);
- struct buffer *buf = (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ struct buffer *buf = XTS_PARSER (XTS_NODE (node)->parser)->buffer;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
ptrdiff_t byte_pos = XFIXNUM (pos);
if (byte_pos < BUF_BEGV_BYTE (buf) || byte_pos > BUF_ZV_BYTE (buf))
@@ -544,9 +626,10 @@ DEFUN ("tree-sitter-node-first-child-for-byte",
TSNode ts_node = XTS_NODE (node)->node;
TSNode child;
if (NILP (named))
- child = ts_node_first_child_for_byte (ts_node, byte_pos - 1);
+ child = ts_node_first_child_for_byte (ts_node, byte_pos - visible_beg);
else
- child = ts_node_first_named_child_for_byte (ts_node, byte_pos - 1);
+ child = ts_node_first_named_child_for_byte
+ (ts_node, byte_pos - visible_beg);
if (ts_node_is_null(child))
return Qnil;
@@ -566,7 +649,9 @@ DEFUN ("tree-sitter-node-descendant-for-byte-range",
CHECK_INTEGER (beg);
CHECK_INTEGER (end);
- struct buffer *buf = (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ struct buffer *buf = XTS_PARSER (XTS_NODE (node)->parser)->buffer;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
ptrdiff_t byte_beg = XFIXNUM (beg);
ptrdiff_t byte_end = XFIXNUM (end);
@@ -580,10 +665,10 @@ DEFUN ("tree-sitter-node-descendant-for-byte-range",
TSNode child;
if (NILP (named))
child = ts_node_descendant_for_byte_range
- (ts_node, byte_beg - 1 , byte_end - 1);
+ (ts_node, byte_beg - visible_beg , byte_end - visible_beg);
else
child = ts_node_named_descendant_for_byte_range
- (ts_node, byte_beg - 1, byte_end - 1);
+ (ts_node, byte_beg - visible_beg, byte_end - visible_beg);
if (ts_node_is_null(child))
return Qnil;
@@ -593,31 +678,24 @@ DEFUN ("tree-sitter-node-descendant-for-byte-range",
/* Query functions */
-Lisp_Object ts_query_error_to_string (TSQueryError error)
+char*
+ts_query_error_to_string (TSQueryError error)
{
- char *error_name;
switch (error)
{
case TSQueryErrorNone:
- error_name = "none";
- break;
+ return "none";
case TSQueryErrorSyntax:
- error_name = "syntax";
- break;
+ return "syntax";
case TSQueryErrorNodeType:
- error_name = "node type";
- break;
+ return "node type";
case TSQueryErrorField:
- error_name = "field";
- break;
+ return "field";
case TSQueryErrorCapture:
- error_name = "capture";
- break;
+ return "capture";
case TSQueryErrorStructure:
- error_name = "structure";
- break;
+ return "structure";
}
- return make_pure_c_string (error_name, strlen(error_name));
}
DEFUN ("tree-sitter-query-capture",
@@ -634,7 +712,7 @@ DEFUN ("tree-sitter-query-capture",
BEG and END, if _both_ non-nil, specifies the range in which the query
is executed.
-Return nil if the query failed. */)
+Raise a tree-sitter-query-error if PATTERN is malformed. */)
(Lisp_Object node, Lisp_Object pattern,
Lisp_Object beg, Lisp_Object end)
{
@@ -643,47 +721,56 @@ DEFUN ("tree-sitter-query-capture",
TSNode ts_node = XTS_NODE (node)->node;
Lisp_Object lisp_parser = XTS_NODE (node)->parser;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
const TSLanguage *lang = ts_parser_language
(XTS_PARSER (lisp_parser)->parser);
char *source = SSDATA (pattern);
+
uint32_t error_offset;
- uint32_t error_type;
+ TSQueryError error_type;
TSQuery *query = ts_query_new (lang, source, strlen (source),
&error_offset, &error_type);
TSQueryCursor *cursor = ts_query_cursor_new ();
if (query == NULL)
{
- // FIXME: Signal an error?
- return Qnil;
+ // FIXME: Still crashes, debug when I can get a gdb.
+ xsignal2 (Qtree_sitter_query_error,
+ make_fixnum (error_offset),
+ build_string (ts_query_error_to_string (error_type)));
}
if (!NILP (beg) && !NILP (end))
{
EMACS_INT beg_byte = XFIXNUM (beg);
EMACS_INT end_byte = XFIXNUM (end);
ts_query_cursor_set_byte_range
- (cursor, (uint32_t) beg_byte - 1, (uint32_t) end_byte - 1);
+ (cursor, (uint32_t) beg_byte - visible_beg,
+ (uint32_t) end_byte - visible_beg);
}
ts_query_cursor_exec (cursor, query, ts_node);
TSQueryMatch match;
- TSQueryCapture capture;
+
Lisp_Object result = Qnil;
- Lisp_Object entry;
- Lisp_Object captured_node;
- const char *capture_name;
- uint32_t capture_name_len;
while (ts_query_cursor_next_match (cursor, &match))
{
const TSQueryCapture *captures = match.captures;
for (int idx=0; idx < match.capture_count; idx++)
{
+ TSQueryCapture capture;
+ Lisp_Object captured_node;
+ const char *capture_name;
+ Lisp_Object entry;
+ uint32_t capture_name_len;
+
capture = captures[idx];
captured_node = make_ts_node(lisp_parser, capture.node);
capture_name = ts_query_capture_name_for_id
(query, capture.index, &capture_name_len);
- entry = Fcons (intern_c_string (capture_name),
+ entry = Fcons (intern_c_string_1
+ (capture_name, capture_name_len),
captured_node);
result = Fcons (entry, result);
}
@@ -705,11 +792,15 @@ syms_of_tree_sitter (void)
DEFSYM (Qhas_changes, "has-changes");
DEFSYM (Qhas_error, "has-error");
+ DEFSYM(Qtree_sitter_error, "tree-sitter-error");
DEFSYM (Qtree_sitter_query_error, "tree-sitter-query-error");
- Fput (Qtree_sitter_query_error, Qerror_conditions,
- pure_list (Qtree_sitter_query_error, Qerror));
- Fput (Qtree_sitter_query_error, Qerror_message,
- build_pure_c_string ("Error with query pattern"))
+ DEFSYM (Qtree_sitter_parse_error, "tree-sitter-parse-error")
+ define_error (Qtree_sitter_error, "Generic tree-sitter error", Qerror);
+ define_error (Qtree_sitter_query_error, "Query pattern is malformed",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_parse_error, "Parse failed",
+ Qtree_sitter_error);
+
DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
DEFVAR_LISP ("tree-sitter-parser-list", Vtree_sitter_parser_list,
diff --git a/src/tree_sitter.h b/src/tree_sitter.h
index e9b4a71326..7e0fec0ee9 100644
--- a/src/tree_sitter.h
+++ b/src/tree_sitter.h
@@ -20,8 +20,6 @@ Copyright (C) 2021 Free Software Foundation, Inc.
#ifndef EMACS_TREE_SITTER_H
#define EMACS_TREE_SITTER_H
-#include <sys/types.h>
-
#include "lisp.h"
#include <tree_sitter/api.h>
@@ -33,12 +31,25 @@ #define EMACS_TREE_SITTER_H
struct Lisp_TS_Parser
{
union vectorlike_header header;
+ /* A parser's name is just a convenient tag, see docstring for
+ 'tree-sitter-make-parser', and 'tree-sitter-get-parser'. */
Lisp_Object name;
struct buffer *buffer;
TSParser *parser;
TSTree *tree;
TSInput input;
+ /* Re-parsing an unchanged buffer is not free for tree-sitter, so we
+ only make it re-parse when need_reparse == true. That usually
+ means some change is made in the buffer. But others could set
+ this field to true to force tree-sitter to re-parse. */
bool need_reparse;
+ /* These two positions record the byte positions of the "visible
+ region" that tree-sitter sees. Unlike markers, these two
+ positions do not change as the user inserts and deletes text
+ around them. Before re-parse, we move these positions to match
+ BUF_BEGV_BYTE and BUF_ZV_BYTE. */
+ ptrdiff_t visible_beg;
+ ptrdiff_t visible_end;
};
/* A wrapper around a tree-sitter node. */
diff --git a/test/src/tree-sitter-tests.el b/test/src/tree-sitter-tests.el
index c61ad678d2..69104568de 100644
--- a/test/src/tree-sitter-tests.el
+++ b/test/src/tree-sitter-tests.el
@@ -148,5 +148,58 @@ tree-sitter-query-api
(cdr entry))))
(tree-sitter-query-capture root-node pattern)))))))
+(ert-deftest tree-sitter-narrow ()
+ "Tests if narrowing works."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "xxx[1,{\"name\": \"Bob\"},2,3]xxx")
+ (narrow-to-region (+ (point-min) 3) (- (point-max) 3))
+ (setq parser (tree-sitter-create-parser
+ (current-buffer) (tree-sitter-json)))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ ;; This test is from the basic test.
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))"))
+
+ (widen)
+ (goto-char (point-min))
+ (insert "ooo")
+ (should (equal "oooxxx[1,{\"name\": \"Bob\"},2,3]xxx"
+ (buffer-string)))
+ (delete-region 10 26)
+ (should (equal "oooxxx[1,2,3]xxx"
+ (buffer-string)))
+ (narrow-to-region (+ (point-min) 6) (- (point-max) 3))
+ ;; This test is also from the basic test.
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+ (widen)
+ (goto-char (point-max))
+ (insert "[1,2]")
+ (should (equal "oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (- (point-max) 5) (point-max))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number)))"))
+ (widen)
+ (goto-char (point-min))
+ (insert "[1]")
+ (should (equal "[1]oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (point-min) (+ (point-min) 3))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number)))")))))
+
(provide 'tree-sitter-tests)
;;; tree-sitter-tests.el ends here
--
2.24.3 (Apple Git-128)
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 16:41 ` Eli Zaretskii
@ 2021-07-29 22:58 ` Stephen Leake
2021-07-30 6:00 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-07-29 22:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, cpitclaudel, monnier, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 28 Jul 2021 12:36:33 -0400
>> Cc: Eli Zaretskii <eliz@gnu.org>,
>> emacs-devel <emacs-devel@gnu.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> monnier@iro.umontreal.ca
>>
>> > So don't send a change that deletes the hidden text; just send changes
>> > in the visible part of the text (that's the only place the user can make
>> > changes). tree-sitter will only run the scanner on the change regions,
>> > so it will only request text from the visible part of the buffer;
>> > all the requests will succeed.
>>
>> Then we are not hiding the hidden text from tree-sitter. The
>> implementation you described, IIUC, essentially does nothing
>> special when the buffer is narrowed.
>
> If the TS parser is called while the narrowing is in effect, it will
> be unable to access text beyond BEGV..ZV. So in that case the
> narrowing _will_ affect TS.
Please read again; TS is affected in principle, but in practice, in the
absence of programming errors, it will never try to access text outside
the narrowing, so it won't notice.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 17:54 ` Eli Zaretskii
2021-07-28 18:46 ` Yuan Fu
@ 2021-07-29 23:01 ` Stephen Leake
1 sibling, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-29 23:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Wed, 28 Jul 2021 13:47:42 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca,
>> emacs-devel@gnu.org
>>
>> Could you describe the desired effect on tree-sitter when the buffer is narrowed?
>
> The behavior should be the same as if the text before and after the
> narrowed region didn't exist.
That would be true for the multi-major-mode use case, but not for the
temporarily narrow-to-defun case.
In other words, this should be up to the major mode to determine; the
low-level code should support either case (there are probably other use
cases out there).
>> If we just deny accessibility of the hidden region from tree-sitter,
>> tree-sitter is still aware of the hidden text, because it has
>> previously parsed the hidden text and stored the result in the parse
>> tree.
>
> The adherence to narrowing is for the use cases where TS is _always_
> invoked on the same narrowed region.
Right; the multi-major-mode case.
> You seem to be thinking about changes in the narrowing while TS is
> parsing, or between consecutive re-parsing calls, but I see no
> interesting/important use cases which would need to do that. And if
> there are some tricky cases which do need this, the respective Lisp
> programs will have to deal with the problem.
Right.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 18:46 ` Yuan Fu
2021-07-28 19:00 ` Eli Zaretskii
@ 2021-07-29 23:06 ` Stephen Leake
2021-07-30 0:35 ` Richard Stallman
2 siblings, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-07-29 23:06 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, cpitclaudel, monnier
Yuan Fu <casouri@gmail.com> writes:
>> On Jul 28, 2021, at 1:54 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>>> From: Yuan Fu <casouri@gmail.com>
>> The adherence to narrowing is for the use cases where TS is _always_
>> invoked on the same narrowed region. You seem to be thinking about
>> changes in the narrowing while TS is parsing, or between consecutive
>> re-parsing calls, but I see no interesting/important use cases which
>> would need to do that. And if there are some tricky cases which do
>> need this, the respective Lisp programs will have to deal with the
>> problem.
>
> That makes sense. However, it brings up a problem. Consider such a
> buffer: XXAAXX.
There is always a delimiter in the text that defines the boundary
between XX and AA; say "{{" for example, with "}}" at the other end of
AA.
> Say lisp narrows to AA and creates a tree-sitter parser. Then lisp
> widens the buffer, and user inserts B in front of AA. Now the buffer
> is XXBAAXX.
Before or after the delimiter?
XX {{ BAA }} XX : B is a change to AA
XXB {{ AA }} XX : B is a change to XX
> Emacs has two options to convey this change to the tree-sitter parser:
> 1) it does not, and tree-sitter still thinks the buffer is AA;
> essentially, the portion that tree-sitter sees is pushed forward by
> one character; 2) it tells tree-sitter the user inserted a character
> at the beginning, and tree-sitter thinks the buffer is BAA. Which
> option is correct depends on how Lisp later narrows: if Lisp
> narrows to AA, then option 1 is correct; if Lisp narrows to BAA, then
> option 2 is correct. But how do we know which option is correct before
> Lisp narrows?
The major mode determines the boundaries and the narrowing, so leave it
up to that code to be consistent, not your code.
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-28 18:46 ` Yuan Fu
2021-07-28 19:00 ` Eli Zaretskii
2021-07-29 23:06 ` How to add pseudo vector types Stephen Leake
@ 2021-07-30 0:35 ` Richard Stallman
2021-07-30 0:46 ` Alexandre Garreau
2021-07-30 6:35 ` Eli Zaretskii
2 siblings, 2 replies; 370+ messages in thread
From: Richard Stallman @ 2021-07-30 0:35 UTC (permalink / raw)
To: Yuan Fu; +Cc: eliz, emacs-devel, stephen_leake, cpitclaudel, monnier
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Emacs has two options to convey this change to the tree-sitter parser: 1)
> it does not, and tree-sitter still thinks the buffer is AA;
> essentially, the portion that tree-sitter sees is pushed forward
> by one character; 2) it tells tree-sitter the user inserted a
> character at the beginning, and tree-sitter thinks the buffer is
> BAA.
> Which option is correct depends on how Lisp later narrows: if
> Lisp narrows to AA, then option 1 is correct; if Lisp narrows to
> BAA, then option 2 is correct. But how do we know which option is
> correct before Lisp narrows?
I suggest we create a way for the program to declare the purpose for
each instance of narrowing.
I know of two kinds of purposes for using narrowing.
1. To focus operations on a syntactic entity in a buffer containing
other things which are essentially unrelated. Let's call this "semantic" narrowing.
For instance, when Rmail narrows the file buffer to just one message,
that is semantic narrowing. Whatever is outside the buffer bounds is
unrelated to parsing the current message.
2. To show just part of the text you're looking at. This is a display
feature, usually temporary, and would be enabled or disabled by the
user. Let's call it "display" narrowing.
I don't think Emacs can tell heuristically which kind of narrowing a
program is doing.
I propose we create a way for Lisp programs to declare when they do
semantic narrowing. They could specify markers for the beginning and
end of that narrowing.
Facilities for parsing the buffer should heed semantic narrowing but
disregard display narrowing.
Various kinds of semantic narrowing should be able to nest, and
display narrowing should be able to nest inside semantic narrowings.
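A minimal sketch of how a parsing facility might pick its bounds under this proposal. Everything here is hypothetical: the `narrowing` record, the `NARROW_SEMANTIC`/`NARROW_DISPLAY` kinds, and `parse_bounds` are names invented for illustration, not existing Emacs structures or functions. The sketch assumes the narrowings are kept as a stack, outermost first, and that a parser heeds the innermost semantic narrowing while skipping display narrowings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one narrowing; not an Emacs data structure.  */
enum narrowing_kind { NARROW_SEMANTIC, NARROW_DISPLAY };

struct narrowing
{
  enum narrowing_kind kind;
  size_t beg;   /* Start position of the narrowing.  */
  size_t end;   /* End position of the narrowing.  */
};

struct bounds { size_t beg, end; };

/* Return the region a parser should see.  STACK holds DEPTH nested
   narrowings, outermost first.  Semantic narrowings restrict the
   result; display narrowings are skipped.  With no semantic
   narrowing, the whole buffer BUF_BEG..BUF_END is returned.  */
static struct bounds
parse_bounds (const struct narrowing *stack, size_t depth,
              size_t buf_beg, size_t buf_end)
{
  struct bounds b = { buf_beg, buf_end };
  for (size_t i = 0; i < depth; i++)
    if (stack[i].kind == NARROW_SEMANTIC)
      {
        /* Each semantic narrowing must nest inside the previous one.  */
        assert (b.beg <= stack[i].beg && stack[i].end <= b.end);
        b.beg = stack[i].beg;
        b.end = stack[i].end;
      }
  return b;
}
```

With a semantic narrowing to 10..90 and a display narrowing to 20..30 inside it, this returns 10..90, so the parser still sees the whole semantic region.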
Comments or critiques?
--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-30 0:35 ` Richard Stallman
@ 2021-07-30 0:46 ` Alexandre Garreau
2021-07-30 6:35 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Alexandre Garreau @ 2021-07-30 0:46 UTC (permalink / raw)
To: emacs-devel, rms; +Cc: Yuan Fu, stephen_leake, eliz, cpitclaudel, monnier
Le vendredi 30 juillet 2021, 02:35:33 CEST Richard Stallman a écrit :
> I propose we create a way for Lisp programs to declare when they do
> semantic narrowing. They could specify markers for the beginning and
> end of that narrowing.
>
> Facilities for parsing the buffer should heed semantic narrowing but
> disregard display narrowing.
>
> Various kinds of semantic narrowing should be able to nest, and
> display narrowing should be able to nest inside semantic narrowings.
>
> Comments or critiques?
Nesting? That’s very interesting. I always felt that Emacs’s separation of
data into “atomic”, unnested buffers was limiting… Couldn’t such a facility
come with some semantics that would ease working with multi-mode and
multiple-format files, such as PHP files (including HTML), HTML pages
(including JavaScript and CSS), Org mode and its source blocks (which currently
open another buffer to work in), makefiles including a lot of shell-script
programs, bison/yacc files including C, etc.?
PS: that makes me think of another really handy feature that would
be so convenient: the ability to *include* the content of a buffer inside
some other buffer, so that the data of both are connected, and you can see many small
files’ content at once while working on some multi-semantics file… but maybe
it’s a stupid/useless idea (could it be kept synchronized, maybe? or would it be overly
difficult? I don’t know)
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 22:58 ` Stephen Leake
@ 2021-07-30 6:00 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-30 6:00 UTC (permalink / raw)
To: Stephen Leake; +Cc: casouri, cpitclaudel, monnier, emacs-devel
> From: Stephen Leake <stephen_leake@stephe-leake.org>
> Cc: Yuan Fu <casouri@gmail.com>, emacs-devel@gnu.org,
> cpitclaudel@gmail.com, monnier@iro.umontreal.ca
> Date: Thu, 29 Jul 2021 15:58:39 -0700
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> From: Yuan Fu <casouri@gmail.com>
> >> Date: Wed, 28 Jul 2021 12:36:33 -0400
> >> Cc: Eli Zaretskii <eliz@gnu.org>,
> >> emacs-devel <emacs-devel@gnu.org>,
> >> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> >> monnier@iro.umontreal.ca
> >>
> >> > So don't send a change that deletes the hidden text; just send changes
> >> > in the visible part of the text (that's the only place the user can make
> >> > changes). tree-sitter will only run the scanner on the change regions,
> >> > so it will only request text from the visible part of the buffer;
> >> > all the requests will succeed.
> >>
> >> Then we are not hiding the hidden text from tree-sitter. The
> >> implementation you described, IIUC, essentially does nothing
> >> special when the buffer is narrowed.
> >
> > If the TS parser is called while the narrowing is in effect, it will
> > be unable to access text beyond BEGV..ZV. So in that case the
> > narrowing _will_ affect TS.
>
> Please read again; TS is affected in principle, but in practice, in the
> absence of programming errors, it will never try to access text outside
> the narrowing, so it won't notice.
Sorry, I don't understand what you wanted me to re-read. As the
subsequent discussions revealed, Yuan had in mind a scenario where the
text outside of the restriction was changed.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-30 0:35 ` Richard Stallman
2021-07-30 0:46 ` Alexandre Garreau
@ 2021-07-30 6:35 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-30 6:35 UTC (permalink / raw)
To: rms; +Cc: casouri, emacs-devel, stephen_leake, cpitclaudel, monnier
> From: Richard Stallman <rms@gnu.org>
> Cc: eliz@gnu.org, cpitclaudel@gmail.com,
> stephen_leake@stephe-leake.org, monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
> Date: Thu, 29 Jul 2021 20:35:33 -0400
>
> I suggest we create a way for the program to declare the purpose for
> each instance of narrowing.
>
> I know of two kinds of purposes for using narrowing.
>
> 1. To focus operations on a syntactic entity in a buffer containing
> other things which are essentially unrelated. Let's call this "semantic" narrowing.
>
> For instance, when Rmail narrows the file buffer to just one message,
> that is semantic narrowing. Whatever is outside the buffer bounds is
> unrelated to parsing the current message.
>
> 2. To show just part of the text you're looking at. This is a display
> feature, usually temporary, and would be enabled or disabled by the
> user. Let's call it "display" narrowing.
So another way of discerning between the two is to distinguish the
"Lisp narrowing" from the "user narrowing".
> I don't think Emacs can tell heuristically which kind of narrowing a
> program is doing.
If we agree that the second kind is only done by the user, then no
heuristic is needed.
But I agree that having this recorded explicitly would be a good idea.
We could provide something similar to prog-indentation-context for
this purpose.
> I propose we create a way for Lisp programs to declare when they do
> semantic narrowing. They could specify markers for the beginning and
> end of that narrowing.
>
> Facilities for parsing the buffer should heed semantic narrowing but
> disregard display narrowing.
>
> Various kinds of semantic narrowing should be able to nest, and
> display narrowing should be able to nest inside semantic narrowings.
>
> Comments or critiques?
We had a long discussion of a similar proposal, see
https://lists.gnu.org/archive/html/emacs-devel/2017-02/msg00765.html
At the time, we were unable to come to an agreed-upon design, so this
feature was never implemented in mainline Emacs. Maybe we should
revisit it now.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-29 18:57 ` Yuan Fu
@ 2021-07-30 6:47 ` Eli Zaretskii
2021-07-30 14:17 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-07-30 6:47 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 29 Jul 2021 14:57:19 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> >> Actually, that sounds like how it works in my code right now. After the last few exchanges, I still have the feeling that we are not on the same page. Could you have a look at the code in ts_ensure_parsed and ts_record_change, and see if it aligns with what you consider to be the right thing? If you have read them already and think you understand what are they doing, could you tell me how exactly should these two functions behave, in your opinion? Thanks.
> >
> > Where do I find the latest version of the code?
>
> A few messages back I attached a patch, ts.5.patch. Actually I can just attach it again, here.
That's not the whole code, that's a patch against some previous
version of the code. So I cannot answer your questions with 100%
certainty, until I see the entire code of the TS support. For
example, I'm not sure I have a clear idea of when the two functions
ts_ensure_parsed and ts_record_change are called.
That said, it looks like the code is correct: you should record the
changes in the entire buffer, but only pass to TS the changes inside
the restriction BEGV..ZV that is in effect at the time of the re-parse
call. Btw, I don't see the code that filters changes reported to TS
by their positions against the restriction; did I miss something?
And one more question: I understand that ts_read_buffer doesn't check
against BUF_BEGV_BYTE because TS never reads before the "visible beg"
position, is that right? But if so, why do we need the similar test
for BUF_ZV_BYTE? could TS attempt to read beyond the "visible end"?
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-30 6:47 ` Eli Zaretskii
@ 2021-07-30 14:17 ` Yuan Fu
2021-08-03 10:24 ` Fu Yuan
2021-08-03 11:47 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-07-30 14:17 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, Stefan Monnier,
emacs-devel
>
> That's not the whole code, that's a patch against some previous
> version of the code. So I cannot answer your questions with 100%
> certainty, until I see the entire code of the TS support. For
> example, I'm not sure I have a clear idea of when the two functions
> ts_ensure_parsed and ts_record_change are called.
Oops, I thought you had all the prior patches. You can clone the “ts” branch from
https://github.com/casouri/emacs.git
If this is ok, I’ll push to this branch instead of sending patches from now on.
>
> That said, it looks like the code is correct: you should record the
> changes in the entire buffer, but only pass to TS the changes inside
> the restriction BEGV..ZV that is in effect at the time of the re-parse
> call. Btw, I don't see the code that filters changes reported to TS
> by their positions against the restriction; did I miss something?
Yes, I do clip the change to the visible portion:
void
ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
                  ptrdiff_t new_end_byte)
{
  eassert (start_byte <= old_end_byte);
  eassert (start_byte <= new_end_byte);

  Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);

  while (!NILP (parser_list))
    {
      Lisp_Object lisp_parser = Fcar (parser_list);
      TSTree *tree = XTS_PARSER (lisp_parser)->tree;
      if (tree != NULL)
        {
          /* We "clip" the change to between visible_beg and
             visible_end.  It is okay if visible_end ends up larger
             than BUF_Z: tree-sitter only accesses buffer text during
             re-parse, and we will adjust visible_beg/end before
             re-parse.  */
          ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
          ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;

          ptrdiff_t visible_start =
            max (visible_beg, start_byte) - visible_beg;
          ptrdiff_t visible_old_end =
            min (visible_end, old_end_byte) - visible_beg;
          ptrdiff_t visible_new_end =
            min (visible_end, new_end_byte) - visible_beg;

          ts_tree_edit_1 (tree, visible_start, visible_old_end,
                          visible_new_end);
          XTS_PARSER (lisp_parser)->need_reparse = true;
        }
      /* Advance even when this parser has no tree yet; otherwise the
         loop would never terminate.  */
      parser_list = Fcdr (parser_list);
    }
}
> And one more question: I understand that ts_read_buffer doesn't check
> against BUF_BEGV_BYTE because TS never reads before the "visible beg"
> position, is that right?
Yes, we always update visible_beg and visible_end to match BUF_BEGV_BYTE and BUF_ZV_BYTE before we instruct tree-sitter to re-parse. So when tree-sitter reads at byte position 0, it translates to buffer byte position 0 + visible_beg = BUF_BEGV_BYTE.
> But if so, why do we need the similar test
> for BUF_ZV_BYTE? could TS attempt to read beyond the "visible end"?
Tree-sitter doesn’t know the size of the buffer; it just keeps reading until the read function sets bytes_read to 0, signaling that it has reached the end.
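A minimal, standalone sketch of that read protocol, not the actual ts_read_buffer. The `narrowed_text` struct and `read_visible` function are hypothetical names for illustration; the real tree-sitter read callback takes additional arguments. The sketch assumes the parser's offset 0 corresponds to visible_beg and that a read at or past visible_end reports zero bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a narrowed buffer; not an Emacs structure.  */
struct narrowed_text
{
  const char *text;      /* Whole buffer contents.  */
  size_t visible_beg;    /* Byte offset where the narrowing starts.  */
  size_t visible_end;    /* Byte offset where the narrowing ends.  */
};

/* Return a pointer to the text at parser offset BYTE_OFFSET and store
   the number of readable bytes in *BYTES_READ.  Parser offset 0 maps
   to visible_beg.  At or past visible_end, *BYTES_READ is set to 0,
   which tells the parser it has reached the end of its input.  */
static const char *
read_visible (const struct narrowed_text *buf, size_t byte_offset,
              size_t *bytes_read)
{
  assert (buf->visible_beg <= buf->visible_end);
  size_t visible_len = buf->visible_end - buf->visible_beg;
  if (byte_offset >= visible_len)
    {
      *bytes_read = 0;
      return "";
    }
  *bytes_read = visible_len - byte_offset;
  return buf->text + buf->visible_beg + byte_offset;
}
```

For the text xxx[1,2]xxx narrowed to [1,2], a read at offset 0 yields five bytes starting at the opening bracket, and a read at offset 5 yields zero bytes.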
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-30 14:17 ` Yuan Fu
@ 2021-08-03 10:24 ` Fu Yuan
2021-08-03 11:42 ` Eli Zaretskii
2021-08-03 11:47 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Fu Yuan @ 2021-08-03 10:24 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Stephen Leake, Stefan Monnier,
emacs-devel
I’m about to change all lisp-facing functions from using byte position to using point. Point is much easier to work with. If lisp wants byte positions, they can just convert from point themselves. Any objections?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-08-03 10:24 ` Fu Yuan
@ 2021-08-03 11:42 ` Eli Zaretskii
2021-08-03 11:53 ` Fu Yuan
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-03 11:42 UTC (permalink / raw)
To: Fu Yuan; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Fu Yuan <casouri@gmail.com>
> Date: Tue, 3 Aug 2021 06:24:34 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>
> I’m about to change all lisp-facing functions from using byte position to using point.
I don't understand how you can do this. Point is set by Lisp, and
generally cannot be changed from C, except for very short durations of
time (or if the C code is the implementation of a Lisp command that
just moved point). If you need to access some buffer position, you
cannot in general use point, because you cannot control where point
is.
> Point is much easier to work with.
In what way is it easier? I feel that I'm missing something here.
> If lisp wants byte positions, they can just convert from point themselves.
??? What do you mean by that? Can you show an example?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-07-30 14:17 ` Yuan Fu
2021-08-03 10:24 ` Fu Yuan
@ 2021-08-03 11:47 ` Eli Zaretskii
2021-08-03 12:00 ` Fu Yuan
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-03 11:47 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 30 Jul 2021 10:17:22 -0400
> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> > That said, it looks like the code is correct: you should record the
> > changes in the entire buffer, but only pass to TS the changes inside
> > the restriction BEGV..ZV that is in effect at the time of the re-parse
> > call. Btw, I don't see the code that filters changes reported to TS
> > by their positions against the restriction; did I miss something?
>
> Yes, I do clip the change to the visible portion:
>
> ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
> ptrdiff_t new_end_byte)
> {
> eassert(start_byte <= old_end_byte);
> eassert(start_byte <= new_end_byte);
>
> Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
>
> while (!NILP (parser_list))
> {
> Lisp_Object lisp_parser = Fcar (parser_list);
> TSTree *tree = XTS_PARSER (lisp_parser)->tree;
> if (tree != NULL)
> {
> /* We "clip" the change to between visible_beg and
> visible_end. It is okay if visible_end ends up larger
> than BUF_Z, tree-sitter only access buffer text during
> re-parse, and we will adjust visible_beg/end before
> re-parse. */
> ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
> ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
>
> ptrdiff_t visible_start =
> max (visible_beg, start_byte) - visible_beg;
> ptrdiff_t visible_old_end =
> min (visible_end, old_end_byte) - visible_beg;
> ptrdiff_t visible_new_end =
> min (visible_end, new_end_byte) - visible_beg;
>
> ts_tree_edit_1 (tree, visible_start, visible_old_end,
> visible_new_end);
> XTS_PARSER (lisp_parser)->need_reparse = true;
>
> parser_list = Fcdr (parser_list);
Hmm... so a change that begins before the restriction and ends inside
the restriction will be sent as if it began at BEGV? And the rest of
the change will be discarded? Shouldn't you split such changes in
two, send to TS the part inside the restriction, and store the rest
for the future, when/if the buffer is widened?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-08-03 11:42 ` Eli Zaretskii
@ 2021-08-03 11:53 ` Fu Yuan
2021-08-03 12:21 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Fu Yuan @ 2021-08-03 11:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> On Aug 3, 2021, at 7:42 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Fu Yuan <casouri@gmail.com>
>> Date: Tue, 3 Aug 2021 06:24:34 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>>
>> I’m about to change all lisp-facing functions from using byte position to using point.
>
> I don't understand how you can do this. Point is set by Lisp, and
> generally cannot be changed from C, except for very short durations of
> time (or if the C code is the implementation of a Lisp command that
> just moved point). If you need to access some buffer position, you
> cannot in general use point, because you cannot control where point
> is.
>
>> Point is much easier to work with.
>
> In what way is it easier? I feel that I'm missing something here.
>
>> If lisp wants byte positions, they can just convert from point themselves.
>
> ??? What do you mean by that? Can you show an example?
Oh no, I don’t mean that. I meant that, for example, functions like node_start_byte, which returns the byte position of the beginning of the node, will now be node_start_pos, which returns a point position. And if I want the byte position of the beginning of the node, I can use (position-to-byte (tree-sitter-node-start-pos node))
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-08-03 11:47 ` Eli Zaretskii
@ 2021-08-03 12:00 ` Fu Yuan
2021-08-03 12:24 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Fu Yuan @ 2021-08-03 12:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> On Aug 3, 2021, at 7:48 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 30 Jul 2021 10:17:22 -0400
>> Cc: Stephen Leake <stephen_leake@stephe-leake.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> emacs-devel@gnu.org
>>
>>> That said, it looks like the code is correct: you should record the
>>> changes in the entire buffer, but only pass to TS the changes inside
>>> the restriction BEGV..ZV that is in effect at the time of the re-parse
>>> call. Btw, I don't see the code that filters changes reported to TS
>>> by their positions against the restriction; did I miss something?
>>
>> Yes, I do clip the change to the visible portion:
>>
>> ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
>> ptrdiff_t new_end_byte)
>> {
>> eassert(start_byte <= old_end_byte);
>> eassert(start_byte <= new_end_byte);
>>
>> Lisp_Object parser_list = Fsymbol_value (Qtree_sitter_parser_list);
>>
>> while (!NILP (parser_list))
>> {
>> Lisp_Object lisp_parser = Fcar (parser_list);
>> TSTree *tree = XTS_PARSER (lisp_parser)->tree;
>> if (tree != NULL)
>> {
>> /* We "clip" the change to between visible_beg and
>> visible_end. It is okay if visible_end ends up larger
>> than BUF_Z, tree-sitter only access buffer text during
>> re-parse, and we will adjust visible_beg/end before
>> re-parse. */
>> ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
>> ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
>>
>> ptrdiff_t visible_start =
>> max (visible_beg, start_byte) - visible_beg;
>> ptrdiff_t visible_old_end =
>> min (visible_end, old_end_byte) - visible_beg;
>> ptrdiff_t visible_new_end =
>> min (visible_end, new_end_byte) - visible_beg;
>>
>> ts_tree_edit_1 (tree, visible_start, visible_old_end,
>> visible_new_end);
>> XTS_PARSER (lisp_parser)->need_reparse = true;
>>
>> parser_list = Fcdr (parser_list);
>
> Hmm... so a change that begins before the restriction and ends inside
> the restriction will be sent as if it began at BEGV? And the rest of
> the change will be discarded? Shouldn't you split such changes in
> two, send to TS the part inside the restriction, and store the rest
> for the future, when/if the buffer is widened?
Tree-sitter doesn’t care about the content of a change; it re-scans the buffer content when it re-parses. We only need to inform it of the range of the change, so that it knows where to re-scan. When the buffer is widened, we will tell tree-sitter that the range [BUF_BEG, BUF_BEGV] has changed, and it will re-scan that part when re-parsing. So the part outside the narrowed region will be parsed correctly.
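A rough sketch of how a moved visible_beg might be turned into an edit for the parser. This is an assumption about one possible implementation, not code from the patch: `edit_range` and `edit_for_moved_beg` are hypothetical names, with fields modelled on the start/old-end/new-end triple passed to ts_tree_edit_1 above. Revealing text before the old beginning looks, from the parser's side, like an insertion at offset 0; hiding text looks like a deletion at offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edit description in parser-relative byte offsets.  */
struct edit_range
{
  size_t start;     /* Where the edit begins.  */
  size_t old_end;   /* End of the replaced text before the edit.  */
  size_t new_end;   /* End of the replacement text after the edit.  */
};

/* Describe the move of the visible beginning from OLD_VISIBLE_BEG to
   NEW_VISIBLE_BEG as an edit at parser offset 0.  */
static struct edit_range
edit_for_moved_beg (size_t old_visible_beg, size_t new_visible_beg)
{
  struct edit_range e = { 0, 0, 0 };
  if (new_visible_beg < old_visible_beg)
    /* Widening: text was revealed, so report an insertion.  */
    e.new_end = old_visible_beg - new_visible_beg;
  else
    /* Narrowing further: text was hidden, so report a deletion.  */
    e.old_end = new_visible_beg - old_visible_beg;
  assert (e.start <= e.old_end && e.start <= e.new_end);
  return e;
}
```

Widening from visible_beg 6 to 0 gives an insertion of six bytes at offset 0, which marks exactly the newly visible range for re-scanning.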
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-08-03 11:53 ` Fu Yuan
@ 2021-08-03 12:21 ` Eli Zaretskii
2021-08-03 12:50 ` Fu Yuan
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-03 12:21 UTC (permalink / raw)
To: Fu Yuan; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Fu Yuan <casouri@gmail.com>
> Date: Tue, 3 Aug 2021 07:53:54 -0400
> Cc: stephen_leake@stephe-leake.org, cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> Oh no, I don’t mean that. I meant that, for example, functions like node_start_byte, which returns the byte position of the beginning of the node, will now be node_start_pos, which returns a point position.
That's called "character position". Let's use the accepted
terminology, to minimize misunderstandings.
So in what sense are character positions easier to use than byte
positions?
> And if I want the byte position of the beginning of the node, I can use (position-to-byte (tree-sitter-node-start-pos node))
Caveat: position-to-byte can be expensive. So in time-critical code,
such as the display engine, we keep both character position and byte
position, and update them in sync. Then you can use whichever is
easier in each case.
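A small standalone sketch of keeping the two positions in sync. It is only an approximation: it assumes valid, NUL-terminated plain UTF-8 (Emacs's internal representation is a superset of UTF-8) and zero-based offsets rather than Emacs's one-based positions. The `text_pos` struct and both functions are hypothetical names, not Emacs internals.

```c
#include <assert.h>
#include <stddef.h>

/* A character position and the matching byte position, kept together
   so that neither has to be recomputed from the other.  */
struct text_pos
{
  size_t charpos;
  size_t bytepos;
};

/* Length in bytes of the UTF-8 sequence whose first byte is LEAD.
   Assumes LEAD starts a valid sequence.  */
static size_t
utf8_seq_len (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xE0) == 0xC0)
    return 2;
  if ((lead & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Advance POS by up to NCHARS characters in TEXT, updating the
   character and byte positions in one pass.  Stops early at the
   terminating NUL.  */
static struct text_pos
advance_chars (const char *text, struct text_pos pos, size_t nchars)
{
  assert (pos.charpos <= pos.bytepos);
  for (size_t i = 0; i < nchars && text[pos.bytepos] != '\0'; i++)
    {
      pos.bytepos += utf8_seq_len ((unsigned char) text[pos.bytepos]);
      pos.charpos++;
    }
  return pos;
}
```

The caller pays for the scan once while moving, and afterwards can use whichever of the two fields is convenient without a separate conversion.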
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-08-03 12:00 ` Fu Yuan
@ 2021-08-03 12:24 ` Eli Zaretskii
2021-08-03 13:00 ` Fu Yuan
2021-08-03 13:28 ` Stefan Monnier
0 siblings, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-03 12:24 UTC (permalink / raw)
To: Fu Yuan; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Fu Yuan <casouri@gmail.com>
> Date: Tue, 3 Aug 2021 08:00:46 -0400
> Cc: stephen_leake@stephe-leake.org, cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> > Hmm... so a change that begins before the restriction and ends inside
> > the restriction will be sent as if it began at BEGV? And the rest of
> > the change will be discarded? Shouldn't you split such changes in
> > two, send to TS the part inside the restriction, and store the rest
> > for the future, when/if the buffer is widened?
>
> Tree-sitter doesn’t care about the content of a change; it re-scans the buffer content when it re-parses. We only need to inform it of the range of the change, so that it knows where to re-scan. When the buffer is widened, we will tell tree-sitter that the range [BUF_BEG, BUF_BEGV] has changed, and it will re-scan that part when re-parsing.
But that's sub-optimal, no? Imagine a very large buffer which was
narrowed to a small portion near EOB, then a modification made very
close to EOB but partially before BEGV, then the buffer widened. With
your method, TS will now have to re-parse almost the entire buffer,
whereas we know it needs to re-parse a very small portion of it.
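A sketch of the splitting idea, under simplifying assumptions: a change is treated as a single byte range in the post-change text, and `range`, `change_parts` and `split_change` are hypothetical names, not part of the patch. The part inside BEGV..ZV would be reported right away; the parts before and after would be stored and reported if the buffer is later widened.

```c
#include <assert.h>
#include <stddef.h>

/* A half-open byte range; empty when beg == end.  */
struct range { size_t beg, end; };

/* A change split against a restriction: the part to report now and
   the parts to keep pending until the buffer is widened.  */
struct change_parts
{
  struct range before;   /* Portion before BEGV, kept pending.  */
  struct range inside;   /* Portion inside BEGV..ZV, reported now.  */
  struct range after;    /* Portion after ZV, kept pending.  */
};

static size_t
clamp_pos (size_t x, size_t lo, size_t hi)
{
  return x < lo ? lo : x > hi ? hi : x;
}

/* Split CHANGE into the parts lying before, inside and after the
   restriction BEGV..ZV.  */
static struct change_parts
split_change (struct range change, size_t begv, size_t zv)
{
  assert (change.beg <= change.end && begv <= zv);
  struct change_parts s;
  s.before.beg = change.beg < begv ? change.beg : begv;
  s.before.end = change.end < begv ? change.end : begv;
  s.inside.beg = clamp_pos (change.beg, begv, zv);
  s.inside.end = clamp_pos (change.end, begv, zv);
  s.after.beg = change.beg > zv ? change.beg : zv;
  s.after.end = change.end > zv ? change.end : zv;
  return s;
}
```

For a change covering 90..105 with the restriction at 100..200, the pending part is 90..100 and the reported part is 100..105, so a later widening would only need those ten bytes marked as changed instead of everything before BEGV.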
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: How to add pseudo vector types
2021-08-03 12:21 ` Eli Zaretskii
@ 2021-08-03 12:50 ` Fu Yuan
2021-08-03 13:03 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Fu Yuan @ 2021-08-03 12:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> On Aug 3, 2021, at 8:22 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Fu Yuan <casouri@gmail.com>
>> Date: Tue, 3 Aug 2021 07:53:54 -0400
>> Cc: stephen_leake@stephe-leake.org, cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>>
>> Oh no, I don’t mean that. I meant that, for example, functions like node_start_byte, which returns the byte position of the beginning of the node, will now be node_start_pos, which returns a point position.
>
> That's called "character position". Let's use the accepted
> terminology, to minimize misunderstandings.
Ah, got it.
> So in what sense are character positions easier to use than byte
> positions?
Here is what you can do with positions:
- find the smallest node that encloses a range (BEG . END)
- get the beginning and end of a node
Since all other functions use character positions (e.g., put-text-property, point), using character positions saves Lisp code some calls to ‘position-to-byte’.
>> And if I want the byte position of the beginning of the node, I can use (position-to-byte (tree-sitter-node-start-pos node))
>
> Caveat: position-to-byte can be expensive. So in time-critical code,
> such as the display engine, we keep both character position and byte
> position, and update them in sync. Then you can use whichever is
> easier in each case.
Internally, tree_sitter.c will continue to use byte positions, of course.
Yuan
* Re: How to add pseudo vector types
2021-08-03 12:24 ` Eli Zaretskii
@ 2021-08-03 13:00 ` Fu Yuan
2021-08-03 13:28 ` Stefan Monnier
1 sibling, 0 replies; 370+ messages in thread
From: Fu Yuan @ 2021-08-03 13:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> On Aug 3, 2021, at 8:25 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Fu Yuan <casouri@gmail.com>
>> Date: Tue, 3 Aug 2021 08:00:46 -0400
>> Cc: stephen_leake@stephe-leake.org, cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>>
>>> Hmm... so a change that begins before the restriction and ends inside
>>> the restriction will be sent as if it began at BEGV? And the rest of
>>> the change will be discarded? Shouldn't you split such changes in
>>> two, send to TS the part inside the restriction, and store the rest
>>> for the future, when/if the buffer is widened?
>>
>> Tree-sitter doesn’t care about the content in a change, it will re-scan the buffer content when it re-parses. We only need to inform it the range of the change, so it knows where to re-scan when it re-parses. When the buffer is widened, we will tell tree-sitter that range [BUF_BEG, BUF_BEGV] has changed, and it will re-scan that part when re-parsing.
>
> But that's sub-optimal, no? Imagine a very large buffer which was
> narrowed to a small portion near EOB, then a modification made very
> close to EOB but partially before BEGV, then the buffer widened. With
> your method, TS will now have to re-parse almost the entire buffer,
> whereas we know it needs to re-parse a very small portion of it.
It is indeed, but that’s unavoidable given the way we hide the narrowed-out part of the buffer from tree-sitter. We pretend BUF_BEGV is the beginning of the buffer and nothing exists before it. Then when we widen, we need to “insert” the content between BUF_BEG and BUF_BEGV. I.e., as far as tree-sitter can tell, we inserted that text.
If you want to hide something, then re-show it to tree-sitter, and want tree-sitter to know how to re-parse minimally, you should use tree-sitter-parser-set-included-ranges (ts_parser_set_included_ranges). I’ve written the Lisp binding for it but haven’t pushed the change.
The reason why I didn’t implement narrow with set-ranges was explained earlier.
Yuan
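To make the widening step above concrete, here is a minimal C sketch of how a move of the visible region's beginning could be reported as an edit at offset 0. The struct visible_edit and the function edit_for_visible_beg_change are hypothetical names used only for illustration; they are not taken from tree_sitter.c or from tree-sitter's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edit record, in byte offsets relative to the start of
   the text the parser can see.  It mirrors the start / old end / new
   end shape of an edit notification; it is not tree-sitter's type.  */
struct visible_edit
{
  ptrdiff_t start;
  ptrdiff_t old_end;
  ptrdiff_t new_end;
};

/* Describe a move of the visible region's beginning from OLD_BEG to
   NEW_BEG (absolute byte positions) as an edit at offset 0.
   Widening (NEW_BEG < OLD_BEG) looks like an insertion of the newly
   revealed bytes; narrowing looks like a deletion.  */
static struct visible_edit
edit_for_visible_beg_change (ptrdiff_t old_beg, ptrdiff_t new_beg)
{
  assert (old_beg >= 0 && new_beg >= 0);
  struct visible_edit e = { 0, 0, 0 };
  if (new_beg < old_beg)
    e.new_end = old_beg - new_beg;   /* text "inserted" at the front */
  else
    e.old_end = new_beg - old_beg;   /* text "deleted" from the front */
  return e;
}
```

Under this model, widening from a visible beginning of 1000 back to 1 reports 999 inserted bytes regardless of how small the real modification was, which is the cost raised in the scenario quoted above.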
* Re: How to add pseudo vector types
2021-08-03 12:50 ` Fu Yuan
@ 2021-08-03 13:03 ` Eli Zaretskii
2021-08-03 13:08 ` Fu Yuan
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-03 13:03 UTC (permalink / raw)
To: Fu Yuan; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Fu Yuan <casouri@gmail.com>
> Date: Tue, 3 Aug 2021 08:50:45 -0400
> Cc: stephen_leake@stephe-leake.org, cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> > So in what sense are character positions easier to use than byte
> > positions?
>
> Here is what you can do with positions:
>
> - find the smallest node that encloses a range (BEG . END)
> - get the beginning and end of a node
>
> Since all other functions use character positions (e.g., put-text-property, point), using character positions saves Lisp code some calls to ‘position-to-byte’.
If you are talking about Lisp, then yes, character positions are a
much better interface. But on the C level, sometimes you need byte
positions, sometimes character positions, and sometimes both. Since
you didn't say what level was this about, I cannot say something more
intelligent.
> Internally, tree_sitter.c will continue to use byte positions, of course.
"Internally", as opposed to what? And what is "internal" in this
context? I thought we were talking only about the internals.
* Re: How to add pseudo vector types
2021-08-03 13:03 ` Eli Zaretskii
@ 2021-08-03 13:08 ` Fu Yuan
0 siblings, 0 replies; 370+ messages in thread
From: Fu Yuan @ 2021-08-03 13:08 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> On Aug 3, 2021, at 9:03 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Fu Yuan <casouri@gmail.com>
>> Date: Tue, 3 Aug 2021 08:50:45 -0400
>> Cc: stephen_leake@stephe-leake.org, cpitclaudel@gmail.com,
>> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>>
>>> So in what sense are character positions easier to use than byte
>>> positions?
>>
>> Here is what you can do with positions:
>>
>> - find the smallest node that encloses a range (BEG . END)
>> - get the beginning and end of a node
>>
>> Since all other functions use character positions (e.g., put-text-property, point), using character positions saves Lisp code some calls to ‘position-to-byte’.
>
> If you are talking about Lisp, then yes, character positions are a
> much better interface. But on the C level, sometimes you need byte
> positions, sometimes character positions, and sometimes both. Since
> you didn't say what level was this about, I cannot say something more
> intelligent.
>
>> Internally, tree_sitter.c will continue to use byte positions, of course.
>
> "Internally", as opposed to what? And what is "internal" in this
> context? I thought we were talking only about the internals.
By “internally” I mean the C level. I will change the Lisp interface functions to accept and return character positions, and the C-level code will keep using byte positions.
I’ll try to make myself clearer next time :-)
Yuan
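As an illustration of the boundary described above (character positions in Lisp, byte positions in C), here is a naive C sketch that converts a 0-based character offset to a byte offset by scanning UTF-8 text. The function name char_to_byte_offset is hypothetical, and the sketch assumes plain valid UTF-8; it is not how Emacs itself converts positions, but it shows why an uncached conversion costs time proportional to the distance scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset of character number CHARPOS (0-based) in the
   UTF-8 text TEXT of NBYTES bytes.  CHARPOS may equal the number of
   characters, in which case NBYTES is returned.  Assumes valid UTF-8
   and CHARPOS not past the end of the text.  */
static size_t
char_to_byte_offset (const unsigned char *text, size_t nbytes,
                     size_t charpos)
{
  size_t byte = 0;
  while (charpos > 0)
    {
      assert (byte < nbytes);
      byte++;                   /* step over the lead byte */
      /* Skip continuation bytes, which look like 10xxxxxx.  */
      while (byte < nbytes && (text[byte] & 0xC0) == 0x80)
        byte++;
      charpos--;
    }
  return byte;
}
```

Time-critical code would avoid repeating this scan by carrying the character position and the byte position together, as suggested earlier in the thread.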
* Re: How to add pseudo vector types
2021-08-03 12:24 ` Eli Zaretskii
2021-08-03 13:00 ` Fu Yuan
@ 2021-08-03 13:28 ` Stefan Monnier
2021-08-03 13:34 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-08-03 13:28 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Fu Yuan, stephen_leake, cpitclaudel, emacs-devel
> But that's sub-optimal, no? Imagine a very large buffer which was
> narrowed to a small portion near EOB, then a modification made very
> close to EOB but partially before BEGV, then the buffer widened. With
> your method, TS will now have to re-parse almost the entire buffer,
> whereas we know it needs to re-parse a very small portion of it.
As a general rule, we will most likely want to work hard to avoid
exposing the narrowed buffer to TS (i.e. most calls to TS will first
`widen`).
Or we will want to keep several parse trees (one per narrowing).
We have the same problem already with `syntax-ppss` which we solve by
keeping two sets of data (`syntax-ppss-wide` and `syntax-ppss-narrow`).
Stefan
* Re: How to add pseudo vector types
2021-08-03 13:28 ` Stefan Monnier
@ 2021-08-03 13:34 ` Eli Zaretskii
2021-08-06 3:22 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-03 13:34 UTC (permalink / raw)
To: Stefan Monnier; +Cc: casouri, stephen_leake, cpitclaudel, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Fu Yuan <casouri@gmail.com>, stephen_leake@stephe-leake.org,
> cpitclaudel@gmail.com, emacs-devel@gnu.org
> Date: Tue, 03 Aug 2021 09:28:57 -0400
>
> > But that's sub-optimal, no? Imagine a very large buffer which was
> > narrowed to a small portion near EOB, then a modification made very
> > close to EOB but partially before BEGV, then the buffer widened. With
> > your method, TS will now have to re-parse almost the entire buffer,
> > whereas we know it needs to re-parse a very small portion of it.
>
> As a general rule, we will most likely want to work hard to avoid
> exposing the narrowed buffer to TS (i.e. most calls to TS will first
> `widen`).
Sure. I was thinking about those corner cases where we won't.
* Re: How to add pseudo vector types
2021-08-03 13:34 ` Eli Zaretskii
@ 2021-08-06 3:22 ` Yuan Fu
2021-08-06 6:37 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-08-06 3:22 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, Stefan Monnier, emacs-devel
I’ve added bindings for set_ranges and pushed the latest code to https://github.com/casouri/emacs/tree/ts
For now, I’ve created bindings for most of the functions I want to expose. Next I’ll probably write some more tests (in addition to responding to reviews and comments).
Yuan
* Re: How to add pseudo vector types
2021-08-06 3:22 ` Yuan Fu
@ 2021-08-06 6:37 ` Eli Zaretskii
2021-08-07 5:31 ` Tree-sitter api (Was: Re: How to add pseudo vector types) Fu Yuan
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-06 6:37 UTC (permalink / raw)
To: Yuan Fu; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 5 Aug 2021 23:22:17 -0400
> Cc: cpitclaudel@gmail.com, stephen_leake@stephe-leake.org,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>
> I’ve added bindings for set_ranges and pushed the latest code to https://github.com/casouri/emacs/tree/ts
>
> For now, I’ve created bindings for most of the functions I want to expose. Next I’ll probably write some more tests (in addition to responding to reviews and comments).
Thanks.
We should probably start thinking how to integrate TS-related
functionalities into Emacs in general. E.g., should there be an
option to activate it? should this option be per major mode? something
else?
* Tree-sitter api (Was: Re: How to add pseudo vector types)
2021-08-06 6:37 ` Eli Zaretskii
@ 2021-08-07 5:31 ` Fu Yuan
2021-08-07 6:26 ` Eli Zaretskii
2021-08-07 15:47 ` Tree-sitter api Stefan Monnier
0 siblings, 2 replies; 370+ messages in thread
From: Fu Yuan @ 2021-08-07 5:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> On Aug 6, 2021, at 1:37 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>
>>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 5 Aug 2021 23:22:17 -0400
>> Cc: cpitclaudel@gmail.com, stephen_leake@stephe-leake.org,
>> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
>>
>> I’ve added bindings for set_ranges and pushed the latest code to https://github.com/casouri/emacs/tree/ts
>>
>> For now, I’ve created bindings for most of the functions I want to expose. Next I’ll probably write some more tests (in addition to responding to reviews and comments).
>
> Thanks.
>
> We should probably start thinking how to integrate TS-related
> functionalities into Emacs in general. E.g., should there be an
> option to activate it? should this option be per major mode? something
> else?
We should have a user option to control tree-sitter at the major-mode level. Maybe an alist where each car is a major mode symbol and each cdr is a Boolean value toggling tree-sitter for that mode.
We also need tree-sitter-maximum-buffer-size, so that buffers larger than this size won’t enable tree-sitter. (And we need to make sure we never use tree-sitter on buffers larger than 4GB, because tree-sitter uses uint32_t.)
And we can provide a function tree-sitter-should-activate-p that computes whether we should enable tree-sitter in the current buffer from the variables mentioned above; major modes can use it when setting up.
I’m also thinking about having a tree-sitter-defaults that’s analogous to font-lock-defaults, set by each major mode and used to generate tree-sitter-font-lock-settings.
As for indentation, we could provide some infrastructure like we do for font-locking, or we can just let major modes implement their indent function with the tree-sitter API.
Yuan
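The size limits mentioned above could be checked along these lines. This is only a sketch in C: ts_buffer_size_ok and ts_offset are hypothetical helpers, and max_size merely stands in for the proposed tree-sitter-maximum-buffer-size option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if a buffer of NBYTES bytes may use the parser.
   MAX_SIZE stands in for a user option such as the proposed
   tree-sitter-maximum-buffer-size.  The UINT32_MAX bound reflects
   the 32-bit offsets mentioned above, whatever MAX_SIZE says.  */
static bool
ts_buffer_size_ok (uintmax_t nbytes, uintmax_t max_size)
{
  if (nbytes > UINT32_MAX)
    return false;
  return nbytes <= max_size;
}

/* Narrow a byte offset to 32 bits.  Callers are expected to have
   checked the buffer size with ts_buffer_size_ok first.  */
static uint32_t
ts_offset (uintmax_t byte_offset)
{
  assert (byte_offset <= UINT32_MAX);
  return (uint32_t) byte_offset;
}
```

Keeping the hard 32-bit bound separate from the user option means that setting the option to a very large value cannot push offsets past what a 32-bit field can hold.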
* Re: Tree-sitter api (Was: Re: How to add pseudo vector types)
2021-08-07 5:31 ` Tree-sitter api (Was: Re: How to add pseudo vector types) Fu Yuan
@ 2021-08-07 6:26 ` Eli Zaretskii
2021-08-07 15:47 ` Tree-sitter api Stefan Monnier
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-07 6:26 UTC (permalink / raw)
To: Fu Yuan; +Cc: cpitclaudel, stephen_leake, monnier, emacs-devel
> From: Fu Yuan <casouri@gmail.com>
> Date: Sat, 7 Aug 2021 00:31:36 -0500
> Cc: cpitclaudel@gmail.com, stephen_leake@stephe-leake.org,
> monnier@iro.umontreal.ca, emacs-devel@gnu.org
>
> > We should probably start thinking how to integrate TS-related
> > functionalities into Emacs in general. E.g., should there be an
> > option to activate it? should this option be per major mode? something
> > else?
>
> We should have a user option to control tree-sitter at the major-mode level. Maybe an alist where each car is a major mode symbol and each cdr is a Boolean value toggling tree-sitter for that mode.
>
> We also need tree-sitter-maximum-buffer-size, so that buffers larger than this size won’t enable tree-sitter. (And we need to make sure we never use tree-sitter on buffers larger than 4GB, because tree-sitter uses uint32_t.)
>
> And we can provide a function tree-sitter-should-activate-p that computes whether we should enable tree-sitter in the current buffer from the variables mentioned above; major modes can use it when setting up.
>
> I’m also thinking about having a tree-sitter-defaults that’s analogous to font-lock-defaults, set by each major mode and used to generate tree-sitter-font-lock-settings.
>
> As for indentation, we could provide some infrastructure like we do for font-locking, or we can just let major modes implement their indent function with tree-sitter api.
SGTM, thanks.
* Re: Tree-sitter api
2021-08-07 5:31 ` Tree-sitter api (Was: Re: How to add pseudo vector types) Fu Yuan
2021-08-07 6:26 ` Eli Zaretskii
@ 2021-08-07 15:47 ` Stefan Monnier
2021-08-07 18:40 ` Theodor Thornhill
2021-08-08 22:56 ` Yuan Fu
1 sibling, 2 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-08-07 15:47 UTC (permalink / raw)
To: Fu Yuan; +Cc: Eli Zaretskii, cpitclaudel, stephen_leake, emacs-devel
> We should have a user option to control tree-sitter on major mode
> level. Maybe an alist where each car is a major mode symbol and each cdr is
> a Boolean value toggling tree-sitter for that mode.
The more traditional approach is to use a buffer-local var set by the
major mode or set via (add-hook '<MODE>-hook ...).
> As for indentation, we could provide some infrastructure like we do for
> font-locking, or we can just let major modes implement their indent function
> with tree-sitter api.
We should definitely provide the infrastructure (even if it's fairly
simple) so that major modes only have to provide some rules.
Stefan
* Re: Tree-sitter api
2021-08-07 15:47 ` Tree-sitter api Stefan Monnier
@ 2021-08-07 18:40 ` Theodor Thornhill
2021-08-07 19:53 ` Stefan Monnier
2021-08-08 22:56 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2021-08-07 18:40 UTC (permalink / raw)
To: Stefan Monnier, Fu Yuan
Cc: Eli Zaretskii, cpitclaudel, stephen_leake, emacs-devel
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> We should have a user option to control tree-sitter on major mode
>> level. Maybe an alist where each car is a major mode symbol and each cdr is
>> a Boolean value toggling tree-sitter for that mode.
>
> The more traditional approach is to use a buffer-local var set by the
> major mode or set via (add-hook '<MODE>-hook ...).
>
>> As for indentation, we could provide some infrastructure like we do for
>> font-locking, or we can just let major modes implement their indent function
>> with tree-sitter api.
>
> We should definitely provide the infrastructure (even if it's fairly
> simple) so that major modes only have to provide some rules.
>
Yeah, though that quickly becomes not so simple, considering that
different languages have their own idiosyncrasies with indentation. C#,
for instance, is a rat's nest of particularities. And this is not
considering variations of style guides etc. It would be nice to get an
api similar to what CC mode has. Font locking is an easier problem,
since it's just "fontify from node-start to node-end".
I'm not sure how to best provide this api, but I've worked a lot with CC
mode and the new tree-sitter-indent [1]. It quickly gets confusing and
reminds me of `display-buffer`. Providing both a
`tree-sitter-indent-engine` mode as well as the low level api for major
mode authors would be nice as well.
Providing something too simple would just make people not use it, since
the weirder cases won't be covered.
--
Theo
[1]: https://codeberg.org/FelipeLema/tree-sitter-indent.el
* Re: Tree-sitter api
2021-08-07 18:40 ` Theodor Thornhill
@ 2021-08-07 19:53 ` Stefan Monnier
2021-08-17 6:18 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-08-07 19:53 UTC (permalink / raw)
To: Theodor Thornhill
Cc: Fu Yuan, Eli Zaretskii, cpitclaudel, stephen_leake, emacs-devel
> Yeah, though that quickly becomes not so simple, considering that
> different languages have their own idiosyncrasies with indentation. C#,
> for instance, is a rats nest of particularities. And this is not
> considering variations of style guides etc. It would be nice to get an
> api similar to what CC mode has.
I'm thinking of rules specified via a function that takes a TS node
(from which the function can explore the rest of the TS tree) and return
the indentation to use, represented as a pair (POSITION . OFFSET)
(meaning to indent OFFSET columns further than the column position of
POSITION).
The infrastructure would limit itself to making sure we have an up-to-date
tree (computed from a properly widened buffer), find the node
corresponding to point, pass it to the function, and then turn the return
value into an actual column and indent the text accordingly (paying
attention to the usual difference between when point is "within the
indentation" vs "within the text").
Stefan
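A minimal sketch of the last step described above, turning a (POSITION . OFFSET) pair into a target column, might look like this in C. The names column_at and indent_target_column are hypothetical, and the sketch assumes ASCII text without tabs, so that a column is simply a byte count from the beginning of the line.

```c
#include <assert.h>
#include <stddef.h>

/* Column of byte position POS in TEXT: the number of bytes between
   the start of its line and POS.  Assumes ASCII text without tabs.  */
static ptrdiff_t
column_at (const char *text, ptrdiff_t pos)
{
  ptrdiff_t bol = pos;
  while (bol > 0 && text[bol - 1] != '\n')
    bol--;
  return pos - bol;
}

/* Turn a (POSITION . OFFSET) pair into the column to indent to:
   OFFSET columns further than the column of POSITION, never
   less than column 0.  */
static ptrdiff_t
indent_target_column (const char *text, ptrdiff_t position,
                      ptrdiff_t offset)
{
  assert (position >= 0);
  ptrdiff_t col = column_at (text, position) + offset;
  return col < 0 ? 0 : col;
}
```

The rule function would only return the pair; the infrastructure would perform this computation and then re-indent the line.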
* Re: Tree-sitter api
2021-08-07 15:47 ` Tree-sitter api Stefan Monnier
2021-08-07 18:40 ` Theodor Thornhill
@ 2021-08-08 22:56 ` Yuan Fu
2021-08-08 23:24 ` Stefan Monnier
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-08-08 22:56 UTC (permalink / raw)
To: Stefan Monnier
Cc: Eli Zaretskii, Stephen Leake, Clément Pit-Claudel,
emacs-devel
> On Aug 7, 2021, at 10:47 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> We should have a user option to control tree-sitter on major mode
>> level. Maybe an alist where each car is a major mode symbol and each cdr is
>> a Boolean value toggling tree-sitter for that mode.
>
> The more traditional approach is to use a buffer-local var set by the
> major mode or set via (add-hook '<MODE>-hook ...).
The major mode would set up things like font-lock-defaults and tree-sitter-defaults before major-mode-hook runs, so I think enabling/disabling tree-sitter in the hook is too late, no?
>
>> As for indentation, we could provide some infrastructure like we do for
>> font-locking, or we can just let major modes implement their indent function
>> with tree-sitter api.
>
> We should definitely provide the infrastructure (even if it's fairly
> simple) so that major modes only have to provide some rules.
I don’t really know much about indenting but I’ll try my best. Suggestions are definitely welcome.
Yuan
* Re: Tree-sitter api
2021-08-08 22:56 ` Yuan Fu
@ 2021-08-08 23:24 ` Stefan Monnier
2021-08-09 0:06 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-08-08 23:24 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Clément Pit-Claudel, Stephen Leake,
emacs-devel
Yuan Fu [2021-08-08 17:56:33] wrote:
>> On Aug 7, 2021, at 10:47 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>> We should have a user option to control tree-sitter on major mode
>>> level. Maybe an alist where each car is a major mode symbol and each cdr is
>>> a Boolean value toggling tree-sitter for that mode.
>> The more traditional approach is to use a buffer-local var set by the
>> major mode or set via (add-hook '<MODE>-hook ...).
> The major-mode would setup things like font-lock-defaults and
> tree-sitter-defaults before major-mode-hook runs, so I think
> enabling/disabling tree-sitter in the hook is too late, no?
I don't see why. Presumably the major mode would set some vars relevant
to the tree-sitter support, but then whether those vars are used will
depend on the buffer-local boolean var (let's call it `tree-sitter-mode`).
I'm sure there will be issues w.r.t initialization order, e.g. in case
`font-lock-mode` is enabled before `tree-sitter-mode`, but that doesn't
seem very serious (`font-lock-mode` doesn't do much anyway, since the
real work is postponed until the next redisplay by jit-lock, so we could
"refresh" font-lock settings fairly cheaply within `tree-sitter-mode`).
Stefan
* Re: Tree-sitter api
2021-08-08 23:24 ` Stefan Monnier
@ 2021-08-09 0:06 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-08-09 0:06 UTC (permalink / raw)
To: Stefan Monnier
Cc: Eli Zaretskii, Stephen Leake, Clément Pit-Claudel,
emacs-devel
> On Aug 8, 2021, at 6:24 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> Yuan Fu [2021-08-08 17:56:33] wrote:
>>> On Aug 7, 2021, at 10:47 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>>> We should have a user option to control tree-sitter on major mode
>>>> level. Maybe an alist where each car is a major mode symbol and each cdr is
>>>> a Boolean value toggling tree-sitter for that mode.
>>> The more traditional approach is to use a buffer-local var set by the
>>> major mode or set via (add-hook '<MODE>-hook ...).
>> The major-mode would setup things like font-lock-defaults and
>> tree-sitter-defaults before major-mode-hook runs, so I think
>> enabling/disabling tree-sitter in the hook is too late, no?
>
> I don't see why. Presumably the major mode would set some vars relevant
> to the tree-sitter support, but then whether those vars are used will
> depend on the buffer-local boolean var (let's call it `tree-sitter-mode`).
>
> I'm sure there will be issues w.r.t initialization order, e.g. in case
> `font-lock-mode` is enabled before `tree-sitter-mode`, but that doesn't
> seem very serious (`font-lock-mode` doesn't do much anyway, since the
> real work is postponed until the next redisplay by jit-lock, so we could
> "refresh" font-lock settings fairly cheaply within `tree-sitter-mode`).
Instead of a tree-sitter-mode, I made font-lock use tree-sitter features in addition to keywords. Basically I added another fontification pass (the tree-sitter pass) to the existing two, the syntactic pass and the regexp pass.
(The syntactic pass is probably unnecessary if tree-sitter is enabled, though.) This way someone can still add regexp-based fontification even if they use tree-sitter for “standard” fontification.
Under this scheme, a major mode would want something like this in its definition:
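(if (tree-sitter-should-enable-p)
    ;; Tree-sitter is usable: fontify with the tree-sitter pass.
    (progn (setq-local font-lock-tree-sitter-defaults
                       '((ts-c-tree-sitter-settings-1)))
           ;; Quoted so that the list itself is the value; unquoted,
           ;; this would call `ignore' and set the variable to nil.
           (setq-local font-lock-defaults
                       '(ignore t nil nil nil)))
  ;; Otherwise fall back to the regexp-based keywords.
  (setq-local font-lock-defaults
              '((c-font-lock-keywords
                 c-font-lock-keywords-1
                 c-font-lock-keywords-2
                 c-font-lock-keywords-3)
                nil nil
                ((95 . "w")
                 (36 . "w"))
                c-beginning-of-syntax
                (font-lock-mark-block-function . c-mark-function))))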
Under this scheme, the major-mode hook runs too late to decide whether to enable tree-sitter (changing it there is not impossible, of course).
Yuan
* Re: Tree-sitter api
2021-08-07 19:53 ` Stefan Monnier
@ 2021-08-17 6:18 ` Yuan Fu
2021-08-18 18:27 ` Stephen Leake
` (2 more replies)
0 siblings, 3 replies; 370+ messages in thread
From: Yuan Fu @ 2021-08-17 6:18 UTC (permalink / raw)
To: Stefan Monnier
Cc: Stephen Leake, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel
>
> I'm thinking of rules specified via a function that takes a TS node
> (from which the function can explore the rest of the TS tree) and return
> the indentation to use, represented as a pair (POSITION . OFFSET)
> (meaning to indent OFFSET columns further than the column position of
> POSITION).
>
> The infrastructure would limit itself to making sure we have an up-to-date
> tree (computed from a properly widened buffer), find the node
> corresponding to point, pass it to the function, and then turn the return
> value into an actual column and indent the text accordingly (paying
> attention to the usual difference between when point is "within the
> indentation" vs "within the text").
Okay, here is the (ad-hoc) infrastructure I came up with:
We have a tree-sitter-simple-indent-function. Major-mode authors can set indent-line-function to it to use the simple-indent system. tree-sitter-simple-indent-function indents according to tree-sitter-simple-indent-rules. Doc string of tree-sitter-simple-indent-rules reads:
A list of indent rule settings.
Each indent rule setting should be (LANGUAGE . RULES),
where LANGUAGE is a language symbol, and RULES is a list of
(MATCHER ANCHOR OFFSET).
MATCHER determines whether this rule applies; ANCHOR and OFFSET
together determine which column to indent to.
A MATCHER is a function that takes three arguments (NODE PARENT
BOL). NODE is the largest (highest-in-tree) node starting at
point. PARENT is the parent of NODE. BOL is the point where we
are indenting: the beginning of line content, the position of the
first non-whitespace character.
If MATCHER returns non-nil, meaning the rule matches, Emacs then
uses ANCHOR to find an anchor. ANCHOR should be a function that
takes the same arguments (NODE PARENT BOL) and returns a point.
Finally Emacs computes the column of the point returned by ANCHOR,
adds OFFSET to it, and indents the line to that column.
For MATCHER and ANCHOR, Emacs provides some convenient presets.
See `tree-sitter-simple-indent-presets'.
And doc string for tree-sitter-simple-indent-presets:
A list of presets.
These presets can be used as MATCHER and ANCHOR in
`tree-sitter-simple-indent-rules'.
MATCHER:
(match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
    NODE-TYPE checks for the node's type, PARENT-TYPE checks for
    the parent's type, NODE-FIELD checks for the field name of the
    node in the parent, and NODE-INDEX-MIN and NODE-INDEX-MAX check
    for the node's index in the parent.  Therefore, to match the
    first child where the parent is \"argument_list\", use (match nil
    \"argument_list\" nil 0 0).
no-node
    Matches the case where node is nil, i.e., there is no node
    that starts at point.  This is the case when indenting an
    empty line.
(node-at-point TYPE NAMED)
    Checks that the node at point -- not the largest node starting
    at point -- has type TYPE.  If NAMED is non-nil, check the named
    node at point.
(parent-is TYPE)
    Checks that the parent has type TYPE.
(node-is TYPE)
    Checks that the node has type TYPE.
(parent-match PATTERN)
    Checks that the parent matches PATTERN, a query pattern.
(node-match PATTERN)
    Checks that the node matches PATTERN, a query pattern.
ANCHOR:
first-child
    Find the first child of the parent.
parent
    Find the parent.
prev-sibling
    Find node's previous sibling.
no-indent
    Do nothing.
prev-line
    Find the named node on the previous line.  This can be used when
    indenting an empty line: just indent like the previous node.
Examples of using these facilities can be found in ts-c-tree-sitter-indent-rules.
For example,
((match nil "function_definition" "body") parent 0)
means “match a node whose parent’s type is "function_definition" and whose field name is "body", and indent it to the start of its parent”. That indents the opening brace in
int main ()
{
}
((parent-is "call_expression") parent 2)
means “match a node whose parent’s type is "call_expression", and indent it to the start of its parent plus 2”. That indents the second line in
my_cool_function
  (arg1, arg2, arg3)
I’ve implemented some indentation rules for C in ts-c-mode as usual. I expect someone more knowledgeable in C to actually implement it later.
So… do you think this is OK, or too convoluted? In particular, is there a better way to implement those “presets”? I don’t want to define them as normal functions, because then their names would be super long (parent-is -> tree-sitter-simple-indent-parent-is) and annoying to use when writing rules, but putting them in an alist (tree-sitter-simple-indent-presets) is a bit ad hoc. I call these presets with tree-sitter--simple-apply, which basically looks up the name in tree-sitter-simple-indent-presets, gets the function, and applies it.
You can find the latest version at https://github.com/casouri/emacs/tree/ts
I.e., git clone https://github.com/casouri/emacs.git --branch ts
Yuan
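The (MATCHER ANCHOR OFFSET) scheme above can be modeled in a few lines of C. This is a toy sketch, not the Lisp implementation on the branch: struct node is a stand-in for a syntax node rather than tree-sitter's TSNode, all names are hypothetical, and "the first matching rule wins" is an assumption that the description above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a syntax node; not tree-sitter's TSNode.  */
struct node
{
  const char *type;           /* e.g. "call_expression" */
  const struct node *parent;  /* NULL for the root */
  int column;                 /* column where the node starts */
};

typedef int (*matcher_fn) (const struct node *node, const char *arg);
typedef int (*anchor_fn) (const struct node *node);

struct indent_rule
{
  matcher_fn matcher;
  const char *matcher_arg;
  anchor_fn anchor;
  int offset;
};

/* Matcher in the spirit of (parent-is TYPE).  */
static int
parent_is (const struct node *node, const char *type)
{
  return node->parent && strcmp (node->parent->type, type) == 0;
}

/* Anchor in the spirit of `parent': the parent's starting column.  */
static int
anchor_parent (const struct node *node)
{
  assert (node->parent);
  return node->parent->column;
}

/* Return the column for NODE from the first matching rule, or -1 if
   no rule matches.  "First match wins" is an assumption here.  */
static int
simple_indent (const struct indent_rule *rules, size_t nrules,
               const struct node *node)
{
  for (size_t i = 0; i < nrules; i++)
    if (rules[i].matcher (node, rules[i].matcher_arg))
      return rules[i].anchor (node) + rules[i].offset;
  return -1;
}
```

With a rule table equivalent to ((parent-is "call_expression") parent 2), a node whose parent is a call_expression starting at column 4 would be indented to column 6.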
* Re: Tree-sitter api
2021-08-17 6:18 ` Yuan Fu
@ 2021-08-18 18:27 ` Stephen Leake
2021-08-18 21:30 ` Yuan Fu
2021-08-23 6:51 ` Yuan Fu
2021-08-22 2:43 ` Yuan Fu
2021-08-25 0:21 ` Stefan Monnier
2 siblings, 2 replies; 370+ messages in thread
From: Stephen Leake @ 2021-08-18 18:27 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
This looks very interesting, but I have a migraine right now, so I'll
have to look at it later.
You could try writing indent rules for Ada; current ada-mode code is in
https://savannah.nongnu.org/git/?group=ada-mode. See the test/ directory
for examples of known good indentation.
ada-mode takes the approach of embedding the indent rules directly in
the grammar, and the functions that do that provide a few more options
than yours. To see the definition of those functions, you'll have to
install the wisi package, and look in wisi.info, section Grammar
actions. (it would be nice if that info/html file was linked from the
GNU ELPA package page; I'll start a new thread for that).
Yuan Fu <casouri@gmail.com> writes:
>>
>> I'm thinking of rules specified via a function that takes a TS node
>> (from which the function can explore the rest of the TS tree) and return
>> the indentation to use, represented as a pair (POSITION . OFFSET)
>> (meaning to indent OFFSET columns further than the column position of
>> POSITION).
>>
>> The infrastructure would limit itself to making sure we have an up-to-date
>> tree (computed from a properly widened buffer), find the node
>> corresponding to point, pass it to the function, and then turn the return
>> value into an actual column and indent the text accordingly (paying
>> attention to the usual difference between when point is "within the
>> indentation" vs "within the text").
>
> Okay, here is the (ad-hoc) infrastructure I came up with:
>
> We have a tree-sitter-simple-indent-function. Major-mode authors can set indent-line-function to it to use the simple-indent system. tree-sitter-simple-indent-function indents according to tree-sitter-simple-indent-rules. Doc string of tree-sitter-simple-indent-rules reads:
>
> A list of indent rule settings.
> Each indent rule setting should be (LANGUAGE . RULES),
> where LANGUAGE is a language symbol, and RULES is a list of
> (MATCHER ANCHOR OFFSET).
>
> MATCHER determines whether this rule applies; ANCHOR and OFFSET
> together determine which column to indent to.
>
> A MATCHER is a function that takes three arguments (NODE PARENT
> BOL). NODE is the largest (highest-in-tree) node starting at
> point. PARENT is the parent of NODE. BOL is the point where we
> are indenting: the beginning of line content, the position of the
> first non-whitespace character.
>
> If MATCHER returns non-nil, meaning the rule matches, Emacs then
> uses ANCHOR to find an anchor. ANCHOR should be a function that
> takes the same arguments (NODE PARENT BOL) and returns a point.
>
> Finally Emacs computes the column of the point returned by ANCHOR,
> adds OFFSET to it, and indents the line to that column.
>
> For MATCHER and ANCHOR, Emacs provides some convenient presets.
> See `tree-sitter-simple-indent-presets’.
>
> And doc string for tree-sitter-simple-indent-presets:
>
> A list of presets.
> These presets can be used as MATCHER and ANCHOR in
> `tree-sitter-simple-indent-rules'.
>
> MATCHER:
>
> (match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
>
> NODE-TYPE checks for the node's type, PARENT-TYPE checks for
> the parent's type, NODE-FIELD checks for the field name of the
> node in the parent, and NODE-INDEX-MIN and NODE-INDEX-MAX check
> for the node's index in the parent. Therefore, to match the
> first child where parent is \"argument_list\", use (match nil
> \"argument_list\" nil 0 0).
>
> no-node
>
> Matches the case where node is nil, i.e., there is no node
> that starts at point. This is the case when indenting an
> empty line.
>
> (node-at-point TYPE NAMED)
>
> Check that the node at point -- not the largest node starting at
> point -- has type TYPE. If NAMED non-nil, check the named node
> at point.
>
> (parent-is TYPE)
>
> Check that the parent has type TYPE.
>
> (node-is TYPE)
>
> Checks that the node has type TYPE.
>
> (parent-match PATTERN)
>
> Checks that the parent matches PATTERN, a query pattern.
>
> (node-match PATTERN)
>
> Checks that the node matches PATTERN, a query pattern.
>
> ANCHOR:
>
> first-child
>
> Find the first child of the parent.
>
> parent
>
> Find the parent.
>
> prev-sibling
>
> Find node's previous sibling.
>
> no-indent
>
> Do nothing.
>
> prev-line
>
> Find the named node on previous line. This can be used when
> indenting an empty line: just indent like the previous node.
>
> An example of using these facilities can be found in ts-c-tree-sitter-indent-rules.
>
> For example,
>
> ((match nil "function_definition" "body") parent 0)
>
> means “match the node whose parent’s type is “function_definition” and whose field name is “body”, and indent it to the start of its parent”. That indents the opening brace in
>
> int main ()
> {
> }
>
> ((parent-is "call_expression") parent 2)
>
> means “match the node whose parent’s type is “call_expression”, and indent it to the start of its parent + 2”. That indents the second line in
>
> my_cool_function
> (arg1, arg2, arg3)
>
> I’ve implemented some indentation rules for C in ts-c-mode as usual. I expect someone more knowledgeable in C to actually implement it later.
>
> So… do you think this is ok, or convoluted? In particular, is there a better way to implement those “presets”? I don’t want to define them as normal functions, because then their name will be super long (parent-is -> tree-sitter-simple-indent-parent-is) and annoying to use when writing rules, but putting them in an alist (tree-sitter-simple-indent-presets) is a bit ad-hoc. I call these presets with tree-sitter--simple-apply, which basically looks up tree-sitter-simple-indent-presets, get the function and apply it.
>
> You can find the latest version at https://github.com/casouri/emacs/tree/ts
> I.e., git clone https://github.com/casouri/emacs.git --branch ts
>
> Yuan
>
>
--
-- Stephe
* Re: Tree-sitter api
2021-08-18 18:27 ` Stephen Leake
@ 2021-08-18 21:30 ` Yuan Fu
2021-08-20 0:12 ` [SPAM UNSURE] " Stephen Leake
2021-08-23 6:51 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-08-18 21:30 UTC (permalink / raw)
To: Stephen Leake
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
>
> This looks very interesting, but I have a migraine right now, so I'll
> have to look at it later.
Hope you get better soon :-)
> You could try writing indent rules for Ada; current ada-mode code is in
> https://savannah.nongnu.org/git/?group=ada-mode. See the test/ directory
> for examples of known good indentation.
>
> ada-mode takes the approach of embedding the indent rules directly in
> the grammar, and the functions that do that provide a few more options
> than yours. To see the definition of those functions, you'll have to
> install the wisi package, and look in wisi.info, section Grammar
> actions. (it would be nice if that info/html file was linked from the
> GNU ELPA package page; I'll start a new thread for that).
Thanks. I’ll see what I can do; I know nearly nothing about Ada except that it is commissioned by the department of defense :-)
BTW, while I was reading the manual, I noticed a typo:
If token labels are used in a right hand side, they must be
given explicitly in the indent arguments, using he lisp "cons"
^
syntax. Labels are normally only used with EBNF grammars,
which expand into multiple right hand sides, with optional
tokens simply left out. Explicit labels on the indent
arguments allow them to be left out as well.
Yuan
* Re: [SPAM UNSURE] Re: Tree-sitter api
2021-08-18 21:30 ` Yuan Fu
@ 2021-08-20 0:12 ` Stephen Leake
0 siblings, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-08-20 0:12 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
>> You could try writing indent rules for Ada; current ada-mode code is in
>> https://savannah.nongnu.org/git/?group=ada-mode. See the test/ directory
>> for examples of known good indentation.
>>
>> ada-mode takes the approach of embedding the indent rules directly in
>> the grammar, and the functions that do that provide a few more options
>> than yours. To see the definition of those functions, you'll have to
>> install the wisi package, and look in wisi.info, section Grammar
>> actions. (it would be nice if that info/html file was linked from the
>> GNU ELPA package page; I'll start a new thread for that).
>
> Thanks. I’ll see what I can do; I know nearly nothing about Ada except
> that it is commissioned by the department of defense :-)
Was, a long time ago. Now it is used by high-security, high-reliability
applications (train control, spacecraft (European, not NASA, sigh), banks).
AdaCore is a company thriving on the business model of selling support
for the Gnu Ada compiler and associated tools.
> BTW, while I was reading the manual, I noticed a typo:
Thanks.
--
-- Stephe
* Re: Tree-sitter api
2021-08-17 6:18 ` Yuan Fu
2021-08-18 18:27 ` Stephen Leake
@ 2021-08-22 2:43 ` Yuan Fu
2021-08-22 3:46 ` Yuan Fu
2021-08-22 6:15 ` Eli Zaretskii
2021-08-25 0:21 ` Stefan Monnier
2 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-08-22 2:43 UTC (permalink / raw)
To: Stefan Monnier
Cc: Stephen Leake, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel
I’m trying to automate building the dynamic modules for each language definition. The files are largely identical for each language; I just need to replace the language-specific names in each file. Can I use awk?
Yuan
* Re: Tree-sitter api
2021-08-22 2:43 ` Yuan Fu
@ 2021-08-22 3:46 ` Yuan Fu
2021-08-22 6:16 ` Eli Zaretskii
2021-08-22 6:15 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-08-22 3:46 UTC (permalink / raw)
To: Stefan Monnier
Cc: Stephen Leake, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel
> On Aug 21, 2021, at 7:43 PM, Yuan Fu <casouri@gmail.com> wrote:
>
> I’m trying to automate building the dynamic modules for each language definition. The files are largely identical for each language, I just need to replace language-specific names in each file. Can I use awk?
Actually, after searching a bit more, I think what I need is sed. Or are there better tools that I don’t know about? Maybe I can just use emacs --batch?
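If the substitution were done in Emacs Lisp (for instance from emacs --batch), a minimal sketch could look like the following. The "@LANG@" placeholder, the sample template text, and the `my-` function names are made up for illustration; the real module sources may be organized differently.

```elisp
;; Hypothetical sketch: the "@LANG@" placeholder and the `my-' names
;; are assumptions for illustration only.

(defun my-instantiate-template (template language)
  "Return TEMPLATE with every literal \"@LANG@\" replaced by LANGUAGE."
  ;; FIXEDCASE and LITERAL are both t: keep case and insert LANGUAGE as-is.
  (replace-regexp-in-string "@LANG@" language template t t))

(defun my-instantiate-all (template languages)
  "Return an alist (LANGUAGE . TEXT), one entry for each of LANGUAGES."
  (mapcar (lambda (language)
            (cons language (my-instantiate-template template language)))
          languages))
```

The same one-placeholder substitution is equally easy in sed or awk; the Lisp version only becomes attractive if the generation step needs more logic than a plain replacement.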
Yuan
* Re: Tree-sitter api
2021-08-22 2:43 ` Yuan Fu
2021-08-22 3:46 ` Yuan Fu
@ 2021-08-22 6:15 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-22 6:15 UTC (permalink / raw)
To: Yuan Fu; +Cc: stephen_leake, cpitclaudel, theo, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 21 Aug 2021 19:43:31 -0700
> Cc: Theodor Thornhill <theo@thornhill.no>,
> Eli Zaretskii <eliz@gnu.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> emacs-devel <emacs-devel@gnu.org>
>
> I’m trying to automate building the dynamic modules for each language definition. The files are largely identical for each language, I just need to replace language-specific names in each file. Can I use awk?
Yes. We already use Awk in a couple of places in the build process.
Another possibility is to use Emacs, if what you need to do is not
part of bootstrap.
* Re: Tree-sitter api
2021-08-22 3:46 ` Yuan Fu
@ 2021-08-22 6:16 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-08-22 6:16 UTC (permalink / raw)
To: Yuan Fu; +Cc: stephen_leake, cpitclaudel, theo, monnier, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 21 Aug 2021 20:46:45 -0700
> Cc: Theodor Thornhill <theo@thornhill.no>,
> Eli Zaretskii <eliz@gnu.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> emacs-devel <emacs-devel@gnu.org>
>
> Actually, after searching for a bit more, I think what I need is sed. Or there are better tools that I don’t know about? Maybe I can just use emacs --batch?
Both are possible, but if what you need to do must be done as part of
bootstrap, Emacs might not be available yet.
* Re: Tree-sitter api
2021-08-18 18:27 ` Stephen Leake
2021-08-18 21:30 ` Yuan Fu
@ 2021-08-23 6:51 ` Yuan Fu
2021-08-24 14:59 ` [SPAM UNSURE] " Stephen Leake
2021-08-24 22:51 ` Stefan Monnier
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-08-23 6:51 UTC (permalink / raw)
To: Stephen Leake
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
>
> ada-mode takes the approach of embedding the indent rules directly in
> the grammar, and the functions that do that provide a few more options
> than yours. To see the definition of those functions, you'll have to
> install the wisi package, and look in wisi.info, section Grammar
> actions. (it would be nice if that info/html file was linked from the
> GNU ELPA package page; I'll start a new thread for that).
I had a cursory look at the manual for indent in wisi and have some questions. Why does wisi indent from “low-level productions”? (I think most indentation engines work line-by-line from the first line.) I don’t know much about how wisi works, but its indentation system seems to stem from circumstances quite different from those of tree-sitter. For example, wisi’s indent is devised alongside the grammar definition, while for tree-sitter, all the hard work of defining the grammar is done for me and I’m merely a user of the grammar: that makes indenting with tree-sitter a much simpler job.
A problem I have with smie (and maybe wisi, but I didn’t look into wisi) is its seeming complexity. I’m merely a 22-year-old who drank too much coca-cola, and smie is too complicated for my soaked brain to comprehend. Having had a traumatic experience trying to use smie[1], I want to make my indentation system as straightforward as possible. It doesn’t have to be complicated anyway, since it does so much less than wisi and smie. Right now I’d say it’s pretty simple, and most tasks (in indenting C) can be reasonably done, and I imagine difficult cases can be solved by writing custom matcher and anchor functions.
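To make the idea of custom matcher and anchor functions concrete, here is a minimal sketch of how a (MATCHER ANCHOR OFFSET) rule list could be evaluated. It is not the code in the branch: nodes are plists of the form (:type STRING :column N), the anchor returns a column directly instead of a buffer position, and all `my-` names are hypothetical.

```elisp
(require 'cl-lib)

;; Hypothetical sketch. Stand-in nodes are plists (:type STRING :column N);
;; a real implementation would use tree-sitter nodes and buffer positions.

(defun my-match-else-branch (node parent _bol)
  "Custom matcher: NODE is an \"else\" keyword inside an if statement."
  (and (equal (plist-get node :type) "else")
       (equal (plist-get parent :type) "if_statement")))

(defun my-match-has-parent (_node parent _bol)
  "Fallback matcher: succeed whenever there is a PARENT."
  (and parent t))

(defun my-anchor-parent-column (_node parent _bol)
  "Custom anchor: the column where PARENT starts."
  (plist-get parent :column))

(defvar my-rules
  '((my-match-else-branch my-anchor-parent-column 0)
    (my-match-has-parent  my-anchor-parent-column 2))
  "Rules of the form (MATCHER ANCHOR OFFSET), tried in order.")

(defun my-indent-column (rules node parent bol)
  "Return the column chosen by the first rule in RULES that matches.
Return nil if no rule matches."
  (cl-loop for (matcher anchor offset) in rules
           when (funcall matcher node parent bol)
           return (+ (funcall anchor node parent bol) offset)))
```

Because the first matching rule wins, specific rules go before general fallbacks, which is why the "else" rule precedes the catch-all one.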
Stefan, can you have a look at tree-sitter-simple-indent? It’s like two messages up? It goes generally along the (pos . offset) idea but has some twists.
[1] Of course, I need to define the grammar when using smie but not when using tree-sitter, so it’s like comparing apples to pears, but I can’t resist finally telling a joke on the list.
Thanks,
Yuan
* Re: [SPAM UNSURE] Re: Tree-sitter api
2021-08-23 6:51 ` Yuan Fu
@ 2021-08-24 14:59 ` Stephen Leake
2021-08-27 5:18 ` [SPAM UNSURE] " Yuan Fu
2021-08-24 22:51 ` Stefan Monnier
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-08-24 14:59 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
>>
>> ada-mode takes the approach of embedding the indent rules directly in
>> the grammar, and the functions that do that provide a few more options
>> than yours. To see the definition of those functions, you'll have to
>> install the wisi package, and look in wisi.info, section Grammar
>> actions. (it would be nice if that info/html file was linked from the
>> GNU ELPA package page; I'll start a new thread for that).
>
> I had a cursory look at the manual for indent in wisi and have some
> questions. Why does wisi indent from “low-level productions”?
The indent of every new-line must be specified; low level productions
can contain new-lines.
> (I think most indentation engines work line-by-line from the first
> line.) I don’t know much about how wisi works, but the indentation
> system seems to stem from circumstances quite different from that of
> tree-sitter. For example, wisi’s indent is devised alongside the
> grammar definition, while for tree-sitter, all the hard work of
> defining grammar is done for me and I’m merely a user of the grammar:
> that makes indenting with tree-sitter a much simpler job.
The Ada grammar is taken from the Ada Reference Manual; the indent
information is added after. The indent information could be in a
separate file, as in tree-sitter (wisitoken does not currently support
this; there would need to be a way to specify which production the
indent rule is associated with).
A tree-sitter based indent engine still has to specify the indent of
every new-line; it's the same amount of information.
Taking the examples from your email:
> ((match nil "function_definition" "body") parent 0)
> means “match the node whose parent’s type is
> “function_definition” and whose field name is “body”, and indent it
> to the start of its parent”. That indents the opening brace in
> int main ()
> {
> }
Referring to the tree-sitter-c grammar at
https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js,
there is a C grammar production (in tree-sitter syntax):
function_definition: $ => seq(
optional($.ms_call_modifier),
$._declaration_specifiers,
field('declarator', $._declarator),
field('body', $.compound_statement)
),
In wisitoken syntax, this is:
function_definition : [ms_call_modifier] declaration_specifiers
declarator=declarator body=compound_statement
(the current wisi user guide does not define the "=" syntax for
declaring token names, but it is supported; I'll add it to the user
guide)
The indent rule specifies the indent of the field named 'body',
relative to the start of the production. So in wisitoken, this would
specify one component of the indent action for this production:
{(wisi-indent-action [nil nil nil (body . 0)])}
Presumably there are other rules that specify the indent of the other
tokens in that production, so they would not be 'nil', which in
wisitoken means "undefined"; it is an error for any new-line to have an
undefined indent after all indent actions are applied.
Next example:
((parent-is "call_expression") parent 2)
The production is:
call_expression: $ => prec(PREC.CALL, seq(
field('function', $._expression),
field('arguments', $.argument_list)
)),
In wisitoken syntax (note that wisitoken does not support precedence
declarations (yet)):
call_expression : function=expression arguments=argument_list
{(wisi-indent-action [nil (arguments . 2)])}
So your syntax for indent is much more verbose than the wisi syntax
(because each token gets a separate rule), but specifies the same
information.
Your syntax also requires naming each token that is referenced in an
indent rule; wisitoken can use token position to do that, which is the
main reason indent is specified directly in the grammar file; it's very
easy to associate each indent expression with the corresponding token,
without having to make up names for the tokens. Here are the above
wisitoken productions without the token names:
function_definition : [ms_call_modifier] declaration_specifiers
declarator compound_statement
{(wisi-indent-action [nil nil nil 0])}
call_expression : expression argument_list
{(wisi-indent-action [nil 2])}
To be fair, we'd have to look at the other types of rules, to see if
this pattern holds up.
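The positional association described above can be modeled in a few lines. This is only a simplified illustration of pairing the Nth vector entry with the Nth right-hand-side token; it is not how wisi-indent-action is implemented, and `my-positional-offsets` is a hypothetical name.

```elisp
(require 'cl-lib)

;; Hypothetical sketch: a simplified model of positional indent
;; arguments, not the implementation of `wisi-indent-action'.

(defun my-positional-offsets (tokens offsets)
  "Pair each symbol in TOKENS with the same-index entry of vector OFFSETS.
Return an alist (TOKEN . OFFSET), skipping nil (undefined) entries."
  (cl-loop for token in tokens
           for offset across offsets
           when offset collect (cons token offset)))
```

For the call_expression production, the tokens (expression argument_list) with the vector [nil 2] yield ((argument_list . 2)), which carries the same information as the named rule without any token needing a name.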
I think you were biased by the "matching" rules tree-sitter supports.
That approach is reasonable when you only want to specify information
for a few nodes in the tree. Wisi assumes you want to specify indent
information for most of the nodes in the tree, so it supports a
tree-traversal model instead. Tree-sitter does support tree traversal,
but doesn't provide an easy way to add information for each node, as the
wisi indent-action syntax does.
--
-- Stephe
* Re: Tree-sitter api
2021-08-23 6:51 ` Yuan Fu
2021-08-24 14:59 ` [SPAM UNSURE] " Stephen Leake
@ 2021-08-24 22:51 ` Stefan Monnier
1 sibling, 0 replies; 370+ messages in thread
From: Stefan Monnier @ 2021-08-24 22:51 UTC (permalink / raw)
To: Yuan Fu
Cc: Stephen Leake, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel
> (I think most indentation engines work line-by-line from the
> first line.)
FWIW, the vast majority of the code performing indentation in the
various major modes in Emacs does it by parsing backward from the
position of point and doesn't work "line by line".
The "line by line" is only used for `indent-region` but the workhorse
function is in `indent-line-function` and only performs indentation of
a single line without touching anything else.
IOW, `indent-region` will usually go "line-by-line" but for each line
the actual work will be by parsing backward from that line
(i.e. re-parsing the previous lines that had just been parsed for the
previous line's indentation). This is obviously not ideal in terms of
efficiency, but in practice indenting a single line usually only needs
to parse a small number of lines (I suspect it's almost O(1) of
*amortized* complexity so in most cases the algorithmic complexity of
`indent-region` is not really affected).
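A toy illustration of that division of labor, under stated assumptions: `my-toy-indent-line` plays the role of a mode's `indent-line-function`, and the loop in `my-toy-indent-buffer-string` is a simplified stand-in for the per-line loop of `indent-region`. Both names are hypothetical, and the brace counting is deliberately naive (it ignores strings and comments, and it rescans from the start of the buffer, so it does not have the near-constant cost of a real backward parse).

```elisp
;; Hypothetical sketch: a toy per-line indenter driven by a per-line loop.

(defun my-toy-indent-line ()
  "Indent the current line two columns per unclosed \"{\" above it.
Only the current line is modified; the text before it is rescanned
on every call."
  (let* ((bol (line-beginning-position))
         (depth (- (how-many "{" (point-min) bol)
                   (how-many "}" (point-min) bol))))
    (save-excursion
      (back-to-indentation)
      ;; A line that starts with a closing brace belongs one level out.
      (when (looking-at-p "}")
        (setq depth (1- depth)))
      (indent-line-to (* 2 (max 0 depth))))))

(defun my-toy-indent-buffer-string (text)
  "Return TEXT with every line indented by `my-toy-indent-line'."
  (with-temp-buffer
    (setq indent-tabs-mode nil)
    (insert text)
    (goto-char (point-min))
    ;; Simplified stand-in for the line-by-line loop of `indent-region'.
    (while (not (eobp))
      (my-toy-indent-line)
      (forward-line 1))
    (buffer-string)))
```

Each call works only from the text above the current line, so the per-line function can be used on its own (for example when the user presses TAB) as well as from the region loop.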
> Stefan, can you have a look at tree-sitter-simple-indent? It’s like two
> messages up? It goes generally along the (pos . offset) idea but has
> some twists.
It's in my todo list, yes. I'm still backlog'd, tho.
Stefan
* Re: Tree-sitter api
2021-08-17 6:18 ` Yuan Fu
2021-08-18 18:27 ` Stephen Leake
2021-08-22 2:43 ` Yuan Fu
@ 2021-08-25 0:21 ` Stefan Monnier
2021-08-27 5:45 ` Yuan Fu
2 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-08-25 0:21 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Clément Pit-Claudel,
Stephen Leake, emacs-devel
> Okay, here is the (ad-hoc) infrastructure I came up with:
It's more than what I proposed, but it looks fairly good.
See patch below which is the "side effect" of reading your code.
You'll see that I removed the "-function" from the function name (this
suffix is used for variables holding functions rather than for the
functions themselves) and I split that function into two, the outer one
(tree-sitter-indent) implementing basically what I suggested and the
inner one (tree-sitter-simple-indent) implementing the extra structure
you added to it, mediated by a new var `tree-sitter-indent-function`
which modes can set if they want to use another algorithm than the one
you implemented in `tree-sitter-simple-indent`.
The reason why I divided it this way is that my experience with
indentation code is that it can be useful occasionally to call
recursively the indentation code to know where a node *would* be
indented. This comes in handy when you want to be able to provide
indentation styles like:
let myvariable = if (foo) {
bar
} else {
baz
}
where the body of the `if` branches needs to be indented relative to the
position where the `if` itself would be indented if it were on its own line.
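The benefit of the split can be sketched as follows: the inner function only computes an (ANCHOR . OFFSET) pair, so it can be called without side effects to ask where a line would go, and a separate outer function applies the result. This is a minimal sketch and not the patch below; every `my-` name and the toy rule are assumptions for illustration.

```elisp
;; Hypothetical sketch: separate "compute the target" from "apply it".

(defvar my-indent-function #'my-indent-like-first-line
  "Function returning (ANCHOR . OFFSET) for the line at point.
ANCHOR is a buffer position and OFFSET a number of columns.")

(defun my-indent-like-first-line ()
  "Toy rule: anchor on the first non-blank character of the buffer."
  (cons (save-excursion
          (goto-char (point-min))
          (back-to-indentation)
          (point))
        2))

(defun my-indent-target-column ()
  "Return the column the current line would be indented to.
The buffer is not modified, so other rules may call this freely."
  (let* ((result (funcall my-indent-function))
         (anchor (car result))
         (offset (cdr result)))
    (+ (save-excursion (goto-char anchor) (current-column))
       offset)))

(defun my-indent-line ()
  "Indent the current line to the column from `my-indent-target-column'."
  (let ((col (my-indent-target-column))
        (in-text (> (current-column) (current-indentation))))
    ;; Keep point in place when it is inside the line's text.
    (if in-text
        (save-excursion (indent-line-to col))
      (indent-line-to col))))
```

A rule for the body of an `if` could call `my-indent-target-column` at the position of the `if` to learn its hypothetical indentation and then add its own offset.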
Stefan
PS: The patch also adds some space before open-paren-in-column-0-in-strings
to circumvent some problems with outline-minor-mode incorrectly thinking
those open-parens correspond to actual top-level definitions :-(
diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
index 83aa2d0d123..2c5d103c42d 100644
--- a/lisp/tree-sitter.el
+++ b/lisp/tree-sitter.el
@@ -52,6 +52,8 @@ tree-sitter-should-enable-p
;;; Parser API supplement
+(defvar tree-sitter-parser-list)
+
(defun tree-sitter-get-parser (language)
"Find the first parser using LANGUAGE in `tree-sitter-parser-list'."
(catch 'found
@@ -196,7 +198,7 @@ tree-sitter-simple-indent-rules
"A list of indent rule settings.
Each indent rule setting should be (LANGUAGE . RULES),
where LANGUAGE is a language symbol, and RULES is a list of
-(MATCHER ANCHOR OFFSET).
+ (MATCHER ANCHOR OFFSET).
MATCHER determines whether this rule applies, ANCHOR and OFFSET
together determines which column to indent to.
@@ -289,7 +291,7 @@ tree-sitter-simple-indent-presets
MATCHER:
-(match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
+ (match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
NODE-TYPE checks for node's type, PARENT-TYPE check for
parent's type, NODE-FIELD checks for the filed name of node
@@ -304,25 +306,25 @@ tree-sitter-simple-indent-presets
that starts at point. This is the case when indenting an
empty line.
-(node-at-point TYPE NAMED)
+ (node-at-point TYPE NAMED)
Check that the node at point -- not the largest node at
point, has type TYPE. If NAMED non-nil, check the named node
at point.
-(parent-is TYPE)
+ (parent-is TYPE)
Check that the parent has type TYPE.
-(node-is TYPE)
+ (node-is TYPE)
Checks that the node has type TYPE.
-(parent-match PATTERN)
+ (parent-match PATTERN)
Checks that the parent matches PATTERN, a query pattern.
-(node-match PATTERN)
+ (node-match PATTERN)
Checks that the node matches PATTERN, a query pattern.
@@ -356,7 +358,7 @@ tree-sitter--simple-apply
If FN is a key in `tree-sitter-simple-indent-presets', use the
corresponding value as the function."
- (cond ((consp fn)
+ (cond ((consp fn) ;FIXME: This will mis-match for non-compiled lambdas!
(apply (tree-sitter--simple-apply (car fn) (cdr fn))
args))
((and (symbolp fn)
@@ -366,21 +368,46 @@ tree-sitter--simple-apply
((functionp fn) (apply fn args))
(t (error "Couldn't find appropriate function for FN"))))
-(defun tree-sitter-simple-indent-function ()
+(defvar tree-sitter-indent-function #'tree-sitter-simple-indent
+ "Document.")
+
+(defun tree-sitter-indent ()
"Indent according to `tree-sitter-simple-indent-rules'."
- (let* ((orig-pos (point))
- (bol (save-excursion
+ (pcase-let*
+ ((orig-pos (point))
+ (bol (save-excursion
+ (beginning-of-line)
+ (skip-chars-forward " \t")
+ (point)))
+ (node (tree-sitter-parent-while
+ (cl-loop for parser in tree-sitter-parser-list
+ for node = (tree-sitter-node-at
+ bol nil parser)
+ if node return node)
+ (lambda (node)
+ (eq bol (tree-sitter-node-start node)))))
+ (parent (tree-sitter-node-parent node))
+ (`(,anchor . ,offset)
+ (funcall tree-sitter-indent-function node parent)))
+ (let ((col (+ (save-excursion
+ (goto-char anchor)
+ (current-column))
+ offset)))
+ (if (< bol orig-pos)
+ (save-excursion
+ (indent-line-to col))
+ (indent-line-to col))
+ (when tree-sitter--indent-verbose
+ (message "indent to %S (%S position + %S)"
+ col anchor offset)))))
+
+(defun tree-sitter-simple-indent (node parent)
+ (let* ((bol (save-excursion
(beginning-of-line)
(skip-chars-forward " \t")
(point)))
- (node (tree-sitter-parent-while
- (cl-loop for parser in tree-sitter-parser-list
- for node = (tree-sitter-node-at
- bol nil parser)
- if node return node)
- (lambda (node)
- (eq bol (tree-sitter-node-start node)))))
- (parent (tree-sitter-node-parent node))
+ ;; FIXME: Can't we get the language from `node' rather than
+ ;; from `point'?
(language (tree-sitter-language-at (point)))
(rules (alist-get language tree-sitter-simple-indent-rules)))
(cl-loop for rule in rules
@@ -388,20 +415,9 @@ tree-sitter-simple-indent-function
for anchor = (nth 1 rule)
for offset = (nth 2 rule)
if (tree-sitter--simple-apply pred (list node parent bol))
- do (let ((col (+ (save-excursion
- (goto-char
- (tree-sitter--simple-apply
- anchor (list node parent bol)))
- (current-column))
- offset)))
- (if (< bol orig-pos)
- (save-excursion
- (indent-line-to col))
- (indent-line-to col))
- (when tree-sitter--indent-verbose
- (message "matched %S\nindent to %s"
- pred col)))
- and return nil)))
+ do `(,(tree-sitter--simple-apply
+ anchor (list node parent bol))
+ . ,offset))))
;;; Lab
@@ -435,7 +451,7 @@ ts-c-mode
(ignore t nil nil nil)
indent-line-function
- #'tree-sitter-simple-indent-function
+ #'tree-sitter-indent
tree-sitter-simple-indent-rules
ts-c-tree-sitter-indent-rules)
* Re: [SPAM UNSURE] Tree-sitter api
2021-08-24 14:59 ` [SPAM UNSURE] " Stephen Leake
@ 2021-08-27 5:18 ` Yuan Fu
2021-08-31 0:48 ` Stephen Leake
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-08-27 5:18 UTC (permalink / raw)
To: Stephen Leake
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
Thank you very much for spending time on this :-)
> On Aug 24, 2021, at 7:59 AM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>>
>>> ada-mode takes the approach of embedding the indent rules directly in
>>> the grammar, and the functions that do that provide a few more options
>>> than yours. To see the definition of those functions, you'll have to
>>> install the wisi package, and look in wisi.info, section Grammar
>>> actions. (it would be nice if that info/html file was linked from the
>>> GNU ELPA package page; I'll start a new thread for that).
>>
>> I had a cursory look at the manual for indent in wisi and have some
>> questions. Why does wisi indent from “low-level productions”?
>
> The indent of every new-line must be specified; low level productions
> can contain new-lines.
Ah, I see, what I did is to find the “largest” node that starts at BOL, and try to match that. IIUC, wisi starts from the “smallest” entity, and goes up (by getting its parent repeatedly) until there is a non-nil indent rule for it?
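The "largest node that starts at BOL" step amounts to climbing the tree while the parent starts at the same position. A minimal sketch, assuming stand-in nodes that are plists of the form (:type STRING :start POS :parent NODE); `my-largest-node-starting-at` is a hypothetical name, not a function from the branch.

```elisp
;; Hypothetical sketch. Stand-in nodes are plists
;; (:type STRING :start POS :parent NODE).

(defun my-largest-node-starting-at (node pos)
  "Climb from NODE while its parent also starts at POS.
Return the highest such node, or NODE itself if its parent starts
elsewhere."
  (let ((parent (plist-get node :parent)))
    (while (and parent (eql (plist-get parent :start) pos))
      (setq node parent
            parent (plist-get node :parent)))
    node))
```

For an identifier that begins a call expression, which in turn begins an expression statement, all at position 10, this returns the expression statement, since its own parent starts earlier in the buffer.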
[snip]
> So your syntax for indent is much more verbose than the wisi syntax
> (because each token gets a separate rule), but specifies the same
> information.
>
> Your syntax also requires naming each token that is referenced in an
> indent rule; wisitoken can use token position to do that, which is the
> main reason indent is specified directly in the grammar file; it's very
> easy to associate each indent expression with the corresponding token,
> without having to make up names for the tokens.
> Here are the above
> wisitoken productions without the token names:
>
> function_definition : [ms_call_modifier] declaration_specifiers
> declarator compound_statement
> {(wisi-indent-action [nil nil nil 0])}
>
> call_expression : expression argument_list
> {(wisi-indent-action [nil 2])}
>
> To be fair, we'd have to look at the other types of rules, to see if
> this pattern holds up.
I tried and all rules can be translated into wisi’s style. However, it ends up as verbose as the previous one. My idea is to write out match patterns (similar to that in wisi) and give names to the interesting ones (so we use names as opposed to position). Then, if any matched node happens to be the node at point, use that node’s corresponding indent rule to indent. And in the indent rule, we can refer to other matched nodes. For example, in the indent rule of list_rest, the anchor is list_first.
Maybe there are better ways to implement this, but at its current stage I don’t think this is better than tree-sitter-simple-indent.
I think part of the reason why wisi’s indent rule can be succinct is that it is written along the grammar definition. It is hard to make tree-sitter’s indent rule as succinct while being easy to understand.
(defvar tree-sitter-query-indent-rules
'((tree-sitter-c
"(function_definition body: (_) @body)
(field_declaration_list) @field_decl
(call_expression (_) @call_child)
(if_statement
(condition) @if_cond
(consequence) @if_cons
(alternative) @if_alt
\"else\" @else)
(switch_statement
(condition) @switch_cond)
(case_statement
(_) @case-child) @case
(compound_statement) @lbracket
\"}\" @rbracket
(compound_statement
. (_) @list_first
(_)* @list_rest)
(initializer_list
. (_) @list_first
(_)* @list_rest)
(argument_list
. (_) @list_first
(_)* @list_rest)
(parameter_list
. (_) @list_first
(_)* @list_rest)
(field_declaration_list
. (_) @list_first
(_)* @list_rest)
"
(body parent 0)
(field_decl parent 0)
(call_child parent 2)
(if_cond parent 2)
(if_cons parent 2)
(if_alt parent 2)
(switch_cond parent 2)
(else parent 0)
(case parent 0)
(case-child parent 2)
(lbracket parent 2)
(rbracket parent 0)
(list_first parent 2)
(list_rest list_first 0)))
"A list of indent rule settings.
Each indent rule setting should be
(LANGUAGE PATTERN INDENT INDENT...)
where LANGUAGE is a language symbol, PATTERN is a query pattern
string, and each INDENT is a list
(CAPTURE_NAME ANCHOR OFFSET)
If a captured node is the node at point, Emacs looks for an INDENT
that has a matching CAPTURE_NAME, and uses the ANCHOR and OFFSET of
that INDENT to indent the current line.
ANCHOR should be a capture name that captures another node in
PATTERN. Emacs finds the column of that node, adds OFFSET to it,
and indents the current line to that column.
TODO: examples in manual")
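How the (CAPTURE_NAME ANCHOR OFFSET) entries would be consumed can be sketched like this. It is a minimal model, not code from the branch: the captures are given as an alist (CAPTURE-NAME . COLUMN) instead of real query results, `parent` is treated as just another alist entry (an assumption, since the rules above use it as an anchor), and `my-query-indent-column` is a hypothetical name.

```elisp
;; Hypothetical sketch: captures are modeled as an alist
;; (CAPTURE-NAME . COLUMN) rather than real tree-sitter query results.

(defun my-query-indent-column (captures rules name)
  "Return the indent column for the node captured as NAME.
CAPTURES is an alist (CAPTURE-NAME . COLUMN).  RULES is a list of
\(CAPTURE-NAME ANCHOR OFFSET), where ANCHOR names another entry in
CAPTURES.  Return nil if NAME has no rule or its anchor is missing."
  (let* ((rule (assq name rules))
         (anchor-col (and rule (cdr (assq (nth 1 rule) captures)))))
    (and anchor-col (+ anchor-col (nth 2 rule)))))
```

With captures ((parent . 4) (list_first . 6)) and the rules (list_first parent 2) and (list_rest list_first 0), both list_first and list_rest resolve to column 6, so the rest of a list lines up with its first element.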
>
> I think you were biased by the "matching" rules tree-sitter supports.
> That approach is reasonable when you only want to specify information
> for a few nodes in the tree. Wisi assumes you want to specify indent
> information for most of the nodes in the tree, so it supports a
> tree-traversal model instead.
I assumed that the indent rule for most nodes would be something basic, like “same as previous line”, and we only need to specify indent rules for some “special” nodes.
IIUC, the tree-traversal method you mentioned goes bottom-up, matches (in tree-sitter terms) at each level, and accumulates the indent delta of each matched indent rule. Is that right? Does wisi go all the way up to the top level?
> Tree-sitter does support tree traversal,
> but doesn't provide an easy way to add information for each node, as the
> wisi indent-action syntax does.
Yes, I would still need to use a match pattern and name each node that I want to specify an indent delta for. There is no way to specify indent by position in the match pattern without naming each node.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-08-25 0:21 ` Stefan Monnier
@ 2021-08-27 5:45 ` Yuan Fu
2021-09-03 19:16 ` Theodor Thornhill
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-08-27 5:45 UTC (permalink / raw)
To: Stefan Monnier
Cc: Stephen Leake, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel
> On Aug 24, 2021, at 5:21 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> Okay, here is the (ad-hoc) infrastructure I came up with:
>
> It's more than what I proposed, but it looks fairly good.
> See patch below which is the "side effect" of reading your code.
>
> You'll see that I removed the "-function" from the function name (this
> suffix is used for variables holding functions rather than for the
> functions themselves) and I split that function into two, the outer one
> (tree-sitter-indent) implementing basically what I suggested and the
> inner one (tree-sitter-simple-indent) implementing the extra structure
> you added to it, mediated by a new var `tree-sitter-indent-function`
> which modes can set if they want to use another algorithm than the one
> you implemented in `tree-sitter-simple-indent`.
>
> The reason why I divided it this way is that my experience with
> indentation code is that it can be useful occasionally to call
> recursively the indentation code to know where a node *would* be
> indented. This comes in handy when you want to be able to provide
> indentation styles like:
>
> let myvariable = if (foo) {
> bar
> } else {
> baz
> }
>
> where the body of the `if` branches needs to be indented relative to the
> position where the `if` itself would be indented if it were on its own line.
Thanks, Stefan :-) I applied your patch and fixed the two FIXME’s.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: [SPAM UNSURE] Tree-sitter api
2021-08-27 5:18 ` [SPAM UNSURE] " Yuan Fu
@ 2021-08-31 0:48 ` Stephen Leake
0 siblings, 0 replies; 370+ messages in thread
From: Stephen Leake @ 2021-08-31 0:48 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
Clément Pit-Claudel, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
> Thank you very much for spending time on this :-)
And thank you for the same; always helpful to have different points of view.
>> The indent of every new-line must be specified; low level productions
>> can contain new-lines.
>
> Ah, I see, what I did is to find the “largest” node that starts at
> BOL, and try to match that. IIUC, wisi starts from the “smallest”
> entity, and goes up (by getting its parent repeatedly) until there is
> a non-nil indent rule for it?
That's almost right. The indent rule for each production is applied
while walking the entire syntax tree in depth-first order.
>> To be fair, we'd have to look at the other types of rules, to see if
>> this pattern holds up.
>
> I tried and all rules can be translated into wisi’s style.
Ok.
> However, it ends up as verbose as the previous one. My idea is to
> write out match patterns (similar to that in wisi) and give names to
> the interesting ones (so we use names as opposed to position). Then,
> if any matched node happens to be the node at point, use that node’s
> corresponding indent rule to indent. And in the indent rule, we can
> refer to other matched nodes. For example, in the indent rule of
> list_rest, the anchor is list_first.
>
> Maybe there are better ways to implement this, but at its current
> stage I don’t think this is better than tree-sitter-simple-indent.
Ok.
> I think part of the reason why wisi’s indent rule can be succinct is
> that it is written along the grammar definition. It is hard to make
> tree-sitter’s indent rule as succinct while being easy to understand.
Right.
> IIUC, the tree-traversal method you mentioned goes bottom-up,
> matches (in tree-sitter terms) at each level, and accumulates the
> indent delta of each matched indent rule. Is that right?
Yes.
> Does wisi go all the way up to top-level?
Yes; the top-level rule says the indent of every line defaults to 0;
that covers any remaining 'nil' values.
I have not tried to make this part of ada-mode incremental yet (ie, only
visit changed nodes). I'm not sure that's possible.
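To picture the accumulation discussed above, here is a minimal sketch in C. It is not wisi's code: wisi applies its rules during a depth-first walk of the whole tree, whereas this sketch follows one node up to the top level, and the names `syntax_node` and `accumulated_indent` are hypothetical. A delta of 0 stands in for a nil rule, matching the top-level default of 0.

```c
#include <stddef.h>

/* A syntax tree node, reduced to what the accumulation needs.  */
struct syntax_node
{
  const struct syntax_node *parent;  /* NULL for the top-level node */
  int delta;                         /* indent added at this level, 0 for nil */
};

/* Sum the indent deltas from NODE up to the top-level node.  */
int
accumulated_indent (const struct syntax_node *node)
{
  int total = 0;
  for (; node != NULL; node = node->parent)
    total += node->delta;
  return total;
}
```

A statement nested inside two levels that each contribute 3 columns ends up at column 6, while the top-level node itself stays at 0.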
--
-- Stephe
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-08-27 5:45 ` Yuan Fu
@ 2021-09-03 19:16 ` Theodor Thornhill
[not found] ` <AF64EB2C-CCEC-4C98-8FE3-37697BEC9098@gmail.com>
0 siblings, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2021-09-03 19:16 UTC (permalink / raw)
To: Yuan Fu, Stefan Monnier
Cc: Eli Zaretskii, Clément Pit-Claudel, Stephen Leake,
emacs-devel
Yuan Fu <casouri@gmail.com> writes:
Hi!
>
> Thanks, Stefan :-) I applied your patch and fixed the two FIXME’s.
>
If I were to start experimenting with this in csharp-mode, how would I
start? Right now we support the rust version on melpa, but I'd rather
move to this core-supported package. How far are we from including this
in core, and what can I do to help?
All the best,
Theodor Thornhill
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
[not found] ` <AF64EB2C-CCEC-4C98-8FE3-37697BEC9098@gmail.com>
@ 2021-09-04 12:49 ` Tuấn-Anh Nguyễn
2021-09-04 13:04 ` Eli Zaretskii
2021-09-04 15:31 ` Yuan Fu
2021-09-04 15:14 ` Tuấn-Anh Nguyễn
2021-09-05 21:15 ` Theodor Thornhill
2 siblings, 2 replies; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-04 12:49 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
On Sat, Sep 4, 2021 at 1:44 PM Yuan Fu <casouri@gmail.com> wrote:
> 1) tree-sitter lacks a way to change its malloc behavior in run-time, I commented on their road-map issue, but no one has replied yet,
>
Do you mean APIs to change its alloc/free functions at run time? Why would we
need to do that? Doesn't simply defining `ts_malloc` and related functions work?
See https://github.com/tree-sitter/tree-sitter/blob/v0.20.0/lib/src/alloc.h#L27.
> 4) I need to work on a better way to build and distribute language dynamic modules.
>
I think there should be 2 mechanisms:
1. The binaries for common platforms should be built on Emacs's build
infrastructure, and distributed through GNU ELPA.
2. There should be Lisp functions to download the grammar sources and compile
them (by invoking the compiler).
> You can find a script for building dynamic modules at https://github.com/casouri/tree-sitter-module
>
I may be missing something here, but the grammars' compiled forms don't need to
be Emacs dynamic modules, right? They only need to be dynamically-loadable
shared libraries.
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 12:49 ` Tuấn-Anh Nguyễn
@ 2021-09-04 13:04 ` Eli Zaretskii
2021-09-04 14:49 ` Tuấn-Anh Nguyễn
2021-09-04 15:31 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-04 13:04 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: casouri, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Sat, 4 Sep 2021 19:49:35 +0700
> Cc: Theodor Thornhill <theo@thornhill.no>, Stephen Leake <stephen_leake@stephe-leake.org>,
> Eli Zaretskii <eliz@gnu.org>, Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel <emacs-devel@gnu.org>
>
> On Sat, Sep 4, 2021 at 1:44 PM Yuan Fu <casouri@gmail.com> wrote:
> > 1) tree-sitter lacks a way to change its malloc behavior in run-time, I commented on their road-map issue, but no one has replied yet,
> >
> Do you mean APIs to change its alloc/free functions at run time? Why would we
> need to do that?
Because what TS does when it runs out of memory is call 'exit'.
That's unacceptable for Emacs. Emacs can handle out-of-memory
situations well enough, but it can only do that if the problem is
reported to it by the memory-allocation functions.
> Doesn't simply defining `ts_malloc` and related functions work?
No, because we want to be able to link against a TS library, we don't
want to require people who build Emacs to build TS as well.
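The kind of run-time hook being asked for could look roughly like the sketch below. All names here (`lib_set_allocator`, `lib_malloc`, `lib_free`, `host_malloc`) are hypothetical and are not tree-sitter functions. The sketch only shows the idea: the library calls the allocator through a pointer that the host can replace after linking against a prebuilt library, so an allocation failure reaches the host instead of ending the process.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; none of these names come from tree-sitter.  */
typedef void *(*malloc_fn) (size_t);
typedef void (*free_fn) (void *);

/* The library starts with the C allocator and lets the host replace it.  */
static malloc_fn current_malloc = malloc;
static free_fn current_free = free;

/* Called by the host at run time, before it creates any parser.  */
void
lib_set_allocator (malloc_fn new_malloc, free_fn new_free)
{
  current_malloc = new_malloc;
  current_free = new_free;
}

/* Every allocation inside the library goes through the hook.  A failure
   is returned to the caller instead of ending the process.  */
void *
lib_malloc (size_t size)
{
  return current_malloc (size);
}

void
lib_free (void *ptr)
{
  current_free (ptr);
}

/* Example host allocator: it refuses requests above HOST_LIMIT and
   records the failure, standing in for the host's own error handling.  */
#define HOST_LIMIT ((size_t) 1 << 20)
size_t host_failures = 0;

void *
host_malloc (size_t size)
{
  if (size > HOST_LIMIT)
    {
      host_failures++;
      return NULL;
    }
  return malloc (size);
}
```

Because the hook is installed by a function call rather than by redefining a macro at compile time, the same prebuilt library binary works for any host.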
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 13:04 ` Eli Zaretskii
@ 2021-09-04 14:49 ` Tuấn-Anh Nguyễn
2021-09-04 15:00 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-04 14:49 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel, emacs-devel,
Stefan Monnier, Stephen Leake
On Sat, Sep 4, 2021 at 8:04 PM Eli Zaretskii <eliz@gnu.org> wrote:
> No, because we want to be able to link against a TS library, we don't
> want to require people who build Emacs to build TS as well.
Related questions:
1. Who do we expect to build the TS library? For Linux I assume that would be
the maintainer of the (system) package `libtree-sitter`. Is that correct?
2. Who do we expect to build the grammar binaries?
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 14:49 ` Tuấn-Anh Nguyễn
@ 2021-09-04 15:00 ` Eli Zaretskii
2021-09-05 16:34 ` Tuấn-Anh Nguyễn
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-04 15:00 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: casouri, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Sat, 4 Sep 2021 21:49:29 +0700
> Cc: Yuan Fu <casouri@gmail.com>, Theodor Thornhill <theo@thornhill.no>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel <emacs-devel@gnu.org>
>
> On Sat, Sep 4, 2021 at 8:04 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > No, because we want to be able to link against a TS library, we don't
> > want to require people who build Emacs to build TS as well.
>
> Related questions:
> 1. Who do we expect to build the TS library? For Linux I assume that would be
> the maintainer of the (system) package `libtree-sitter`. Is that correct?
The distro, I'd say. It can also be built on the user's machine and
installed separately. Basically, the same as with any other optional
library we use: libpng, HarfBuzz, etc.
> 2. Who do we expect to build the grammar binaries?
The ones that TS already provides? They are already built, no? Or
what do you mean by "build the grammar binaries", what kind of
binaries are those? Forgive me my ignorance.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
[not found] ` <AF64EB2C-CCEC-4C98-8FE3-37697BEC9098@gmail.com>
2021-09-04 12:49 ` Tuấn-Anh Nguyễn
@ 2021-09-04 15:14 ` Tuấn-Anh Nguyễn
2021-09-04 15:33 ` Eli Zaretskii
2021-09-04 15:39 ` Yuan Fu
2021-09-05 21:15 ` Theodor Thornhill
2 siblings, 2 replies; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-04 15:14 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
On Sat, Sep 4, 2021 at 1:44 PM Yuan Fu <casouri@gmail.com> wrote:
> You can find the code at https://github.com/casouri/emacs.git, in “ts” branch. As long as tree-sitter library is in the standard path, Emacs will compile with tree-sitter support.
I'm not on a system with `libtree-sitter` available, so I built and installed it
from source:
export PREFIX=/opt/local
make
sudo make install
It installed to `/opt/local/include` and `/opt/local/lib`, which are already on
my standard include/lib paths. However, I'm getting this error:
configure: error: The following required libraries were not found:
tree-sitter
Do you have more detailed instructions? For example, should the include/lib
paths/flags be set to some special values?
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 12:49 ` Tuấn-Anh Nguyễn
2021-09-04 13:04 ` Eli Zaretskii
@ 2021-09-04 15:31 ` Yuan Fu
2021-09-05 16:45 ` Tuấn-Anh Nguyễn
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-04 15:31 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
>
>> You can find a script for building dynamic modules at https://github.com/casouri/tree-sitter-module
>>
> I may be missing something here, but the grammars' compiled forms don't need to
> be Emacs dynamic modules, right? They only need to be dynamically-loadable
> shared libraries.
I packaged language definitions into dynamic modules: the system is there, why not take advantage of it? Do you think this approach can be improved in some way?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 15:14 ` Tuấn-Anh Nguyễn
@ 2021-09-04 15:33 ` Eli Zaretskii
2021-09-05 16:48 ` Tuấn-Anh Nguyễn
2021-09-04 15:39 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-04 15:33 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: casouri, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Sat, 4 Sep 2021 22:14:06 +0700
> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>, emacs-devel <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii <eliz@gnu.org>,
> Stephen Leake <stephen_leake@stephe-leake.org>
>
> On Sat, Sep 4, 2021 at 1:44 PM Yuan Fu <casouri@gmail.com> wrote:
> > You can find the code at https://github.com/casouri/emacs.git, in “ts” branch. As long as tree-sitter library is in the standard path, Emacs will compile with tree-sitter support.
>
> I'm not on a system with `libtree-sitter` available, so I built and installed it
> from source:
>
> export PREFIX=/opt/local
> make
> sudo make install
>
> It installed to `/opt/local/include` and `/opt/local/lib`, which are already on
> my standard include/lib paths. However, I'm getting this error:
>
> configure: error: The following required libraries were not found:
> tree-sitter
Does config.log tell anything useful?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 15:14 ` Tuấn-Anh Nguyễn
2021-09-04 15:33 ` Eli Zaretskii
@ 2021-09-04 15:39 ` Yuan Fu
1 sibling, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-04 15:39 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> On Sep 4, 2021, at 8:14 AM, Tuấn-Anh Nguyễn <ubolonton@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 1:44 PM Yuan Fu <casouri@gmail.com> wrote:
>> You can find the code at https://github.com/casouri/emacs.git, in “ts” branch. As long as tree-sitter library is in the standard path, Emacs will compile with tree-sitter support.
>
> I'm not on a system with `libtree-sitter` available, so I built and installed it
> from source:
>
> export PREFIX=/opt/local
> make
> sudo make install
>
> It installed to `/opt/local/include` and `/opt/local/lib`, which are already on
> my standard include/lib paths. However, I'm getting this error:
>
> configure: error: The following required libraries were not found:
> tree-sitter
>
> Do you have more detailed instructions? For example, should the include/lib
> paths/flags be set to some special values?
Not really. On my machine, tree-sitter is also installed in /opt/local, and I don’t see any problem building Emacs. Maybe give --libdir=DIR and --includedir=DIR a try and see if that helps.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 15:00 ` Eli Zaretskii
@ 2021-09-05 16:34 ` Tuấn-Anh Nguyễn
2021-09-05 16:45 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-05 16:34 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel, emacs-devel,
Stefan Monnier, Stephen Leake
On Sat, Sep 4, 2021 at 10:00 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > 2. Who do we expect to build the grammar binaries?
>
> The ones that TS already provides? They are already built, no? Or
> what do you mean by "build the grammar binaries", what kind of
> binaries are those? Forgive me my ignorance.
There are 2 components: TS the library (`libtree-sitter`) provides the generic
parts, not the grammars. The grammars come from various repositories, in source
form. (Some of them are owned by the tree-sitter project, some are not.) Each of
those consists of a generated `parser.c` and an optional `scanner.{c,cc}`. Each
provides a function such as `const TSLanguage *tree_sitter_c (void)`, which
returns the details of how to parse a specific language (e.g. the parse table). They are usually
compiled into dynamically-loadable shared libraries (by a `tree-sitter` CLI
program), and distributed separately from `libtree-sitter`.
Tree-sitter has its own ABI versioning for these 2 components. It's easier to
ensure ABI compatibility if they are both built by the same system. That's the
case for GitHub's internal uses of tree-sitter. That's also the case with
`tree-sitter` and `tree-sitter-langs` packages on MELPA. That's not the case
with NeoVim's tree-sitter integration, and is a source of constant headaches
AFAICT.
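A compatibility check between the two components can be sketched as a simple range test. As far as I know, tree-sitter's `api.h` exposes the library's bounds as `TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION` and `TREE_SITTER_LANGUAGE_VERSION`, and the grammar's version through `ts_language_version`. The sketch below does not call them, so it stays self-contained. The `abi_range` type and `grammar_abi_compatible` function are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* The ABI versions a given build of the library can load.  */
struct abi_range
{
  uint32_t min_compatible;  /* oldest grammar ABI still accepted */
  uint32_t current;         /* newest grammar ABI understood */
};

/* Return true if a grammar built for GRAMMAR_VERSION can be loaded
   by a library that supports the versions in LIBRARY.  */
bool
grammar_abi_compatible (struct abi_range library, uint32_t grammar_version)
{
  return grammar_version >= library.min_compatible
         && grammar_version <= library.current;
}
```

When the library and the grammars are built by the same system, this check passes by construction. When they come from different sources, it is the first thing to verify after loading a grammar.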
If we leave `libtree-sitter` to the distro, then it also makes sense for the
distro to provide the `tree-sitter` CLI program, and/or the grammar
binaries/sources.
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-05 16:34 ` Tuấn-Anh Nguyễn
@ 2021-09-05 16:45 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-05 16:45 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: casouri, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Sun, 5 Sep 2021 23:34:59 +0700
> Cc: Yuan Fu <casouri@gmail.com>, Theodor Thornhill <theo@thornhill.no>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel <emacs-devel@gnu.org>
>
> If we leave `libtree-sitter` to the distro, then it also makes sense for the
> distro to provide the `tree-sitter` CLI program, and/or the grammar
> binaries/sources.
Yes, of course. And users who build TS themselves will have to build
those grammar files as well.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 15:31 ` Yuan Fu
@ 2021-09-05 16:45 ` Tuấn-Anh Nguyễn
2021-09-05 20:19 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-05 16:45 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
On Sat, Sep 4, 2021 at 10:31 PM Yuan Fu <casouri@gmail.com> wrote:
> I packaged language definitions into dynamic modules: the system is there, why not take advantage of it? Do you think this approach can be improved in some way?
The language definitions just need to come from dynamically-loadable shared
libraries. They don't have to be Emacs dynamic modules, which bring additional
unnecessary complications, e.g. build difficulty, load path pollution, or
inability to load grammar binaries from other sources like distro's package
repos. It's better to just load the shared libs directly without going through
module machinery. Use the functions in `dynlib.h`.
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-04 15:33 ` Eli Zaretskii
@ 2021-09-05 16:48 ` Tuấn-Anh Nguyễn
0 siblings, 0 replies; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-05 16:48 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel, emacs-devel,
Stefan Monnier, Stephen Leake
On Sat, Sep 4, 2021 at 10:33 PM Eli Zaretskii <eliz@gnu.org> wrote:
> Does config.log tell anything useful?
Yeah, it showed that `pkg-config` could not find `tree-sitter`. It was a problem
with my custom setup. I can build it now. Thanks!
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-05 16:45 ` Tuấn-Anh Nguyễn
@ 2021-09-05 20:19 ` Yuan Fu
2021-09-06 0:03 ` Tuấn-Anh Nguyễn
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-05 20:19 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> On Sep 5, 2021, at 9:45 AM, Tuấn-Anh Nguyễn <ubolonton@gmail.com> wrote:
>
> On Sat, Sep 4, 2021 at 10:31 PM Yuan Fu <casouri@gmail.com> wrote:
>> I packaged language definitions into dynamic modules: the system is there, why not take advantage of it? Do you think this approach can be improved in some way?
>
> The language definitions just need to come from dynamically-loadable shared
> libraries. They don't have to be Emacs dynamic modules, which bring additional
> unnecessary complications, e.g. build difficulty, load path pollution, or
> inability to load grammar binaries from other sources like distro's package
> repos. It's better to just load the shared libs directly without going through
> module machinery. Use the functions in `dynlib.h`.
>
Dynamic modules come with nice things: for example, Emacs looks for them automatically in load-path; Emacs reports an error when it has a problem loading one; I can package some additional information with the module; I could maybe distribute them through the ordinary package.el facility, etc. If I load the shared library directly, I need to reinvent the wheels for loading, error reporting, searching in load-path, and others. On the other hand, dynamic modules don’t come with many complications. Yes, you need an additional emacs-module.h and tree-sitter-<lang>.c to build one, but that’s about it. And I was hoping to distribute pre-built modules anyway, so if all goes well, ordinary users won’t need to compile the modules. WDYT?
P.S. what do you mean by “load path pollution”?
P.P.S. My impression is that other applications distribute language definitions by themselves, and it is not common for distros to package language definitions. Is that correct?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
[not found] ` <AF64EB2C-CCEC-4C98-8FE3-37697BEC9098@gmail.com>
2021-09-04 12:49 ` Tuấn-Anh Nguyễn
2021-09-04 15:14 ` Tuấn-Anh Nguyễn
@ 2021-09-05 21:15 ` Theodor Thornhill
2021-09-05 23:58 ` Yuan Fu
2 siblings, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2021-09-05 21:15 UTC (permalink / raw)
To: Yuan Fu
Cc: Stefan Monnier, Eli Zaretskii, Clément Pit-Claudel,
Stephen Leake, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
> You can find the code at https://github.com/casouri/emacs.git, in “ts” branch. As long as tree-sitter library is in the standard path, Emacs will compile with tree-sitter support. Language definitions are loaded by dynamic modules. You can find a script for building dynamic modules at https://github.com/casouri/tree-sitter-module, you can even just grab the release file. I need to add C# to the list of languages in the build script.
>
> Now you have Emacs and dynamic modules. You can also build the manual: I just wrote the manual entries for the tree-sitter API. It’s a draft but should explain everything the Emacs tree-sitter API provides. To build the manual, go to /doc/lispref and run “make -e HTML_OPTS="--html" elisp.html”; that should compile the manual into the elisp.html directory. The tree-sitter part is in “37 Parsing Program Source”. I attached a zip file containing the manual compiled on my machine; you can just use that.
>
> I haven’t written manual for font-lock and indent support, because they are not settled yet. To see how they work, you can read:
>
> 1. the source of ts-c-mode in /lisp/tree-sitter.el,
> 2. docstring of font-lock-tree-sitter-defaults and font-lock-tree-sitter-settings, and
> 3. docstring of tree-sitter-simple-indent-rules
>
> BTW, tree-sitter-inspect-mode could be helpful.
>
>> Right now we support the rust version on melpa, but I'd rather
>> move to this core-supported package. How far are we from including this
>> in core,
>
> Some blockers that I can think of are 1) tree-sitter lacks a way to change its malloc behavior in run-time, I commented on their road-map issue, but no one has replied yet, 2) font-lock and indent support hasn’t settled, 3) writing, reviewing and editing the manual will take some time, 4) I need to work on a better way to build and distribute language dynamic modules.
>
>> and what can I do to help?
>
> For starters, could you perhaps have a look at the indentation system (tree-sitter-simple-indent-rules and friends), and tell me if anything is lacking? Too complex, not powerful enough, etc. Do you have any suggestions? The same goes for font-lock.
>
> Also, I’m happy to hear your suggestions on the general tree-sitter API and the manual.
>
Thank you for your thorough instructions. I've been able to compile it
on my system, but I'm having trouble with the c-sharp module. I get
this error:
--------------------------------------------
Cloning into 'tree-sitter-c-sharp'...
remote: Enumerating objects: 62, done.
remote: Counting objects: 100% (62/62), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 62 (delta 17), reused 19 (delta 0), pack-reused 0
Receiving objects: 100% (62/62), 831.92 KiB | 2.12 MiB/s, done.
Resolving deltas: 100% (17/17), done.
tree-sitter-c-sharp.c:7:33: error: expected ';' after top level declarator
extern TSLanguage *tree_sitter_c-sharp(void);
^
;
tree-sitter-c-sharp.c:16:40: error: implicit declaration of function 'sharp' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
TSLanguage *language = tree_sitter_c-sharp();
^
2 errors generated.
binding.cc:2:10: fatal error: 'node.h' file not found
#include <node.h>
^~~~~~~~
1 error generated.
----------------------------------------------
I'm guessing this is due to the hyphen in the function name. I remember
we had to do some shenanigans in the rust variant some time ago to
translate this properly. In the C files we need to use underscore
rather than hyphen, yes? If this isn't too hard to do I guess I can try
to make a PR to your project, otherwise you at least have a bugreport
here :) I'm also looking into the code now, and it looks nice so far.
I'll come back to you when I have something more!
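The translation in question can be sketched in a few lines of C. This is an illustration under stated assumptions, not the build script's actual fix: the function name `grammar_symbol_name` is hypothetical, and the only thing taken from the error output above is that the exported symbol is `tree_sitter_` followed by the language name, which must be a valid C identifier.

```c
#include <stdio.h>
#include <string.h>

/* Write "tree_sitter_<lang>" into BUF, replacing every '-' with '_'
   so the result is a valid C identifier.  Return 0 on success, or -1
   if BUF (of SIZE bytes) is too small to hold the whole name.  */
int
grammar_symbol_name (const char *lang, char *buf, size_t size)
{
  int n = snprintf (buf, size, "tree_sitter_%s", lang);
  if (n < 0 || (size_t) n >= size)
    return -1;
  for (char *p = buf; (p = strchr (p, '-')) != NULL; )
    *p = '_';
  return 0;
}
```

For the language name `c-sharp` this yields `tree_sitter_c_sharp`, which avoids both compiler errors shown above. Names without a hyphen, such as `c`, pass through unchanged.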
Theodor
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-05 21:15 ` Theodor Thornhill
@ 2021-09-05 23:58 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-05 23:58 UTC (permalink / raw)
To: Theodor Thornhill
Cc: Stephen Leake, Eli Zaretskii, Clément Pit-Claudel,
Stefan Monnier, emacs-devel
>
> Thank you for your thorough instructions. I've been able to compile it
> on my system, but I'm having trouble with the c-sharp module. I get
> this error:
>
> --------------------------------------------
>
> Cloning into 'tree-sitter-c-sharp'...
> remote: Enumerating objects: 62, done.
> remote: Counting objects: 100% (62/62), done.
> remote: Compressing objects: 100% (57/57), done.
> remote: Total 62 (delta 17), reused 19 (delta 0), pack-reused 0
> Receiving objects: 100% (62/62), 831.92 KiB | 2.12 MiB/s, done.
> Resolving deltas: 100% (17/17), done.
> tree-sitter-c-sharp.c:7:33: error: expected ';' after top level declarator
> extern TSLanguage *tree_sitter_c-sharp(void);
> ^
> ;
> tree-sitter-c-sharp.c:16:40: error: implicit declaration of function 'sharp' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
> TSLanguage *language = tree_sitter_c-sharp();
> ^
> 2 errors generated.
> binding.cc:2:10: fatal error: 'node.h' file not found
> #include <node.h>
> ^~~~~~~~
> 1 error generated.
>
> ----------------------------------------------
>
> I'm guessing this is due to the hyphen in the function name. I remember
> we had to do some shenanigans in the rust variant some time ago to
> translate this properly. In the C files we need to use underscore
> rather than hyphen, yes? If this isn't too hard to do I guess I can try
> to make a PR to your project, otherwise you at least have a bugreport
> here :) I'm also looking into the code now, and it looks nice so far.
> I'll come back to you when I have something more!
Thanks for trying it out and reporting :-) I’ve fixed the build script and it should now build c-sharp.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-05 20:19 ` Yuan Fu
@ 2021-09-06 0:03 ` Tuấn-Anh Nguyễn
2021-09-06 0:23 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-06 0:03 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
On Mon, Sep 6, 2021 at 3:19 AM Yuan Fu <casouri@gmail.com> wrote:
> Dynamic modules come with nice things: for example, Emacs looks for them automatically in load-path; Emacs reports an error when it has a problem loading one; On the other hand, dynamic modules don’t come with many complications. Yes, you need an additional emacs-module.h and tree-sitter-<lang>.c to build one, but that’s about it.
See my other discussion with Eli. We want to rely on the distro to provide the
binaries and the `tree-sitter` CLI program, and to be able to use shared libs
from other sources as well (like self-built). They are not going to be Emacs
dynamic modules.
> I can package some additional information with the module; I could maybe distribute them through ordinary package.el facility, etc etc.
Neither of these requires it to be a module at all. (Also note that package.el
isn't able to handle platform-specific files at the moment.)
> If I load the shared library directly, I need to reinvent the wheels for loading, error reporting, searching in load-path, and others.
The non-module-specific part of loading is provided by `dynlib.h`. There's no
wheel to reinvent here. What error reporting do you mean? (You are going to need
additional checks for ABI compatibility anyway.) Searching a load path (not the
`load-path`) is not that complicated. What are the others?
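A directory search of this kind could be sketched as follows. It is only an illustration: the function name `find_in_dirs` is hypothetical, the list of directories would come from wherever Emacs decides to look, and the sketch uses the POSIX `access` call directly rather than anything from `dynlib.h`.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Look for FILE in each of the N_DIRS directories in DIRS, in order.
   On success, write the full path of the first readable match into
   OUT (of OUT_SIZE bytes) and return 0.  Return -1 if none is found.  */
int
find_in_dirs (const char *const *dirs, size_t n_dirs, const char *file,
              char *out, size_t out_size)
{
  for (size_t i = 0; i < n_dirs; i++)
    {
      int n = snprintf (out, out_size, "%s/%s", dirs[i], file);
      if (n < 0 || (size_t) n >= out_size)
        continue;               /* path does not fit in OUT; try the next */
      if (access (out, R_OK) == 0)
        return 0;
    }
  return -1;
}
```

The path found this way would then be handed to the shared-library loader, and a failed search gives a natural place to report that no grammar is installed for the language.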
> And I was hoping to distribute pre-built modules anyway, so if all went well, ordinary users don’t need to compile the modules. WDYT?
It's good to provide that convenience, but it should not be at the expense of
not being able to use binaries from other sources, or to build the binaries on
their own. The `tree-sitter-langs` package already enables both of these. It
provides both pre-built binaries and functions for users to compile on their
own. And it does so without putting language definitions in dynamic modules.
> P.S. what do you mean by “load path pollution”?
I meant to say load path collision, but since you use `tree-sitter-{lang}` for
the module name, that's less of a problem. Load path pollution is these names
showing up when the user enumerates entries on the load path trying to go to the
source of a Lisp library. That's annoying, but bearable.
> P.P.S. My impression is that other applications distribute language definitions by themselves, and it is not common for distros to package language definitions. Is that correct?
I don't understand this. Can you rephrase it?
All in all, you are severely underestimating the amount of complexity and wheels
you will have to reinvent in other places compared to the amount of code you
don't have to write by requiring language definitions to be in dynamic modules.
(It's less than 100 lines, most of which are docstrings and comments.)
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-06 0:03 ` Tuấn-Anh Nguyễn
@ 2021-09-06 0:23 ` Yuan Fu
2021-09-06 5:33 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-06 0:23 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> On Sep 5, 2021, at 5:03 PM, Tuấn-Anh Nguyễn <ubolonton@gmail.com> wrote:
>
> On Mon, Sep 6, 2021 at 3:19 AM Yuan Fu <casouri@gmail.com> wrote:
>> Dynamic modules come with nice things: for example, Emacs looks for them automatically in load-path, and Emacs reports errors when it has a problem loading one. On the other hand, dynamic modules don’t come with many complications. Yes, you need the additional emacs-module.h and tree-sitter-<lang>.c to build one, but that’s about it.
>
> See my other discussion with Eli. We want to rely on the distro to provide the
> binaries and the `tree-sitter` CLI program, and to be able to use shared libs
> from other sources as well (like self-built). They are not going to be Emacs
> dynamic modules.
>
>> I can package some additional information with the module; I could maybe distribute them through ordinary package.el facility, etc etc.
>
> Neither of these requires it to be a module at all. (Also note that package.el
> isn't able to handle platform-specific files at the moment.)
>
>> If I load the shared library directly, I need to reinvent the wheels for loading, error reporting, searching in load-path, and others.
>
> The non-module-specific part of loading is provided by `dynlib.h`. There's no
> wheel to reinvent here. What error reporting do you mean? (You are going to need
> additional checks for ABI compatibility anyway.) Searching a load path (not the
> `load-path`) is not that complicated. What are the others?
>
>> And I was hoping to distribute pre-built modules anyway, so if all went well, ordinary users don’t need to compile the modules. WDYT?
>
> It's good to provide that convenience, but it should not be at the expense of
> not being able to use binaries from other sources, or to build the binaries on
> their own. The `tree-sitter-langs` package already enables both of these. It
> provides both pre-built binaries and functions for users to compile on their
> own. And it does so without putting language definitions in dynamic modules.
>
>> P.S. what do you mean by “load path pollution”?
>
> I meant to say load path collision, but since you use `tree-sitter-{lang}` for
> the module name, that's less of a problem. Load path pollution is these names
> showing up when the user enumerates entries on the load path trying to go to the
> source of a Lisp library. That's annoying, but bearable.
>
>> P.P.S. My impression is that other applications distribute language definitions by themselves, and it is not common for distros to package language definitions, is that correct?
>
> I don't understand this. Can you rephrase it?
>
> All in all, you are severely underestimating the amount of complexity and wheels
> you will have to reinvent in other places compared to the amount of code you
> don't have to write by requiring language definitions to be in dynamic modules.
> (It's less than 100 lines, most of which are docstrings and comments.)
I see your point. If no one else objects, I’ll change the code to use shared libraries instead of dynamic modules. Thanks for the input :-)
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-06 0:23 ` Yuan Fu
@ 2021-09-06 5:33 ` Eli Zaretskii
2021-09-07 15:38 ` Tuấn-Anh Nguyễn
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-06 5:33 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 5 Sep 2021 17:23:33 -0700
> Cc: Theodor Thornhill <theo@thornhill.no>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Eli Zaretskii <eliz@gnu.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel <emacs-devel@gnu.org>
>
> I see your point. If no one else objects, I’ll change the code to use shared libraries instead of dynamic modules. Thanks for the input :-)
Can we please stop for a moment and describe what exactly is required
for loading a language module? I think it would be good to have that
documented in this discussion for posterity, and so that we make sure
we are all on the same page.
I understand that a language module gets compiled into a shared
library, either as part of building TS or separately. But what should
Emacs do to "load" the module, and when should it do that? And how do
we intend to handle the situation where a module is needed, but is not
available (i.e. its loading fails)?
Emacs has a load-on-demand infrastructure for shared libraries, but it
only exists on MS-Windows, where we support installations of Emacs
binaries without some of the optional libraries, and want to handle
that gracefully. However, this doesn't seem to be a similar
situation; for starters, load-on-demand needs to know at Emacs build
time the names of entry points (functions and variables) we need to
import from each shared library. So I guess we are talking about some
(slightly) different mechanism here?
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-06 5:33 ` Eli Zaretskii
@ 2021-09-07 15:38 ` Tuấn-Anh Nguyễn
2021-09-07 16:16 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-07 15:38 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel, emacs-devel,
Stefan Monnier, Stephen Leake
On Mon, Sep 6, 2021 at 12:33 PM Eli Zaretskii <eliz@gnu.org> wrote:
> I understand that a language module gets compiled into a shared
> library, either as part of building TS or separately. But what should
> Emacs do to "load" the module, and when should it do that? And how do
> we intend to handle the situation where a module is needed, but is not
> available (i.e. its loading fails)?
Emacs should "load" the module when it's asked to do so, by a function, e.g.
`tree-sitter-load-lang`. When loading fails, it should signal an error.
To locate the module, I think there are 2 possible approaches:
1. Emacs consults a new search path variable to look for the module, which is
named `<lang>[.ext]`, and calls `dynlib_open` with the absolute path.
2. Emacs calls `dynlib_open` with the basename `tree-sitter-<lang>[.ext]`,
relying on the module being correctly put on the system's library search path,
e.g. by the distro's package manager.
Option 2 sounds better to me, but option 1 is how people do it at the moment.
(And no distro has packaged these AFAICT.)
> Emacs has a load-on-demand infrastructure for shared libraries, but it
> only exists on MS-Windows, where we support installations of Emacs
> binaries without some of the optional libraries, and want to handle
> that gracefully. However, this doesn't seem to be a similar
> situation; for starters, load-on-demand needs to know at Emacs build
> time the names of entry points (functions and variables) we need to
> import from each shared library. So I guess we are talking about some
> (slightly) different mechanism here?
For each language, the entry point is a single function,
`const TSLanguage *tree_sitter_<lang> (void)`, where `lang` is the name
declared in the grammar's DSL source. It's ensured by the parser generator
(the `tree-sitter` CLI program).
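A minimal sketch of the loading step described above, assuming a POSIX system and using `dlopen`/`dlsym` directly as stand-ins for Emacs's `dynlib_open`/`dynlib_sym`. The helper name `load_language` and the opaque `TSLanguage` declaration are assumptions made for illustration; inside Emacs the failure branches would signal a Lisp error instead of filling a buffer.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Opaque stand-in for tree-sitter's TSLanguage; the real type comes
   from <tree_sitter/api.h>.  */
typedef struct TSLanguage TSLanguage;

/* Type of the entry point that each generated parser exports.  */
typedef const TSLanguage *(*ts_language_fn) (void);

/* Hypothetical helper: open LIB_FILE and call its SYMBOL entry point.
   Return the language pointer, or NULL with a message in ERR.  */
static const TSLanguage *
load_language (const char *lib_file, const char *symbol,
               char *err, size_t err_size)
{
  void *handle = dlopen (lib_file, RTLD_LAZY | RTLD_LOCAL);
  if (!handle)
    {
      snprintf (err, err_size, "cannot open %s", lib_file);
      return NULL;
    }

  /* Converting dlsym's void * to a function pointer is the usual
     POSIX idiom.  */
  ts_language_fn fn = (ts_language_fn) dlsym (handle, symbol);
  if (!fn)
    {
      snprintf (err, err_size, "no symbol %s in %s", symbol, lib_file);
      dlclose (handle);
      return NULL;
    }

  /* The handle is deliberately left open: the returned pointer refers
     to data that lives inside the loaded library.  */
  return fn ();
}
```

With option 1 the caller would pass an absolute path found on the search path; with option 2 it would pass only the base name and let the system's library search do the rest.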
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-07 15:38 ` Tuấn-Anh Nguyễn
@ 2021-09-07 16:16 ` Eli Zaretskii
2021-09-08 3:06 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-07 16:16 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: casouri, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Tuấn-Anh Nguyễn <ubolonton@gmail.com>
> Date: Tue, 7 Sep 2021 22:38:52 +0700
> Cc: Yuan Fu <casouri@gmail.com>, Theodor Thornhill <theo@thornhill.no>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel <emacs-devel@gnu.org>
>
> On Mon, Sep 6, 2021 at 12:33 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > I understand that a language module gets compiled into a shared
> > library, either as part of building TS or separately. But what should
> > Emacs do to "load" the module, and when should it do that? And how do
> > we intend to handle the situation where a module is needed, but is not
> > available (i.e. its loading fails)?
>
> Emacs should "load" the module when it's asked to do so, by a function, e.g.
> `tree-sitter-load-lang`. When loading fails, it should signal an error.
So this has to be an explicit load initiated by a Lisp program? How
would that program know which module to load for a given language? (I
thought TS would load the module it needs whenever support for a
language is requested.)
> To locate the module, I think there are 2 possible approaches:
> 1. Emacs consults a new search path variable to look for the module, which is
> named `<lang>[.ext]`, and calls `dynlib_open` with the absolute path.
> 2. Emacs calls `dynlib_open` with the basename `tree-sitter-<lang>[.ext]`,
> relying on the module being correctly put on the system's library search path,
> e.g. by the distro's package manager.
>
> Option 2 sounds better to me, but option 1 is how people do it at the moment.
> (And no distro has packaged these AFAICT.)
I think 2 is better, since we are relying on others to build and
package these modules.
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-07 16:16 ` Eli Zaretskii
@ 2021-09-08 3:06 ` Yuan Fu
2021-09-10 2:06 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-08 3:06 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel, Stefan Monnier,
Stephen Leake
>>
>> Emacs should "load" the module when it's asked to do so, by a function, e.g.
>> `tree-sitter-load-lang`. When loading fails, it should signal an error.
>
> So this has to be an explicit load initiated by a Lisp program? How
> would that program know which module to load for a given language? (I
> thought TS would load the module it needs whenever support for a
> language is requested.)
TS doesn’t load the module; it expects the user to pass it a pointer to the language definition. How the user gets the language definition is not its business. The user is supposed to combine TS and a language definition to create a workable parser. See:
bool ts_parser_set_language(TSParser *self, const TSLanguage *language);
TS only wants a pointer to a TSLanguage.
All the language modules have regular names, i.e., tree-sitter-<lang>.so, so I think we can just calculate the name from the language name; or we can add a backup: use an alist to map language names to module names to cover possible irregular names.
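A rough sketch of the "regular name plus backup list" idea, written in C for illustration. The table contents and the names `lang_override` and `module_base_name` are hypothetical; in Emacs the override list would more likely be a Lisp alist.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical override entry for a language whose library does not
   follow the tree-sitter-<lang> convention.  */
struct lang_override
{
  const char *lang;
  const char *module;
};

static const struct lang_override overrides[] = {
  { "example-irregular", "libexample_grammar" }, /* made-up entry */
};

/* Write the library base name for LANG into BUF.
   Return 0 on success, -1 if BUF is too small.  */
static int
module_base_name (const char *lang, char *buf, size_t size)
{
  int n;

  /* Irregular names first.  */
  for (size_t i = 0; i < sizeof overrides / sizeof overrides[0]; i++)
    if (strcmp (overrides[i].lang, lang) == 0)
      {
        n = snprintf (buf, size, "%s", overrides[i].module);
        return (n >= 0 && (size_t) n < size) ? 0 : -1;
      }

  /* Otherwise compute the conventional name.  */
  n = snprintf (buf, size, "tree-sitter-%s", lang);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

The platform suffix is left out on purpose, so the same base name can be combined with whatever extension the system uses.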
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-08 3:06 ` Yuan Fu
@ 2021-09-10 2:06 ` Yuan Fu
2021-09-10 6:32 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-10 2:06 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel, Stefan Monnier,
Stephen Leake
> On Sep 7, 2021, at 8:06 PM, Yuan Fu <casouri@gmail.com> wrote:
>
>>>
>>> Emacs should "load" the module when it's asked to do so, by a function, e.g.
>>> `tree-sitter-load-lang`. When loading fails, it should signal an error.
>>
>> So this has to be an explicit load initiated by a Lisp program? How
>> would that program know which module to load for a given language? (I
>> thought TS would load the module it needs whenever support for a
>> language is requested.)
>
> TS doesn’t load the module; it expects the user to pass it a pointer to the language definition. How the user gets the language definition is not its business. The user is supposed to combine TS and a language definition to create a workable parser. See:
>
> bool ts_parser_set_language(TSParser *self, const TSLanguage *language);
>
> TS only wants a pointer to a TSLanguage.
>
> All the language modules have regular names, i.e., tree-sitter-<lang>.so, so I think we can just calculate the name from the language name; or we can add a backup: use an alist to map language names to module names to cover possible irregular names.
If you think it’s fine, Eli, I’ll start working on this.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-10 2:06 ` Yuan Fu
@ 2021-09-10 6:32 ` Eli Zaretskii
2021-09-10 19:57 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-10 6:32 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 9 Sep 2021 19:06:28 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> > TS doesn’t load the module; it expects the user to pass it a pointer to the language definition. How the user gets the language definition is not its business. The user is supposed to combine TS and a language definition to create a workable parser. See:
> >
> > bool ts_parser_set_language(TSParser *self, const TSLanguage *language);
> >
> > TS only wants a pointer to a TSLanguage.
> >
> > All the language modules have regular names, i.e., tree-sitter-<lang>.so, so I think we can just calculate the name from the language name; or we can add a backup: use an alist to map language names to module names to cover possible irregular names.
>
> If you think it’s fine, Eli, I’ll start working on this.
Sure. I guess we will have to have a database of module names for
each programming language somewhere?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-10 6:32 ` Eli Zaretskii
@ 2021-09-10 19:57 ` Yuan Fu
2021-09-11 3:41 ` Tuấn-Anh Nguyễn
2021-09-11 5:51 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-10 19:57 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel, Stefan Monnier,
Stephen Leake
>
> Sure. I guess we will have to have a database of module names for
> each programming language somewhere?
My plan is to translate lisp names to C names by default, and have an override list for irregular names that can’t be translated correctly.
Just realized another problem, how do we make sure the loaded library is GPL-compatible? There certainly won’t be “plugin_is_GPL_compatible” symbol in them… And IIUC Emacs cannot load GPL-incompatible dynamic libraries?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-10 19:57 ` Yuan Fu
@ 2021-09-11 3:41 ` Tuấn-Anh Nguyễn
2021-09-11 4:11 ` Yuan Fu
2021-09-11 5:51 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-11 3:41 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> Just realized another problem, how do we make sure the loaded library is GPL-compatible?
This question is rather non-technical, so I can't provide any comments.
> There certainly won’t be “plugin_is_GPL_compatible” symbol in them… And IIUC Emacs cannot load GPL-incompatible dynamic libraries?
That's one of the reasons for using `dynlib.h` APIs directly. The check for
that symbol is at the level of `emacs-module.c`. Let's not conceptually
conflate a "shared library" and an "Emacs dynamic module".
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 3:41 ` Tuấn-Anh Nguyễn
@ 2021-09-11 4:11 ` Yuan Fu
2021-09-11 7:23 ` Tuấn-Anh Nguyễn
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-11 4:11 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> On Sep 10, 2021, at 8:41 PM, Tuấn-Anh Nguyễn <ubolonton@gmail.com> wrote:
>
>> Just realized another problem, how do we make sure the loaded library is GPL-compatible?
>
> This question is rather non-technical, so I can't provide any comments.
>
>> There certainly won’t be “plugin_is_GPL_compatible” symbol in them… And IIUC Emacs cannot load GPL-incompatible dynamic libraries?
>
> That's one of the reasons for using `dynlib.h` APIs directly. The check for
> that symbol is at the level of `emacs-module.c`. Let's not conceptually
> conflate a "shared library" and an "Emacs dynamic module".
I think you have it backwards. IIUC the reason why every Emacs dynamic module declares “plugin_is_GPL_compatible” is that every shared library that links with Emacs must be GPL compatible, and an Emacs dynamic module is a shared library. But that’s just my understanding, of course. I’m happy to be corrected.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-10 19:57 ` Yuan Fu
2021-09-11 3:41 ` Tuấn-Anh Nguyễn
@ 2021-09-11 5:51 ` Eli Zaretskii
2021-09-11 19:00 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-11 5:51 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 10 Sep 2021 12:57:22 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> emacs-devel@gnu.org
>
> > Sure. I guess we will have to have a database of module names for
> > each programming language somewhere?
>
> My plan is to translate lisp names to C names by default, and have an override list for irregular names that can’t be translated correctly.
What are "Lisp names" in this context? Are you saying that the name
of a programming language, derived from the major mode, can be used to
produce the name of the shared library programmatically? If so, how?
> Just realized another problem, how do we make sure the loaded library is GPL-compatible? There certainly won’t be “plugin_is_GPL_compatible” symbol in them… And IIUC Emacs cannot load GPL-incompatible dynamic libraries?
That's only needed for Emacs modules, not for external libraries that
provide some extra functionality on the level of primitives. For
those, we just make sure their license is compatible with GPL.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 4:11 ` Yuan Fu
@ 2021-09-11 7:23 ` Tuấn-Anh Nguyễn
2021-09-11 19:02 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-11 7:23 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> I think you have it backwards. IIUC the reason why every Emacs dynamic module declares “plugin_is_GPL_compatible” is that every shared library that links with Emacs must be GPL compatible, and an Emacs dynamic module is a shared library. But that’s just my understanding, of course. I’m happy to be corrected.
That understanding is wrong. To help you understand better: every Emacs dynamic
module is a shared library, but the opposite is not true. If you are still
confused, read the relevant parts in `emacs-module.c`. On another note, shared
libraries in general don't "link" with Emacs. "Linking" has very specific and
precise technical meanings in this context. Please read up on that, starting
from "dynamic linking vs. dynamic loading."
--
Tuấn-Anh Nguyễn
Software Engineer
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 5:51 ` Eli Zaretskii
@ 2021-09-11 19:00 ` Yuan Fu
2021-09-11 19:14 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-11 19:00 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, theo, cpitclaudel, emacs-devel,
monnier, stephen_leake
> On Sep 10, 2021, at 10:51 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 10 Sep 2021 12:57:22 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Stephen Leake <stephen_leake@stephe-leake.org>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> emacs-devel@gnu.org
>>
>>> Sure. I guess we will have to have a database of module names for
>>> each programming language somewhere?
>>
>> My plan is to translate lisp names to C names by default, and have an override list for irregular names that can’t be translated correctly.
>
> What are "Lisp names" in this context? Are you saying that the name
> of a programming language, derived from the major mode, can be used to
> produce the name of the shared library programmatically? If so, how?
I don’t think it’s a rule, but language definitions are conventionally named tree-sitter-<lang>, e.g., tree-sitter-c, tree-sitter-json, tree-sitter-c-sharp. The symbols they expose are named tree_sitter_<lang>, e.g., tree_sitter_c, tree_sitter_json, tree_sitter_c_sharp. Currently we use a symbol tree-sitter-<lang> to represent a language, so we can translate the symbol tree-sitter-<lang> to tree-sitter-<lang>.so/dylib/dll to get the shared library name, and to tree_sitter_<lang> to get the C symbol name.
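The second translation amounts to replacing hyphens with underscores. A small C sketch, with `c_symbol_name` as a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy NAME (e.g. "tree-sitter-c-sharp") into BUF
   with every '-' replaced by '_'.
   Return 0 on success, -1 if BUF is too small.  */
static int
c_symbol_name (const char *name, char *buf, size_t size)
{
  size_t len = strlen (name);

  if (len + 1 > size)
    return -1;

  for (size_t i = 0; i < len; i++)
    buf[i] = name[i] == '-' ? '_' : name[i];
  buf[len] = '\0';
  return 0;
}
```

This only covers names that follow the convention; anything irregular would still need the override list.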
BTW, since dynamic libraries have different extensions on different systems, what I want to do is to try loading the library with .so, then .dylib, then .dll. Is that a good idea?
>
>> Just realized another problem, how do we make sure the loaded library is GPL-compatible? There certainly won’t be “plugin_is_GPL_compatible” symbol in them… And IIUC Emacs cannot load GPL-incompatible dynamic libraries?
>
> That's only needed for Emacs modules, not for external libraries that
> provide some extra functionality on the level of primitives. For
> those, we just make sure their license is compatible with GPL.
Thanks, that’s all I need to know.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 7:23 ` Tuấn-Anh Nguyễn
@ 2021-09-11 19:02 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-11 19:02 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, emacs-devel,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> On Sep 11, 2021, at 12:23 AM, Tuấn-Anh Nguyễn <ubolonton@gmail.com> wrote:
>
>> I think you have it backwards. IIUC the reason why every Emacs dynamic module declares “plugin_is_GPL_compatible” is that every shared library that links with Emacs must be GPL compatible, and an Emacs dynamic module is a shared library. But that’s just my understanding, of course. I’m happy to be corrected.
>
> That understanding is wrong. To help you understand better: every Emacs dynamic
> module is a shared library, but the opposite is not true. If you are still
> confused, read the relevant parts in `emacs-module.c`. On another note, shared
> libraries in general don't "link" with Emacs. "Linking" has very specific and
> precise technical meanings in this context. Please read up on that, starting
> from "dynamic linking vs. dynamic loading."
I see, thanks for the explanation. Anyway, I’m glad there isn’t an issue.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 19:00 ` Yuan Fu
@ 2021-09-11 19:14 ` Eli Zaretskii
2021-09-11 19:17 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-11 19:14 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 11 Sep 2021 12:00:59 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> theo@thornhill.no,
> stephen_leake@stephe-leake.org,
> cpitclaudel@gmail.com,
> monnier@iro.umontreal.ca,
> emacs-devel@gnu.org
>
> >> My plan is to translate lisp names to C names by default, and have an override list for irregular names that can’t be translated correctly.
> >
> > What are "Lisp names" in this context? Are you saying that the name
> > of a programming language, derived from the major mode, can be used to
> > produce the name of the shared library programmatically? If so, how?
>
> I don’t think it’s a rule, but language definitions are conventionally named tree-sitter-<lang>, e.g., tree-sitter-c, tree-sitter-json, tree-sitter-c-sharp. The symbols they expose are named tree_sitter_<lang>, e.g., tree_sitter_c, tree_sitter_json, tree_sitter_c_sharp. Currently we use a symbol tree-sitter-<lang> to represent a language, so we can translate the symbol tree-sitter-<lang> to tree-sitter-<lang>.so/dylib/dll to get the shared library name, and to tree_sitter_<lang> to get the C symbol name.
But the <lang> part is still needed to be concocted somehow. E.g.,
the conversion from "C#" to "c-sharp" isn't trivial.
> BTW, since dynamic libraries have different extensions on different systems, what I want to do is to try loading the library with .so, then .dylib, then .dll. Is that a good idea?
We can do better, see load-suffixes.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 19:14 ` Eli Zaretskii
@ 2021-09-11 19:17 ` Eli Zaretskii
2021-09-11 20:29 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-11 19:17 UTC (permalink / raw)
To: casouri; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> Date: Sat, 11 Sep 2021 22:14:26 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: ubolonton@gmail.com, theo@thornhill.no, cpitclaudel@gmail.com,
> emacs-devel@gnu.org, monnier@iro.umontreal.ca, stephen_leake@stephe-leake.org
>
> > BTW, since dynamic libraries have different extensions on different systems, what I want to do is to try loading the library with .so, then .dylib, then .dll. Is that a good idea?
>
> We can do better, see load-suffixes.
And in C, you can use MODULES_SUFFIX directly. Though we will
probably need some minor changes there, to have the suffix defined
even in a build --without-modules.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 19:17 ` Eli Zaretskii
@ 2021-09-11 20:29 ` Yuan Fu
2021-09-12 5:39 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-11 20:29 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
> On Sep 11, 2021, at 12:14 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
> But the <lang> part is still needed to be concocted somehow. E.g.,
> the conversion from "C#" to "c-sharp" isn't trivial.
>
The project name of tree-sitter’s C# definition is “tree-sitter-c-sharp”[1]. So if someone wants to use the C# language, they probably know what symbol represents it (we will explain the translation rule in doc-string and the manual). I also want to point out that we don’t come up with the symbols representing each language, the _user_ passes 'tree-sitter-parser-create' a symbol representing a language, and we translate that symbol to dynamic library name and C symbol name.
[1]: https://github.com/tree-sitter/tree-sitter-c-sharp
> On Sep 11, 2021, at 12:17 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> Date: Sat, 11 Sep 2021 22:14:26 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: ubolonton@gmail.com, theo@thornhill.no, cpitclaudel@gmail.com,
>> emacs-devel@gnu.org, monnier@iro.umontreal.ca, stephen_leake@stephe-leake.org
>>
>>> BTW, since dynamic libraries have different extensions on different systems, what I want to do is to try loading the library with .so, then .dylib, then .dll. Is that a good idea?
>>
>> We can do better, see load-suffixes.
>
> And in C, you can use MODULES_SUFFIX directly. Though we will
> probably need some minor changes there, to have the suffix defined
> even in a build --without-modules.
I’m using tree-sitter-load-suffixes with default value '(".so" ".dylib" ".dll"). Should I populate this variable with MODULES_SUFFIX and MODULES_SECONDARY_SUFFIX, or should I just use the two suffixes in C? I.e., do you see a need for users to customize suffixes?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-11 20:29 ` Yuan Fu
@ 2021-09-12 5:39 ` Eli Zaretskii
2021-09-13 4:15 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-12 5:39 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 11 Sep 2021 13:29:09 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > But the <lang> part is still needed to be concocted somehow. E.g.,
> > the conversion from "C#" to "c-sharp" isn't trivial.
>
> The project name of tree-sitter’s C# definition is “tree-sitter-c-sharp”[1]. So if someone wants to use the C# language, they probably know what symbol represents it (we will explain the translation rule in doc-string and the manual). I also want to point out that we don’t come up with the symbols representing each language, the _user_ passes 'tree-sitter-parser-create' a symbol representing a language, and we translate that symbol to dynamic library name and C symbol name.
Surely, you don't mean "user" as in "the person who edits a source
file"? I presume you mean the Lisp program, not the human user. That
Lisp program is the major mode which wants to use TS services, and the
only thing that it has in hand is its own symbol, like 'c-mode' or
'python-mode' or 'f90-mode'. It needs a way to pass the corresponding
TS module name to TS, and my question is: how would the major mode
compute the correct module name? We need either a mode-specific
variable with that name, or some global function that could be used by
any major mode to obtain the language module name.
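To make the "global function" alternative concrete, here is a small C sketch of a lookup from major-mode name to tree-sitter language name. The table entries and the names `mode_language` and `language_for_mode` are illustrative assumptions, not an existing Emacs facility.

```c
#include <stddef.h>
#include <string.h>

/* One row per major mode that has a tree-sitter language.  */
struct mode_language
{
  const char *mode;
  const char *lang;
};

/* Illustrative entries only.  Note that "c-sharp" cannot be derived
   mechanically from the mode name, which is why a table is needed.  */
static const struct mode_language mode_languages[] = {
  { "c-mode", "c" },
  { "python-mode", "python" },
  { "csharp-mode", "c-sharp" },
};

/* Return the language name registered for MODE, or NULL if the mode
   has no entry, so the caller can fall back to non-TS behavior.  */
static const char *
language_for_mode (const char *mode)
{
  for (size_t i = 0; i < sizeof mode_languages / sizeof mode_languages[0]; i++)
    if (strcmp (mode_languages[i].mode, mode) == 0)
      return mode_languages[i].lang;
  return NULL;
}
```

A mode-specific variable would carry the same information, just stored next to each mode instead of in one central table.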
> >>> BTW, since dynamic libraries have different extensions on different systems, what I want to do is to try loading the library with .so, then .dylib, then .dll. Is that a good idea?
> >>
> >> We can do better, see load-suffixes.
> >
> > And in C, you can use MODULES_SUFFIX directly. Though we will
> > probably need some minor changes there, to have the suffix defined
> > even in a build --without-modules.
>
> I’m using tree-sitter-load-suffixes with default value '(".so" ".dylib" ".dll"). Should I populate this variable with MODULES_SUFFIX and MODULES_SECONDARY_SUFFIX, or should I just use the two suffixes in C? I.e., do you see a need for users to customize suffixes?
I'd prefer a general variable shared-library-suffix(es), either a
single value specific to the target system or an alist with keys being
system names (from system-type). Then we could use that in
load-suffixes (instead of MODULES_SUFFIX) and everywhere else.
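As an illustration of the single-value-per-target-system variant, the suffix can be chosen at compile time. This is only a sketch: the macro `SHARED_LIB_SUFFIX` and the helper `library_file_name` are hypothetical names, and the values Emacs actually uses for MODULES_SUFFIX on each platform may differ.

```c
#include <stdio.h>

/* Pick the conventional shared-library suffix for the target system.
   Assumed values, chosen for illustration.  */
#if defined _WIN32
# define SHARED_LIB_SUFFIX ".dll"
#elif defined __APPLE__
# define SHARED_LIB_SUFFIX ".dylib"
#else
# define SHARED_LIB_SUFFIX ".so"
#endif

/* Append the platform suffix to BASE and store the result in BUF.
   Return 0 on success, -1 if BUF is too small.  */
static int
library_file_name (const char *base, char *buf, size_t size)
{
  int n = snprintf (buf, size, "%s%s", base, SHARED_LIB_SUFFIX);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Compared with trying .so, .dylib and .dll in turn, this asks the system for exactly one file name and avoids probing for extensions that cannot exist on the current platform.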
* Re: Tree-sitter api
2021-09-12 5:39 ` Eli Zaretskii
@ 2021-09-13 4:15 ` Yuan Fu
2021-09-13 11:47 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-13 4:15 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
> On Sep 11, 2021, at 10:39 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 11 Sep 2021 13:29:09 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> stephen_leake@stephe-leake.org
>>
>>> But the <lang> part is still needed to be concocted somehow. E.g.,
>>> the conversion from "C#" to "c-sharp" isn't trivial.
>>
>> The project name of tree-sitter’s C# definition is “tree-sitter-c-sharp”[1]. So if someone wants to use the C# language, they probably know what symbol represents it (we will explain the translation rule in doc-string and the manual). I also want to point out that we don’t come up with the symbols representing each language, the _user_ passes 'tree-sitter-parser-create' a symbol representing a language, and we translate that symbol to dynamic library name and C symbol name.
>
> Surely, you don't mean "user" as in "the person who edits a source
> file"? I presume you mean the Lisp program, not the human user. That
> Lisp program is the major mode which wants to use TS services, and the
> only thing that it has in hand is its own symbol, like 'c-mode' or
> 'python-mode' or 'f90-mode'. It needs a way to pass the corresponding
> TS module name to TS, and my question is: how would the major mode
> compute the correct module name? We need either a mode-specific
> variable with that name, or some global function that could be used by
> any major mode to obtain the language module name.
Not the end-user, no. But not really “Lisp Program”, either. I mean the human being writing the major mode and adapting it to use tree-sitter features. The major mode writer should be able to figure out the correct symbol to use by checking the project name of the language definition, or the package name of the language definition in her package manager, or by some other means. For example, one should be able to figure out that tree-sitter-c is the symbol for the C language definition, and tree-sitter-c-sharp the one for C#. Then Emacs automatically translates tree-sitter-c to libtree-sitter-c.so, and tree-sitter-c-sharp to libtree-sitter-c-sharp.so; basically adding “lib” and “.so” (or “.dylib” etc). If that doesn’t give the correct library name for a quirky language, the major-mode writer can add an entry to tree-sitter-library-name-override-list, such as (tree-sitter-quirky-lang “libtree-sitter-qlang” “tree_sitter_qlang”), and Emacs will use that. (Or she can just use tree-sitter-qlang as the symbol, and Emacs’ automatic translation will work just fine.)
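The translation rule described above could be sketched in C roughly as follows. This is only an illustration, not code from the Emacs tree: the helper names, the override struct, and the dash-to-underscore rule for the C symbol are assumptions inferred from the examples in this message.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Emacs source: build the shared-library file
   name for a language symbol.  "tree-sitter-c-sharp" with suffix ".so"
   gives "libtree-sitter-c-sharp.so".  Returns 0 on success, -1 if OUT
   is too small.  */
static int
ts_library_name (const char *lang, const char *suffix,
                 char *out, size_t out_size)
{
  int n = snprintf (out, out_size, "lib%s%s", lang, suffix);
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}

/* Hypothetical helper: derive the C symbol name by replacing every '-'
   with '_' (assumed rule).  "tree-sitter-c-sharp" gives
   "tree_sitter_c_sharp".  Returns 0 on success, -1 if OUT is too small.  */
static int
ts_c_symbol_name (const char *lang, char *out, size_t out_size)
{
  size_t len = strlen (lang);
  if (len + 1 > out_size)
    return -1;
  for (size_t i = 0; i < len; i++)
    out[i] = lang[i] == '-' ? '_' : lang[i];
  out[len] = '\0';
  return 0;
}

/* One entry of a hypothetical override list for irregular names.  */
struct ts_override
{
  const char *lang;      /* language symbol name */
  const char *library;   /* library name without suffix */
  const char *c_symbol;  /* exported C symbol */
};

static const struct ts_override ts_overrides[] = {
  { "tree-sitter-quirky-lang", "libtree-sitter-qlang", "tree_sitter_qlang" },
};

/* Return the override entry for LANG, or NULL if the automatic
   translation should be used.  */
static const struct ts_override *
ts_find_override (const char *lang)
{
  for (size_t i = 0; i < sizeof ts_overrides / sizeof ts_overrides[0]; i++)
    if (strcmp (ts_overrides[i].lang, lang) == 0)
      return &ts_overrides[i];
  return NULL;
}
```

A caller would check ts_find_override first and fall back to the two automatic helpers when it returns NULL.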
>
>>>>> BTW, since dynamic libraries have different extensions on different systems, what I want to do is to try loading the library with .so, then try .dylib, then try .dll, is that a good idea?
>>>>
>>>> We can do better, see load-suffixes.
>>>
>>> And in C, you can use MODULES_SUFFIX directly. Though we will
>>> probably need some minor changes there, to have the suffix defined
>>> even in a build --without-modules.
>>
>> I’m using tree-sitter-load-suffixes with default value ‘(“.so”, “.dylib”, “.dll”). Should I populate this variable with MODULES_SUFFIX and MODULES_SECONDARY_SUFFIX, or should I just use the two SUFFIX in C? I.e., do you see a need for users to customize suffixes?
>
> I'd prefer a general variable shared-library-suffix(es), either a
> single value specific to the target system or an alist with keys being
> system names (from system-type). Then we could use that in
> load-suffixes (instead of MODULES_SUFFIX) and everywhere else.
To summarize, we have
"load-suffixes" (".elc" ".el", plus MODULES_SUFFIX and MODULES_SECONDARY_SUFFIX if modules are enabled),
"module-file-suffix" (MODULES_SUFFIX if modules are enabled),
"load-file-rep-suffixes" ("" ".gz").
All contribute to the possible file names Emacs tries when loading a file (be it an Elisp file or an Emacs module). I will add a "shared-library-suffix" specifically for loading dynamic libraries; its value will be MODULES_SUFFIX regardless of whether modules are enabled.
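As a rough sketch of what a suffix that is always defined might look like on the C side, one could select it with preprocessor checks. The macro name SHARED_LIBRARY_SUFFIX and the helper below are hypothetical; the real definition would come from the build system, as MODULES_SUFFIX does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: the platform's shared-library suffix, defined
   whether or not Emacs is built with module support.  */
#if defined _WIN32 || defined __CYGWIN__
# define SHARED_LIBRARY_SUFFIX ".dll"
#elif defined __APPLE__
# define SHARED_LIBRARY_SUFFIX ".dylib"
#else
# define SHARED_LIBRARY_SUFFIX ".so"
#endif

/* Return 1 if FILE ends with the platform's shared-library suffix,
   0 otherwise.  */
static int
has_shared_library_suffix (const char *file)
{
  size_t file_len = strlen (file);
  size_t suffix_len = strlen (SHARED_LIBRARY_SUFFIX);
  if (file_len < suffix_len)
    return 0;
  return strcmp (file + file_len - suffix_len, SHARED_LIBRARY_SUFFIX) == 0;
}
```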
Yuan
* Re: Tree-sitter api
2021-09-13 4:15 ` Yuan Fu
@ 2021-09-13 11:47 ` Eli Zaretskii
2021-09-13 18:01 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-13 11:47 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 12 Sep 2021 21:15:31 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> Not the end-user, no. But not really “Lisp Program”, either. I mean the human being writing the major mode and adapting it to use tree-sitter features. The major mode writer should be able to figure out the correct symbol to use by checking the project name of the language definition, or the package name of the language definition in her package manager, or by some other means. For example, one should be able to figure out that tree-sitter-c is the symbol for the C language definition, and tree-sitter-c-sharp the one for C#. Then Emacs automatically translates tree-sitter-c to libtree-sitter-c.so, and tree-sitter-c-sharp to libtree-sitter-c-sharp.so; basically adding “lib” and “.so” (or “.dylib” etc). If that doesn’t give the correct library name for a quirky language, the major-mode writer can add an entry to tree-sitter-library-name-override-list, such as (tree-sitter-quirky-lang “libtree-sitter-qlang” “tree_sitter_qlang”), and Emacs will use that. (Or she can just use tree-sitter-qlang as the symbol, and Emacs’ automatic translation will work just fine.)
It makes little sense to me to request each major mode to figure this
out. It should IMO be a service provided by the TS integration into
Emacs.
> To summarize, we have
>
> "load-suffixes" (".elc" ".el", plus MODULES_SUFFIX and MODULES_SECONDARY_SUFFIX if modules are enabled),
> "module-file-suffix" (MODULES_SUFFIX if modules are enabled),
> "load-file-rep-suffixes" ("" ".gz").
>
> All contribute to the possible file names Emacs tries when loading a file (be it an Elisp file or an Emacs module). I will add a "shared-library-suffix" specifically for loading dynamic libraries; its value will be MODULES_SUFFIX regardless of whether modules are enabled.
Maybe the other way around: define a shared-library-suffix, and make
MODULES_SUFFIX use that if Emacs is built with modules.
Otherwise, SGTM, thanks.
* Re: Tree-sitter api
2021-09-13 11:47 ` Eli Zaretskii
@ 2021-09-13 18:01 ` Yuan Fu
2021-09-13 18:07 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-13 18:01 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
> On Sep 13, 2021, at 4:47 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 12 Sep 2021 21:15:31 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> stephen_leake@stephe-leake.org
>>
>> Not the end-user, no. But not really “Lisp Program”, either. I mean the human being writing the major mode and adapting it to use tree-sitter features. The major mode writer should be able to figure out the correct symbol to use by checking the project name of the language definition, or the package name of the language definition in her package manager, or by some other means. For example, one should be able to figure out that tree-sitter-c is the symbol for the C language definition, and tree-sitter-c-sharp the one for C#. Then Emacs automatically translates tree-sitter-c to libtree-sitter-c.so, and tree-sitter-c-sharp to libtree-sitter-c-sharp.so; basically adding “lib” and “.so” (or “.dylib” etc). If that doesn’t give the correct library name for a quirky language, the major-mode writer can add an entry to tree-sitter-library-name-override-list, such as (tree-sitter-quirky-lang “libtree-sitter-qlang” “tree_sitter_qlang”), and Emacs will use that. (Or she can just use tree-sitter-qlang as the symbol, and Emacs’ automatic translation will work just fine.)
>
> It makes little sense to me to request each major mode to figure this
> out. It should IMO be a service provided by the TS integration into
> Emacs.
This is IMO the easiest and least confusing way for major-mode authors. But before we continue, what is the way you envisioned? I’m not sure what exactly is the service you want Emacs to provide.
Yuan
* Re: Tree-sitter api
2021-09-13 18:01 ` Yuan Fu
@ 2021-09-13 18:07 ` Eli Zaretskii
2021-09-13 18:29 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-13 18:07 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 13 Sep 2021 11:01:47 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > It makes little sense to me to request each major mode to figure this
> > out. It should IMO be a service provided by the TS integration into
> > Emacs.
>
> This is IMO the easiest and least confusing way for major-mode authors. But before we continue, what is the way you envisioned? I’m not sure what exactly is the service you want Emacs to provide.
What I had in mind is a function that, given the major-mode symbol, will
return the name of the corresponding TS language module (or a list of
modules, if there's more than one).
* Re: Tree-sitter api
2021-09-13 18:07 ` Eli Zaretskii
@ 2021-09-13 18:29 ` Yuan Fu
2021-09-13 18:37 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-13 18:29 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
> On Sep 13, 2021, at 11:07 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 13 Sep 2021 11:01:47 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> stephen_leake@stephe-leake.org
>>
>>> It makes little sense to me to request each major mode to figure this
>>> out. It should IMO be a service provided by the TS integration into
>>> Emacs.
>>
>> This is IMO the easiest and least confusing way for major-mode authors. But before we continue, what is the way you envisioned? I’m not sure what exactly is the service you want Emacs to provide.
>
> What I had in mind is a function that, given the major-mode symbol, will
> return the name of the corresponding TS language module (or a list of
> modules, if there's more than one).
My problem with such a function is that Emacs can’t possibly cover all the major modes and tree-sitter languages. What if there is a new language, and someone writes a tree-sitter language definition for it, and then wants to write an Emacs major mode using tree-sitter features?
Yuan
* Re: Tree-sitter api
2021-09-13 18:29 ` Yuan Fu
@ 2021-09-13 18:37 ` Eli Zaretskii
2021-09-14 0:13 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-13 18:37 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 13 Sep 2021 11:29:01 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > What I had in mind is a function that, given the major-mode symbol, will
> > return the name of the corresponding TS language module (or a list of
> > modules, if there's more than one).
>
> My problem with such a function is that Emacs can’t possibly cover all the major modes and tree-sitter languages. What if there is a new language, and someone writes a tree-sitter language definition for it, and then wants to write an Emacs major mode using tree-sitter features?
A new major mode will extend the function to support its language(s).
The extension could be as simple as adding something to a database of
known mode-to-language associations in some alist.
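A minimal sketch of such a mode-to-language database, written as a static C table for illustration. The table contents and the function name language_for_major_mode are hypothetical; a real implementation would more likely be an alist on the Lisp side that new major modes extend.

```c
#include <stddef.h>
#include <string.h>

/* One mode-to-language association (illustrative only).  */
struct mode_language
{
  const char *major_mode;  /* major-mode symbol name */
  const char *language;    /* tree-sitter language symbol name */
};

/* Hypothetical database; the entries are examples, not a list of
   what Emacs supports.  */
static const struct mode_language mode_language_table[] = {
  { "c-mode", "tree-sitter-c" },
  { "python-mode", "tree-sitter-python" },
  { "csharp-mode", "tree-sitter-c-sharp" },
};

/* Return the language name associated with MAJOR_MODE, or NULL if
   the mode is not in the table.  */
static const char *
language_for_major_mode (const char *major_mode)
{
  size_t n = sizeof mode_language_table / sizeof mode_language_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (mode_language_table[i].major_mode, major_mode) == 0)
      return mode_language_table[i].language;
  return NULL;
}
```

A mode that uses more than one language would need a list-valued entry instead of the single string used here.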
* Re: Tree-sitter api
2021-09-13 18:37 ` Eli Zaretskii
@ 2021-09-14 0:13 ` Yuan Fu
2021-09-14 2:29 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-14 0:13 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
>>
>>> What I had in mind is a function that, given the major-mode symbol, will
>>> return the name of the corresponding TS language module (or a list of
>>> modules, if there's more than one).
>>
>> My problem with such a function is that Emacs can’t possibly cover all the major modes and tree-sitter languages. What if there is a new language, and someone writes a tree-sitter language definition for it, and then wants to write an Emacs major mode using tree-sitter features?
>
> A new major mode will extend the function to support its language(s).
> The extension could be as simple as adding something to a database of
> known mode-to-language associations in some alist.
>
Just to recap, we were talking about how to represent a tree-sitter language in Emacs and how to figure out the dynamic library name for that language. My plan is to use tree-sitter-<lang> to represent a language, which is usually the project name for that language definition. And we just turn it into libtree-sitter-<lang>.so/dylib/dll to get the name of the dynamic library. I think your idea has evolved into another thing—translating major-mode to the tree-sitter languages it uses could be useful, but how does it help with the original topic (representing language, translate to library name)?
Yuan
* Re: Tree-sitter api
2021-09-14 0:13 ` Yuan Fu
@ 2021-09-14 2:29 ` Eli Zaretskii
2021-09-14 4:27 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-14 2:29 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 13 Sep 2021 17:13:40 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > A new major mode will extend the function to support its language(s).
> > The extension could be as simple as adding something to a database of
> > known mode-to-language associations in some alist.
> >
>
> Just to recap, we were talking about how to represent a tree-sitter language in Emacs and how to figure out the dynamic library name for that language. My plan is to use tree-sitter-<lang> to represent a language, which is usually the project name for that language definition. And we just turn it into libtree-sitter-<lang>.so/dylib/dll to get the name of the dynamic library. I think your idea has evolved into another thing—translating major-mode to the tree-sitter languages it uses could be useful, but how does it help with the original topic (representing language, translate to library name)?
I guess I don't see a problem there? What is the problem?
* Re: Tree-sitter api
2021-09-14 2:29 ` Eli Zaretskii
@ 2021-09-14 4:27 ` Yuan Fu
2021-09-14 11:29 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-14 4:27 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
>>
>> Just to recap, we were talking about how to represent a tree-sitter language in Emacs and how to figure out the dynamic library name for that language. My plan is to use tree-sitter-<lang> to represent a language, which is usually the project name for that language definition. And we just turn it into libtree-sitter-<lang>.so/dylib/dll to get the name of the dynamic library. I think your idea has evolved into another thing—translating major-mode to the tree-sitter languages it uses could be useful, but how does it help with the original topic (representing language, translate to library name)?
>
> I guess I don't see a problem there? What is the problem?
I thought you proposed the major mode thing to replace the naming scheme, because we were talking about naming languages and translating language names to library names when you proposed it. So you agree to the initial plan to translate tree-sitter-<lang> to libtree-sitter-<lang>.so/etc, and to use an override alist for irregular names?
Yuan
* Re: Tree-sitter api
2021-09-14 4:27 ` Yuan Fu
@ 2021-09-14 11:29 ` Eli Zaretskii
2021-09-15 0:50 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-14 11:29 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 13 Sep 2021 21:27:00 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > I guess I don't see a problem there? What is the problem?
>
> I thought you proposed the major mode thing to replace the naming scheme, because we were talking about naming languages and translating language names to library names when you proposed it. So you agree to the initial plan to translate tree-sitter-<lang> to libtree-sitter-<lang>.so/etc, and to use an override alist for irregular names?
Almost: there's the (minor) problem of obtaining the "<lang>" part by
the major-mode. I think it would be good to have a utility function
to do that so that major modes won't need to reinvent the wheel, do
the research, etc.
* Re: Tree-sitter api
2021-09-14 11:29 ` Eli Zaretskii
@ 2021-09-15 0:50 ` Yuan Fu
2021-09-15 6:15 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-15 0:50 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
>>
>> I thought you proposed the major mode thing to replace the naming scheme, because we were talking about naming languages and translating language names to library names when you proposed it. So you agree to the initial plan to translate tree-sitter-<lang> to libtree-sitter-<lang>.so/etc, and to use an override alist for irregular names?
>
> Almost: there's the (minor) problem of obtaining the "<lang>" part by
> the major-mode. I think it would be good to have a utility function
> to do that so that major modes won't need to reinvent the wheel, do
> the research, etc.
That’s what I don’t understand: the major mode is written by major mode writers, who certainly know the correct “<lang>” name: they need to read the source of the language definition to use the language’s tree-sitter features. You seem to agree on that because you said that this function can be extended by major mode writers.
But if you mean an ordinary end user needs to know the correct <lang> name, then it makes more sense. In that case, it is not Emacs’ but major mode writers’ responsibility to teach that function how to map a major mode to one or many tree-sitter language names. Is that what you meant?
Yuan
* Re: Tree-sitter api
2021-09-15 0:50 ` Yuan Fu
@ 2021-09-15 6:15 ` Eli Zaretskii
2021-09-15 15:56 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-15 6:15 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 14 Sep 2021 17:50:48 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > Almost: there's the (minor) problem of obtaining the "<lang>" part by
> > the major-mode. I think it would be good to have a utility function
> > to do that so that major modes won't need to reinvent the wheel, do
> > the research, etc.
>
> That’s where I don’t understand: the major mode is written by major mode writers, who certainly know the correct “<lang>” name: they need to read the source of the language definition to use language’s tree-sitter features. You seem to agree on that because you said that this function can be extended by major mode writers.
I don't understand what you are saying here. Why would major mode
programmers need to know the correct <lang> name? The TS facilities
we will have in Emacs will be language-agnostic, right? For example,
to correctly indent a line of code, the major mode will call some
hypothetical tree-sitter-get-indentation function, and that function
will work in any major mode, provided that the major mode told TS to
load the support for the programming language of the buffer. Right?
So when the major mode initializes for working with TS, it should tell
TS which language to load, and why would we request the major mode
programmer to know the correct <lang> name which corresponds to the
major mode's programming language? Why would they need to "read the
source of the language definition to use language’s tree-sitter
features"? The specifics of the TS implementation of, say,
indentation calculations won't be exposed on the level of the
indentation facilities provided by TS integration in Emacs, right?
There's some misunderstanding here, and I cannot for the life of me
figure out where it is.
> But if you mean an ordinary end user need to know the correct <lang> name
No, that's not the issue I'm talking about.
* Re: Tree-sitter api
2021-09-15 6:15 ` Eli Zaretskii
@ 2021-09-15 15:56 ` Yuan Fu
2021-09-15 16:02 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-15 15:56 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
> On Sep 14, 2021, at 11:15 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 14 Sep 2021 17:50:48 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> stephen_leake@stephe-leake.org
>>
>>> Almost: there's the (minor) problem of obtaining the "<lang>" part by
>>> the major-mode. I think it would be good to have a utility function
>>> to do that so that major modes won't need to reinvent the wheel, do
>>> the research, etc.
>>
>> That’s where I don’t understand: the major mode is written by major mode writers, who certainly know the correct “<lang>” name: they need to read the source of the language definition to use language’s tree-sitter features. You seem to agree on that because you said that this function can be extended by major mode writers.
>
> I don't understand what you are saying here. Why would major mode
> programmers need to know the correct <lang> name? The TS facilities
> we will have in Emacs will be language-agnostic, right? For example,
> to correctly indent a line of code, the major mode will call some
> hypothetical tree-sitter-get-indentation function, and that function
> will work in any major mode, provided that the major mode told TS to
> load the support for the programming language of the buffer. Right?
Now I see why there is confusion. Tree-sitter only provides a “primitive” feature: the concrete syntax tree, and it is not language-agnostic. You don’t get indentation for free, unfortunately. Indenting the program using the information from the syntax tree is our problem. Tree-sitter doesn’t have anything like a tree-sitter-get-indentation function, and there is no mechanical way to provide one; a human needs to read the source of the tree-sitter language definition and figure out how to do it. See below.
> So when the major mode initializes for working with TS, it should tell
> TS which language to load, and why would we request the major mode
> programmer to know the correct <lang> name which corresponds to the
> major mode's programming language? Why would they need to "read the
> source of the language definition to use language’s tree-sitter
> features"? The specifics of the TS implementation of, say,
> indentation calculations won't be exposed on the level of the
> indentation facilities provided by TS integration in Emacs, right?
Tree-sitter has no indentation calculation feature. Major mode writers genuinely need to read the source of the tree-sitter language definition. The source tells us what will be in the syntax tree parsed by tree-sitter, and the node names differ from one language to another. For example, if I want to fontify type identifiers in C with font-lock-type-face, I need to know how a type is represented in the syntax tree. I look up the source[1] and find

    _type_specifier: $ => choice(
      $.struct_specifier,
      $.union_specifier,
      $.enum_specifier,
      $.macro_type_specifier,
      $.sized_type_specifier,
      $.primitive_type,
      $._type_identifier
    ),

This roughly translates to

    _type_specifier := <struct_specifier>
                     | <union_specifier>
                     | <enum_specifier>
                     | <macro_type_specifier>
                     | <sized_type_specifier>
                     | <primitive_type>
                     | <_type_identifier>

in BNF.
From this (and some other hints) I know I need to grab all the _type_specifier nodes in the syntax tree, find their corresponding text in the buffer, and apply font-lock-type-face. Type identifiers in another language will be named differently; tree-sitter doesn’t provide an abstraction for semantic names in the syntax tree.
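To make the "grab the nodes and apply a face" step concrete, here is a sketch over a mock tree. The struct below is a stand-in and is not the tree-sitter API; the node names come from the grammar excerpt above. The underscore-prefixed alternative is left out because the excerpt does not show how it surfaces in a parsed tree.

```c
#include <stddef.h>
#include <string.h>

/* Mock syntax-tree node (NOT the tree-sitter API), with only the
   fields this example needs.  */
struct node
{
  const char *type;             /* node type name from the grammar */
  size_t start, end;            /* buffer positions covered by the node */
  const struct node *children;  /* array of child nodes, or NULL */
  size_t n_children;
};

/* Return 1 if TYPE is one of the type-specifier alternatives listed
   in the grammar excerpt, 0 otherwise.  */
static int
is_type_node (const char *type)
{
  static const char *const type_names[] = {
    "struct_specifier", "union_specifier", "enum_specifier",
    "macro_type_specifier", "sized_type_specifier", "primitive_type",
  };
  for (size_t i = 0; i < sizeof type_names / sizeof type_names[0]; i++)
    if (strcmp (type, type_names[i]) == 0)
      return 1;
  return 0;
}

typedef void (*node_visitor) (const struct node *node, void *data);

/* Walk the tree rooted at NODE depth-first and call VISIT on every
   node whose type names a type specifier.  A real major mode would
   apply font-lock-type-face to [start, end) inside VISIT.  */
static void
visit_type_nodes (const struct node *node, node_visitor visit, void *data)
{
  if (is_type_node (node->type))
    visit (node, data);
  for (size_t i = 0; i < node->n_children; i++)
    visit_type_nodes (&node->children[i], visit, data);
}

/* Example visitor: count matching nodes in a size_t pointed to by DATA.  */
static void
count_node (const struct node *node, void *data)
{
  (void) node;
  ++*(size_t *) data;
}
```

The list of names inside is_type_node is exactly the language-specific knowledge a major mode author has to supply.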
>
> There's some misunderstanding here, and I cannot for the life of me
> figure out where it is.
I was very confused, too, for the past several days, but I think we know the source of it now.
[1] The source of tree-sitter-c is at https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js
Yuan
* Re: Tree-sitter api
2021-09-15 15:56 ` Yuan Fu
@ 2021-09-15 16:02 ` Eli Zaretskii
2021-09-15 18:19 ` Stefan Monnier
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-15 16:02 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 15 Sep 2021 08:56:18 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> stephen_leake@stephe-leake.org
>
> > I don't understand what you are saying here. Why would major mode
> > programmers need to know the correct <lang> name? The TS facilities
> > we will have in Emacs will be language-agnostic, right? For example,
> > to correctly indent a line of code, the major mode will call some
> > hypothetical tree-sitter-get-indentation function, and that function
> > will work in any major mode, provided that the major mode told TS to
> > load the support for the programming language of the buffer. Right?
>
> Now I see why there is confusion. Tree-sitter only provides a “primitive” feature: the concrete syntax tree, and it is not language-agnostic. You don’t get indentation for free, unfortunately.
I wasn't talking about tree-sitter itself, I was talking about the
facilities Emacs will provide based on TS. There will be in Emacs a
function to calculate indentation using TS, right? And that function
will be language-agnostic, like indent-line-function is, right?
* Re: Tree-sitter api
2021-09-15 16:02 ` Eli Zaretskii
@ 2021-09-15 18:19 ` Stefan Monnier
2021-09-15 18:48 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-09-15 18:19 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, ubolonton, theo, cpitclaudel, emacs-devel, stephen_leake
> I wasn't talking about tree-sitter itself, I was talking about the
> facilities Emacs will provide based on TS. There will be in Emacs a
> function to calculate indentation using TS, right? And that function
> will be language-agnostic, like indent-line-function is, right?
There is such a function but it doesn't do anything itself. It relies
on the major-mode to do the heavy lifting which consists in giving
indentation rules for each one of the possible node types that can
appear in the AST (and as Yuan explained, those types are different for
every language).
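As a rough illustration of per-node-type indentation rules, a major mode could supply a table like the one below to a generic lookup. Everything here is hypothetical: the struct, the function name, the offsets, and the choice of node types are assumptions for the sketch, not an existing Emacs interface.

```c
#include <stddef.h>
#include <string.h>

/* One rule: lines directly inside a node of type PARENT_TYPE are
   indented OFFSET columns relative to that node.  */
struct indent_rule
{
  const char *parent_type;
  int offset;
};

/* Example rules a C major mode might provide (illustrative values).  */
static const struct indent_rule c_indent_rules[] = {
  { "compound_statement", 2 },
  { "field_declaration_list", 2 },
};

/* Generic, language-agnostic part: look up the offset for PARENT_TYPE
   in RULES, returning 0 when no rule matches.  */
static int
indent_offset_for (const struct indent_rule *rules, size_t n_rules,
                   const char *parent_type)
{
  for (size_t i = 0; i < n_rules; i++)
    if (strcmp (rules[i].parent_type, parent_type) == 0)
      return rules[i].offset;
  return 0;
}
```

The lookup function knows nothing about C; only the rule table does, which is the division of labor described above.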
Stefan
* Re: Tree-sitter api
2021-09-15 18:19 ` Stefan Monnier
@ 2021-09-15 18:48 ` Eli Zaretskii
2021-09-16 21:46 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-15 18:48 UTC (permalink / raw)
To: Stefan Monnier
Cc: casouri, theo, ubolonton, emacs-devel, cpitclaudel, stephen_leake
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Yuan Fu <casouri@gmail.com>, ubolonton@gmail.com, theo@thornhill.no,
> cpitclaudel@gmail.com, emacs-devel@gnu.org,
> stephen_leake@stephe-leake.org
> Date: Wed, 15 Sep 2021 14:19:12 -0400
>
> > I wasn't talking about tree-sitter itself, I was talking about the
> > facilities Emacs will provide based on TS. There will be in Emacs a
> > function to calculate indentation using TS, right? And that function
> > will be language-agnostic, like indent-line-function is, right?
>
> There is such a function but it doesn't do anything itself. It relies
> on the major-mode to do the heavy lifting which consists in giving
> indentation rules for each one of the possible node types that can
> appear in the AST
Sure, and the new one will do that with help of TS. But the principle
is the same.
* Re: Tree-sitter api
2021-09-15 18:48 ` Eli Zaretskii
@ 2021-09-16 21:46 ` Yuan Fu
2021-09-17 6:06 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-16 21:46 UTC (permalink / raw)
To: Eli Zaretskii
Cc: ubolonton, theo, cpitclaudel, emacs-devel, Stefan Monnier,
stephen_leake
> On Sep 15, 2021, at 11:48 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: Yuan Fu <casouri@gmail.com>, ubolonton@gmail.com, theo@thornhill.no,
>> cpitclaudel@gmail.com, emacs-devel@gnu.org,
>> stephen_leake@stephe-leake.org
>> Date: Wed, 15 Sep 2021 14:19:12 -0400
>>
>>> I wasn't talking about tree-sitter itself, I was talking about the
>>> facilities Emacs will provide based on TS. There will be in Emacs a
>>> function to calculate indentation using TS, right? And that function
>>> will be language-agnostic, like indent-line-function is, right?
>>
>> There is such a function but it doesn't do anything itself. It relies
>> on the major-mode to do the heavy lifting which consists in giving
>> indentation rules for each one of the possible node types that can
>> appear in the AST
>
> Sure, and the new one will do that with help of TS. But the principle
> is the same.
My point is that major mode writers need to read the source of the tree-sitter language definition to do anything useful with tree-sitter, so they must already know the correct <lang> name.
Yuan
* Re: Tree-sitter api
2021-09-16 21:46 ` Yuan Fu
@ 2021-09-17 6:06 ` Eli Zaretskii
2021-09-17 6:56 ` Yuan Fu
2021-09-17 12:23 ` Stefan Monnier
0 siblings, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-17 6:06 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 16 Sep 2021 14:46:08 -0700
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> ubolonton@gmail.com,
> theo@thornhill.no,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org,
> stephen_leake@stephe-leake.org
>
> >>> I wasn't talking about tree-sitter itself, I was talking about the
> >>> facilities Emacs will provide based on TS. There will be in Emacs a
> >>> function to calculate indentation using TS, right? And that function
> >>> will be language-agnostic, like indent-line-function is, right?
> >>
> >> There is such a function but it doesn't do anything itself. It relies
> >> on the major-mode to do the heavy lifting which consists in giving
> >> indentation rules for each one of the possible node types that can
> >> appear in the AST
> >
> > Sure, and the new one will do that with help of TS. But the principle
> > is the same.
>
> My point is, major mode writers need to read the source of the tree-sitter language definition to do anything useful with tree-sitter
If this is so, then why do we bother documenting the Lisp APIs for
TS-related features? If Lisp programmers need to read the TS sources
to do anything useful in Emacs, let them read the sources, including
the Lisp and C sources you are working on?
That was somewhat sarcastic, but my point is that this is NOT how we
do this kind of stuff in Emacs. We should have Lisp-level facilities
that reflect the TS features, and those Lisp-level facilities should
be documented and should be the ONLY thing a Lisp programmer needs to
read to adapt his/her major mode to TS. We should NOT assume that
Lisp programmers read the TS source code, exactly like we don't assume
that for other libraries, like GnuTLS, librsvg, or libgccjit. Under
that modus operandi, the way to glean the <lang> part from the major
mode's language name is something that should be part of the
facilities we provide.
* Re: Tree-sitter api
2021-09-17 6:06 ` Eli Zaretskii
@ 2021-09-17 6:56 ` Yuan Fu
2021-09-17 7:38 ` Eli Zaretskii
2021-09-17 12:11 ` Tuấn-Anh Nguyễn
2021-09-17 12:23 ` Stefan Monnier
1 sibling, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-17 6:56 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
>>
>> My point is, major mode writers need to read the source of the tree-sitter language definition to do anything useful with tree-sitter
>
> If this is so, then why do we bother documenting the Lisp APIs for
> TS-related features? If Lisp programmers need to read the TS sources
> to do anything useful in Emacs, let them read the sources, including
> the Lisp and C sources you are working on?
>
> That was somewhat sarcastic, but my point is that this is NOT how we
> do this kind of stuff in Emacs. We should have Lisp-level facilities
> that reflect the TS features, and those Lisp-level facilities should
> be documented and should be the ONLY thing a Lisp programmer needs to
> read to adapt his/her major mode to TS. We should NOT assume that
> Lisp programmers read the TS source code, exactly like we don't assume
> that for other libraries, like GnuTLS, librsvg, or libgccjit. Under
> that modus operandi, the way to glean the <lang> part from the major
> mode's language name is something that should be part of the
> facilities we provide.
Thank you for your patience. I certainly believe in documentation and put considerable effort into it, and if it were possible to document things as you described, I would do it. We have documentation for all the tree-sitter features provided by Emacs and a bit more, but I don’t think it is possible to document the language definitions. We can think of language definitions as BNF grammars for each language; how do you document that? Say, for the language definition for Scheme below, how do we document it?
<token> --> <identifier> | <boolean> | <number>
| <character> | <string>
| ( | ) | #( |
' | ` | , | ,@ | .
<delimiter> --> <whitespace> | ( | ) | " | ;
<whitespace> --> <space or newline>
<comment> --> ; <all subsequent characters up to a
line break>
...
<number> --> <num 2>| <num 8>
| <num 10>| <num 16>
…
The language definition source of a tree-sitter language is basically that, with some superfluous JavaScript syntax. Language definitions are not mechanisms but data: you can document a mechanism, but not really data.
And I also want to point out that, as Emacs core developers, we can’t possibly provide a good translation from conventional language names to their tree-sitter names (C# -> c-sharp). Maybe we can do a half-decent job, but 1) that won’t cover all available languages, and 2) if there is a new language, we need to wait for the next release to update our translation. It is better for the major mode writers to provide the information on how to translate names because, as I said earlier, they already know it.
Yuan
* Re: Tree-sitter api
2021-09-17 6:56 ` Yuan Fu
@ 2021-09-17 7:38 ` Eli Zaretskii
2021-09-17 20:30 ` Yuan Fu
2021-09-18 12:33 ` Stephen Leake
2021-09-17 12:11 ` Tuấn-Anh Nguyễn
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-17 7:38 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 16 Sep 2021 23:56:20 -0700
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> stephen_leake@stephe-leake.org
>
> We have documentation for all the tree-sitter features provided by Emacs and a bit more, but I don’t think it is possible to document the language definitions. We can think of language definitions as BNF grammars for each language, how do you document that?
Why do we need to document the language definitions? When a Lisp
programmer defines font-lock and indentation for a programming
language in the current Emacs, do they necessarily need to consult the
language grammar?
> Say, for the language definition for Scheme below, how do we document it?
>
> <token> --> <identifier> | <boolean> | <number>
> | <character> | <string>
> | ( | ) | #( |
> ' | ` | , | ,@ | .
> <delimiter> --> <whitespace> | ( | ) | " | ;
> <whitespace> --> <space or newline>
> <comment> --> ; <all subsequent characters up to a
> line break>
> ...
> <number> --> <num 2>| <num 8>
> | <num 10>| <num 16>
> …
This stuff should be known to TS; the Lisp programmer only needs to be
aware of the results of lexical and syntactical analysis, in terms of
their Lisp expressions (Lisp data structures with appropriate symbols
and fields).
> And I also want to point out that, as Emacs core developers, we can’t possibly provide a good translation from conventional language names to their tree-sitter names (C# -> c-sharp). Maybe we can do a half-decent job, but 1) that won’t cover all available languages, and 2) if there is a new language, we need to wait for the next release to update our translation. It is better for the major mode writers to provide the information on how to translate names.
The database used by the conversion should definitely be extensible.
But that doesn't mean it should be empty.
Anyway, we've spent enough time on this issue. If you are still
unconvinced, feel free to do it your way, and let the chips fall as
they may.
* Re: Tree-sitter api
2021-09-17 6:56 ` Yuan Fu
2021-09-17 7:38 ` Eli Zaretskii
@ 2021-09-17 12:11 ` Tuấn-Anh Nguyễn
2021-09-17 13:14 ` Stefan Monnier
1 sibling, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-17 12:11 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, Emacs developers,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> And I also want to point out that, as Emacs core developers, we can’t possibly provide a good translation from conventional language names to their tree-sitter names (C# -> c-sharp). Maybe we can do a half-decent job, but 1) that won’t cover all available languages, and 2) if there is a new language, we need to wait for the next release to update our translation. It is better for the major mode writers to provide the information on how to translate names because, as I said earlier, they already know it.
This makes sense. My suggestion is, in `tree-sitter.el`:
(defvar tree-sitter-major-mode-language-alist
'((c-mode . c)
;; And other major modes that Emacs includes, or are well-known.
(c++-mode . cpp)
(javascript-mode . javascript)
(python-mode . python))
"Alist that maps major modes to tree-sitter language names.")
For other major modes, it's up to the mode writers to add entries to that list,
similar to what they do with `auto-mode-alist`.
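A minimal sketch of how such an alist could be consulted. The names `my-ts-major-mode-language-alist` and `my-ts-language-for-mode` are hypothetical and not taken from any existing package. The lookup climbs the `derived-mode-parent` chain, so a mode derived from `c-mode` picks up the entry for `c-mode` without registering itself.

```elisp
(defvar my-ts-major-mode-language-alist
  '((c-mode . c)
    (c++-mode . cpp)
    (python-mode . python))
  "Hypothetical alist mapping major modes to tree-sitter language names.")

(defun my-ts-language-for-mode (mode)
  "Return the language registered for MODE or its nearest ancestor, or nil."
  (let (lang)
    (while (and mode (not lang))
      (setq lang (alist-get mode my-ts-major-mode-language-alist))
      ;; Climb to the parent mode recorded by `define-derived-mode'.
      (setq mode (get mode 'derived-mode-parent)))
    lang))
```

A third-party mode would then add itself with `(add-to-list 'my-ts-major-mode-language-alist '(foo-mode . foo))`, much as it would for `auto-mode-alist`.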
--
Tuấn-Anh Nguyễn
Software Engineer
* Re: Tree-sitter api
2021-09-17 6:06 ` Eli Zaretskii
2021-09-17 6:56 ` Yuan Fu
@ 2021-09-17 12:23 ` Stefan Monnier
2021-09-17 13:03 ` Tuấn-Anh Nguyễn
1 sibling, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-09-17 12:23 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, ubolonton, theo, cpitclaudel, emacs-devel, stephen_leake
>> My point is, major mode writers need to read the source of the tree-sitter
>> language definition to do anything useful with tree-sitter
>
> If this is so, then why do we bother documenting the Lisp APIs for
> TS-related features? If Lisp programmers need to read the TS sources
> to do anything useful in Emacs, let them read the sources, including
> the Lisp and C sources you are working on?
The "source of the tree-sitter language definition" is the language's
grammar (in the format defined by TS). It's written in its own language
and is independent from the code of the TS runtime or the code of the
TS bindings in Emacs.
For a major mode to use TS, the major mode's code needs to know:
- The name of the language's grammar file.
This can usually/often be guessed from the language's name and is
a trivial piece of information.
- The names of the various nodes that will appear in the AST.
Those are specific to the particular grammar being used, and the best
place to find them is in the source code of the grammar, tho you can
also just find them by experimentation.
Stefan
* Re: Tree-sitter api
2021-09-17 12:23 ` Stefan Monnier
@ 2021-09-17 13:03 ` Tuấn-Anh Nguyễn
0 siblings, 0 replies; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-17 13:03 UTC (permalink / raw)
To: Stefan Monnier
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel, emacs-devel,
Eli Zaretskii, Stephen Leake
On Fri, Sep 17, 2021 at 7:23 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> For a major mode to use TS, the major mode's code needs to know:
> - The name of the language's grammar file.
> This can usually/often be guessed from the language's name and is
> a trivial piece of information.
> - The names of the various nodes that will appear in the AST.
> Those are specific to the particular grammar being used, and the best
> place to find them is in the source code of the grammar, tho you can
> also just find them by experimentation.
There are also field names, which make it easier to refer to specific child
nodes within a parent node. For example, within a `function_definition` node,
the `identifier` node that is the function name has the field name `name`.
Node names (types in tree-sitter's terms) and field names can both be extracted
from the shared library. We should provide functions for that, e.g.
`tree-sitter-node-types`, `tree-sitter-field-names`.
The interesting non-trivial parts of a definition would be the relationships
between the node types. For example, a `function_definition` can be inside a
`class_definition`, or a `module`, but not a `boolean_expression`. In principle,
these can be included in the shared library as well. AFAICT, the `tree-sitter`
authors would be open to adding that feature.
--
Tuấn-Anh Nguyễn
Software Engineer
* Re: Tree-sitter api
2021-09-17 12:11 ` Tuấn-Anh Nguyễn
@ 2021-09-17 13:14 ` Stefan Monnier
2021-09-17 13:39 ` Tuấn-Anh Nguyễn
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-09-17 13:14 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Yuan Fu, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stephen Leake
> (defvar tree-sitter-major-mode-language-alist
> '((c-mode . c)
> ;; And other major modes that Emacs includes, or are well-known.
> (c++-mode . cpp)
> (javascript-mode . javascript)
> (python-mode . python))
> "Alist that maps major modes to tree-sitter language names.")
Why not just `tree-sitter-language-name`, which the major mode can set
buffer-locally?
(setq-local tree-sitter-language-name 'foo)
is better than
(add-to-list 'tree-sitter-major-mode-language-alist '(foo-mode . foo))
[ Among other things because it won't signal an error when
`tree-sitter.el` is not loaded. ]
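To make the difference concrete, here is a small sketch. All names are hypothetical: `my-ts-language-name` and `my-ts-unloaded-alist` stand in for variables that a not-yet-loaded library would define, so neither has a `defvar` here. `setq-local` simply creates a buffer-local binding, while `add-to-list` has to read the current value of the list variable and signals `void-variable` when there is none.

```elisp
(define-derived-mode my-foo-mode prog-mode "Foo"
  "Toy major mode that names its tree-sitter language buffer-locally."
  ;; Works even though nothing has defined `my-ts-language-name' yet.
  (setq-local my-ts-language-name 'foo))

(defun my-ts-try-alist-registration ()
  "Try to register in an alist whose defining library is not loaded.
Return t on success, or the error symbol that was signaled."
  (condition-case err
      (progn
        (add-to-list 'my-ts-unloaded-alist '(my-foo-mode . foo))
        t)
    ;; `add-to-list' reads the variable's value first, so an unbound
    ;; variable signals `void-variable'.
    (void-variable (car err))))
```

Calling `(my-ts-try-alist-registration)` in a session where `my-ts-unloaded-alist` is unbound returns the symbol `void-variable`, whereas enabling `my-foo-mode` succeeds regardless.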
Stefan
* Re: Tree-sitter api
2021-09-17 13:14 ` Stefan Monnier
@ 2021-09-17 13:39 ` Tuấn-Anh Nguyễn
2021-09-17 17:18 ` Stefan Monnier
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-17 13:39 UTC (permalink / raw)
To: Stefan Monnier
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel,
Emacs developers, Eli Zaretskii, Stephen Leake
On Fri, Sep 17, 2021 at 8:14 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> Why not just `tree-sitter-language-name`, which the major mode can set
> buffer-locally?
>
> (setq-local tree-sitter-language-name 'foo)
>
> is better than
>
> (add-to-list 'tree-sitter-major-mode-language-alist '(foo-mode . foo))
It can be both, with one being the underlying mechanism for the other. (That's
what I do in my packages.) The alist is a central place that's more convenient
for users (not mode writers) to customize, e.g. in case a major mode hasn't
added itself yet:
(add-hook 'foo-mode-hook (lambda () (setq tree-sitter-language-name 'foo)))
is less convenient than
(add-to-list 'tree-sitter-major-mode-language-alist '(foo-mode . foo))
> [ Among other things because it won't signal an error when
> `tree-sitter.el` is not loaded. ]
We can make `tree-sitter-major-mode-language-alist` autoloaded.
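One way "both" could fit together, sketched with hypothetical names (`my-ts-buffer-language`, `my-ts-fallback-language-alist`, `my-ts-current-language`); this is an illustration, not the code from my packages. The buffer-local variable is the canonical source, and the alist is only consulted when the major mode has not set it.

```elisp
(defvar-local my-ts-buffer-language nil
  "Hypothetical buffer-local language name, normally set by the major mode.")

(defvar my-ts-fallback-language-alist '((foo-mode . foo))
  "Hypothetical user-level alist mapping major modes to language names.")

(defun my-ts-current-language ()
  "Return the tree-sitter language for the current buffer, or nil."
  (or my-ts-buffer-language
      ;; Fall back to the user-maintained alist when the mode is silent.
      (alist-get major-mode my-ts-fallback-language-alist)))
```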
--
Tuấn-Anh Nguyễn
Software Engineer
* Re: Tree-sitter api
2021-09-17 13:39 ` Tuấn-Anh Nguyễn
@ 2021-09-17 17:18 ` Stefan Monnier
2021-09-18 2:16 ` Tuấn-Anh Nguyễn
0 siblings, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-09-17 17:18 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Yuan Fu, Eli Zaretskii, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stephen Leake
> It can be both, with one being the underlying mechanism for the other. (That's
> what I do in my packages.) The alist is a central place that's more convenient
> for users (not mode writers) to customize, e.g. in case a major mode hasn't
> added itself yet:
>
> (add-hook 'foo-mode-hook (lambda () (setq tree-sitter-language-name 'foo)))
>
> is less convenient than
>
> (add-to-list 'tree-sitter-major-mode-language-alist '(foo-mode . foo))
Since in most cases TS support will require a fair bit more than just
the language's name, I'm not sure it will be useful for very many users.
So I'd start with just `tree-sitter-language-name` (which should anyway
be the main/canonical way to provide the language's name).
An alist indexed by a major mode is a "design smell" in my book.
>> [ Among other things because it won't signal an error when
>> `tree-sitter.el` is not loaded. ]
> We can make `tree-sitter-major-mode-language-alist` autoloaded.
Autoloaded variables are also a design smell ;-)
Stefan
* Re: Tree-sitter api
2021-09-17 7:38 ` Eli Zaretskii
@ 2021-09-17 20:30 ` Yuan Fu
2021-09-18 2:22 ` Tuấn-Anh Nguyễn
2021-09-18 12:33 ` Stephen Leake
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-17 20:30 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
stephen_leake
>
> Why do we need to document the language definitions? When a Lisp
> programmer defines font-lock and indentation for a programming
> language in the current Emacs, do they necessarily need to consult the
> language grammar?
> […]
> This stuff should be known to TS; the Lisp programmer only needs to be
> aware of the results of lexical and syntactical analysis, in terms of
> their Lisp expressions (Lisp data structures with appropriate symbols
> and fields).
I demonstrated the reason why one needs to consult the source a few messages back:
> Tree-sitter has no indentation calculation feature. Major mode writers genuinely need to read the source of the tree-sitter language definition. The source tells us what will be in the syntax tree parsed by tree-sitter, and the node names differ from one language to another. For example, if I want to fontify type identifiers in C with font-lock-type-face, I need to know how a type is represented in the syntax tree. I look up the source[1], and find
>
> _type_specifier: $ => choice(
> $.struct_specifier,
> $.union_specifier,
> $.enum_specifier,
> $.macro_type_specifier,
> $.sized_type_specifier,
> $.primitive_type,
> $._type_identifier
> ),
>
> This roughly translates to
>
> _type_specifier := <struct_specifier>
> | <union_specifier>
> | <enum_specifier>
> | <macro_type_specifier>
> | <sized_type_specifier>
> | <primitive_type>
> | <_type_identifier>
>
> in BNF
>
> From this (and some other hints) I know I need to grab all the _type_specifier nodes in the syntax tree, find their corresponding text in the buffer, and apply font-lock-type-face. And type identifiers in another language will be named differently; tree-sitter doesn’t provide an abstraction for semantic names in the syntax tree.
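To illustrate the "grab the nodes, find their text, apply the face" step without depending on any real API, here is a toy sketch. It assumes a made-up tree representation in which each node is a list `(TYPE BEG END . CHILDREN)`; the functions `my-ts-collect-nodes` and `my-ts-fontify-types` are hypothetical, and the node types are borrowed from the alternatives in the C grammar rule quoted above.

```elisp
(defun my-ts-collect-nodes (node types)
  "Return NODE and its descendants whose type is in TYPES, in document order.
NODE is a toy syntax-tree node of the form (TYPE BEG END . CHILDREN)."
  (let ((found (and (memq (car node) types) (list node))))
    (dolist (child (nthcdr 3 node))
      (setq found (append found (my-ts-collect-nodes child types))))
    found))

(defun my-ts-fontify-types (tree)
  "Put `font-lock-type-face' on the text of every type node in toy TREE."
  (dolist (node (my-ts-collect-nodes
                 tree '(primitive_type struct_specifier enum_specifier)))
    (put-text-property (nth 1 node) (nth 2 node)
                       'face 'font-lock-type-face)))

;; Toy tree for a buffer containing "int x;":
;;   (translation_unit 1 7
;;     (declaration 1 7 (primitive_type 1 4) (identifier 5 6)))
```

The list of node types passed to `my-ts-collect-nodes` is exactly the part that has to come from reading the grammar; a mode for another language would pass different names.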
>> And I also want to point out that, as Emacs core developers, we can’t possibly provide a good translation from conventional language names to their tree-sitter names (C# -> c-sharp). Maybe we can do a half-decent job, but 1) that won’t cover all available languages, and 2) if there is a new language, we need to wait for the next release to update our translation. It is better for the major mode writers to provide the information on how to translate names.
>
> The database used by the conversion should definitely be extensible.
> But that doesn't mean it should be empty.
>
> Anyway, we've spent enough time on this issue. If you are still
> unconvinced, feel free to do it your way, and let the chips fall as
> they may.
I’ll do it the way I see fit. You can always comment in the final review (or something). Thanks.
Yuan
* Re: Tree-sitter api
2021-09-17 17:18 ` Stefan Monnier
@ 2021-09-18 2:16 ` Tuấn-Anh Nguyễn
0 siblings, 0 replies; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-18 2:16 UTC (permalink / raw)
To: Stefan Monnier
Cc: Yuan Fu, Theodor Thornhill, Clément Pit-Claudel,
Emacs developers, Eli Zaretskii, Stephen Leake
> > It can be both, with one being the underlying mechanism for the other. (That's
> > what I do in my packages.) The alist is a central place that's more convenient
> > for users (not mode writers) to customize, e.g. in case a major mode hasn't
> > added itself yet:
> >
> > (add-hook 'foo-mode-hook (lambda () (setq tree-sitter-language-name 'foo)))
> >
> > is less convenient than
> >
> > (add-to-list 'tree-sitter-major-mode-language-alist '(foo-mode . foo))
>
> Since in most cases TS support will require a fair bit more than just
> the language's name, I'm not sure it will be useful for very many users.
It has been useful (not "will be"). Of course it's more than just the language's
name.
> So I'd start with just `tree-sitter-language-name` (which should anyway
> be the main/canonical way to provide the language's name).
Yes, that's a good start.
> An alist indexed by a major mode is a "design smell" in my book.
>
> >> [ Among other things because it won't signal an error when
> >> `tree-sitter.el` is not loaded. ]
> > We can make `tree-sitter-major-mode-language-alist` autoloaded.
>
> Autoloaded variables are also a design smell ;-)
These are non-arguments, but let's move on. This is among the most trivial ones
in the list of design decisions to make for tree-sitter integration. We don't
need to spend more time on it.
--
Tuấn-Anh Nguyễn
Software Engineer
* Re: Tree-sitter api
2021-09-17 20:30 ` Yuan Fu
@ 2021-09-18 2:22 ` Tuấn-Anh Nguyễn
2021-09-18 6:38 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Tuấn-Anh Nguyễn @ 2021-09-18 2:22 UTC (permalink / raw)
To: Yuan Fu
Cc: Clément Pit-Claudel, Theodor Thornhill, Emacs developers,
Stefan Monnier, Eli Zaretskii, Stephen Leake
> > Tree-sitter has no indentation calculation feature. Major mode writers genuinely need to read the source of the tree-sitter language definition. The source tells us what will be in the syntax tree parsed by tree-sitter, and the node names differ from one language to another. For example, if I want to fontify type identifiers in C with font-lock-type-face, I need to know how a type is represented in the syntax tree. I look up the source[1], and find
> >
> > _type_specifier: $ => choice(
> > $.struct_specifier,
> > $.union_specifier,
> > $.enum_specifier,
> > $.macro_type_specifier,
> > $.sized_type_specifier,
> > $.primitive_type,
> > $._type_identifier
> > ),
> >
> > This roughly translates to
> >
> > _type_specifier := <struct_specifier>
> > | <union_specifier>
> > | <enum_specifier>
> > | <macro_type_specifier>
> > | <sized_type_specifier>
> > | <primitive_type>
> > | <_type_identifier>
> >
> > in BNF
> >
> > From this (and some other hints) I know I need to grab all the _type_specifier nodes in the syntax tree, find their corresponding text in the buffer, and apply font-lock-type-face. And type identifiers in another language will be named differently; tree-sitter doesn’t provide an abstraction for semantic names in the syntax tree.
>
>
> >> And I also want to point out that, as Emacs core developers, we can’t possibly provide a good translation from conventional language names to their tree-sitter names (C# -> c-sharp). Maybe we can do a half-decent job, but 1) that won’t cover all available languages, and 2) if there is a new language, we need to wait for the next release to update our translation. It is better for the major mode writers to provide the information on how to translate names.
> >
> > The database used by the conversion should definitely be extensible.
> > But that doesn't mean it should be empty.
> >
> > Anyway, we've spent enough time on this issue. If you are still
> > unconvinced, feel free to do it your way, and let the chips fall as
> > they may.
>
> I’ll do it the way I see fit. You can always comment in the final review (or something). Thanks.
Your arguments were reasonable. Please continue the work. It's quite valuable.
There will be a lot more important details to discuss.
--
Tuấn-Anh Nguyễn
Software Engineer
* Re: Tree-sitter api
2021-09-18 2:22 ` Tuấn-Anh Nguyễn
@ 2021-09-18 6:38 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-18 6:38 UTC (permalink / raw)
To: Tuấn-Anh Nguyễn
Cc: Clément Pit-Claudel, Theodor Thornhill, Emacs developers,
Stefan Monnier, Eli Zaretskii, Stephen Leake
>
> Your arguments were reasonable. Please continue the work. It's quite valuable.
> There will be a lot more important details to discuss.
Thanks. I’ve pushed a change to the branch on GitHub, and now Emacs loads dynamic libraries directly. The name of the library to load can be controlled by tree-sitter-load-name-list. Please have a look and comment as you like. In the meantime, I’ll be working on tests and documentation.
Yuan
* Re: Tree-sitter api
2021-09-17 7:38 ` Eli Zaretskii
2021-09-17 20:30 ` Yuan Fu
@ 2021-09-18 12:33 ` Stephen Leake
2021-09-20 16:48 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Stephen Leake @ 2021-09-18 12:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, theo, ubolonton, emacs-devel, cpitclaudel, monnier
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Yuan Fu <casouri@gmail.com>
>>
>> We have documentation for all the tree-sitter features provided by
>> Emacs and a bit more, but I don’t think it is possible to document
>> the language definitions. We can think of language definitions as
>> BNF grammars for each language, how do you document that?
>
> Why do we need to document the language definitions? When a Lisp
> programmer defines font-lock and indentation for a programming
> language in the current Emacs, do they necessarily need to consult the
> language grammar?
Yes!
If you want to indent a statement in a language, you need to know the
syntax of that statement; you can't define indentation for a "generic if
statement". Consider Ada and C:
Ada:
if <expression> then
<statement_list>
else
<statement_list>
end if;
C:
if (<expression>)
<block>
else
<block>
The language details matter.
--
-- Stephe
* Re: Tree-sitter api
2021-09-18 12:33 ` Stephen Leake
@ 2021-09-20 16:48 ` Yuan Fu
2021-09-20 18:48 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-20 16:48 UTC (permalink / raw)
To: Stephen Leake
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, emacs-devel, Stefan Monnier,
Eli Zaretskii
A minor question, is there a better CS term for “punctuation marks”?
Names like @code{root}, @code{expression}, @code{number},
@code{operator} are nodes' @dfn{type}. However, not all nodes in a
syntax tree have a type. Nodes that don't are @dfn{anonymous nodes},
and nodes with a type are @dfn{named nodes}. Anonymous nodes usually
represent punctuation marks (FIXME: better word than ``punctuation
marks''?) like quote @samp{"} and bracket @samp{[}, or tokens that
have a fixed representation, such as keywords like @code{return}.
Yuan
* Re: Tree-sitter api
2021-09-20 16:48 ` Yuan Fu
@ 2021-09-20 18:48 ` Eli Zaretskii
2021-09-20 19:09 ` John Yates
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-20 18:48 UTC (permalink / raw)
To: Yuan Fu; +Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 20 Sep 2021 09:48:27 -0700
> Cc: Eli Zaretskii <eliz@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> emacs-devel@gnu.org
>
> A minor question, is there a better CS term for “punctuation marks”?
Punctuation characters, I think.
* Re: Tree-sitter api
2021-09-20 18:48 ` Eli Zaretskii
@ 2021-09-20 19:09 ` John Yates
2021-09-21 22:20 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: John Yates @ 2021-09-20 19:09 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, theo, ubolonton, Emacs developers, cpitclaudel,
Stefan Monnier, Stephen Leake
Maybe:
Anonymous nodes are usually tokens composed of
punctuation characters like quote @samp{"} and
auto-increment @samp{++}, or distinguished identifiers
with fixed spellings used as keywords, like @code{return}.
/john
* Re: Tree-sitter api
2021-09-20 19:09 ` John Yates
@ 2021-09-21 22:20 ` Yuan Fu
2021-09-27 4:42 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-21 22:20 UTC (permalink / raw)
To: John Yates
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
Eli Zaretskii, Stephen Leake
> On Sep 20, 2021, at 12:09 PM, John Yates <john@yates-sheets.org> wrote:
>
> Maybe:
>
> Anonymous nodes are usually tokens composed of
> punctuation characters like quote @samp{"} and
> auto-increment @samp{++}, or distinguished identifiers
> with fixed spellings used as keywords, like @code{return}.
>
> /John
Thanks, John and Eli. I modified it to
Names like @code{root}, @code{expression}, @code{number},
@code{operator} are nodes' @dfn{type}. However, not all nodes in a
syntax tree have a type. Nodes that don't are @dfn{anonymous nodes},
and nodes with a type are @dfn{named nodes}. Anonymous nodes are
tokens with fixed spellings, including punctuation characters like
bracket @samp{]}, and keywords like @code{return}.
Yuan
* Re: Tree-sitter api
2021-09-21 22:20 ` Yuan Fu
@ 2021-09-27 4:42 ` Yuan Fu
2021-09-27 5:37 ` Eli Zaretskii
2021-09-27 19:17 ` Stefan Monnier
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-09-27 4:42 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
Stephen Leake, John Yates
Currently, because font-lock.el uses functions and variables defined in tree-sitter.el, it needs to require tree-sitter.el. Should we require tree-sitter.el by default? Then what do we do when tree-sitter is not available on the system? Should I wrap every reference to tree-sitter in font-lock.el with (when (featurep 'tree-sitter))? Or are there better ways to deal with this?
Another approach is to define everything tree-sitter related in tree-sitter.el, and make tree-sitter.el require font-lock.el instead of the other way around. Would that be better?
Yuan
* Re: Tree-sitter api
2021-09-27 4:42 ` Yuan Fu
@ 2021-09-27 5:37 ` Eli Zaretskii
2021-09-27 19:17 ` Stefan Monnier
1 sibling, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-27 5:37 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake,
john
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 26 Sep 2021 21:42:36 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> John Yates <john@yates-sheets.org>
>
> Currently, because font-lock.el uses functions and variables defined in tree-sitter.el, it needs to require tree-sitter.el.
But only conditioned on some variable, right?
> Should we require tree-sitter.el by default?
No, but you could require it on the same condition that makes
font-lock use tree-sitter functions.
> Then what do we do when tree-sitter is not available on the system? Should I wrap every reference to tree-sitter in font-lock.el with (when (featurep ’tree-sitter))? Or is there better ways to deal with this?
Again, you probably already have a condition that wraps it, no?
> Another approach is to define everything tree-sitter related in tree-sitter.el, and make tree-sitter.el require font-lock.el instead of the other way around. Would that be better?
If it works, yes. But I'm not sure I understand the details well
enough for my answer to be correct.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-27 4:42 ` Yuan Fu
2021-09-27 5:37 ` Eli Zaretskii
@ 2021-09-27 19:17 ` Stefan Monnier
2021-09-28 5:33 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2021-09-27 19:17 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stephen Leake,
John Yates
> Currently, because font-lock.el uses functions and variables defined in
> tree-sitter.el,
Why? I don't see any reason why you'd need to change font-lock.el to
add support for tree-sitter fontification.
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-27 19:17 ` Stefan Monnier
@ 2021-09-28 5:33 ` Yuan Fu
2021-09-28 7:02 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-28 5:33 UTC (permalink / raw)
To: Stefan Monnier
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Eli Zaretskii,
Stephen Leake, John Yates
> On Sep 27, 2021, at 12:17 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> Currently, because font-lock.el uses functions and variables defined in
>> tree-sitter.el,
>
> Why? I don't see any reason why you'd need to change font-lock.el to
> add support for tree-sitter fontification.
>
[Also in reply to Eli:]
Before tree-sitter, font-lock roughly consisted of two passes: the syntactic pass (which uses the syntax table) and the regex pass (which uses regex matching). I added a third pass, the tree-sitter pass, because I want to add tree-sitter fontification on top of the existing mechanisms, not replace them. This way we can still fontify keywords. Simply replacing font-lock with tree-sitter font-lock would cause anything that relies on the existing fontification facility to stop working when I turn on tree-sitter. For example, I use a package that fontifies keywords like “TODO” and “FIXME”; it would be a shame if it stopped working as soon as I turned on tree-sitter fontification. So it seemed natural to me to augment font-lock.el instead of putting stuff in tree-sitter.el.
Now that I realize the dependency problem, I think I can move all the font-lock integration code to tree-sitter.el and leave font-lock.el untouched, while still keeping the augmenting nature. E.g., define a tree-sitter-fontify-region-function that first calls font-lock-fontify-region-function, then does tree-sitter fontification. Users could then turn on tree-sitter font-lock with, say, tree-sitter-font-lock-mode.
With that said, there is still one thing I'm not sure about. What should tree-sitter.el do if libtree-sitter is not on the system and tree-sitter.c is not included in Emacs? Should we simply not include tree-sitter.el? Is there an existing build facility that can do that (exclude tree-sitter.el when libtree-sitter is not found on the system)?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-28 5:33 ` Yuan Fu
@ 2021-09-28 7:02 ` Eli Zaretskii
2021-09-28 16:10 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-28 7:02 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake,
john
> From: Yuan Fu <casouri@gmail.com>
> Date: Mon, 27 Sep 2021 22:33:17 -0700
> Cc: Eli Zaretskii <eliz@gnu.org>,
> Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> John Yates <john@yates-sheets.org>
>
> With that said, I still have one thing not too sure. What should tree-sitter.el do if libtree-sitter is not on the system, and tree-sitter.c is not included in Emacs? Should we simply not include tree-sitter.el? Is there existing build facility that can do that (exclude tree-sitter.el when libtree-sitter is not found on system)?
I don't think I understand the problem: why would you need "not to
include" tree-sitter.el? We have quite a few *.el files that need
support from built-ins which could not be available at run time, and
yet we don't hesitate to include those *.el files. How is this case
different? I guess some details of what bothers you are missing.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-28 7:02 ` Eli Zaretskii
@ 2021-09-28 16:10 ` Yuan Fu
2021-09-28 16:28 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-09-28 16:10 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
Stephen Leake, john
> On Sep 28, 2021, at 12:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Mon, 27 Sep 2021 22:33:17 -0700
>> Cc: Eli Zaretskii <eliz@gnu.org>,
>> Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> Stephen Leake <stephen_leake@stephe-leake.org>,
>> John Yates <john@yates-sheets.org>
>>
>> With that said, I still have one thing not too sure. What should tree-sitter.el do if libtree-sitter is not on the system, and tree-sitter.c is not included in Emacs? Should we simply not include tree-sitter.el? Is there existing build facility that can do that (exclude tree-sitter.el when libtree-sitter is not found on system)?
>
> I don't think I understand the problem: why would you need "not to
> include" tree-sitter.el? We have quite a few *.el files that need
> support from built-ins which could not be available at run time, and
> yet we don't hesitate to include those *.el files. How is this case
> different? I guess some details of what bothers you are missing.
Nothing in particular, except the naive assumption that we won’t provide functions that don’t work. I didn’t know that we already have quite a few *.el files that might not work. Do you have some examples? Anyway, I can provide a function tree-sitter-available-p, similar to native-compilation; that way users know whether they can use tree-sitter features.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-28 16:10 ` Yuan Fu
@ 2021-09-28 16:28 ` Eli Zaretskii
2021-12-13 6:54 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-09-28 16:28 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake,
john
> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 28 Sep 2021 09:10:32 -0700
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> john@yates-sheets.org
>
> > I don't think I understand the problem: why would you need "not to
> > include" tree-sitter.el? We have quite a few *.el files that need
> > support from built-ins which could not be available at run time, and
> > yet we don't hesitate to include those *.el files. How is this case
> > different? I guess some details of what bothers you are missing.
>
> Nothing in particular except the naive assumption that we won’t provide functions that don’t work. I didn’t know that we have quite a few *.el files that could potentially not work before. Do you have some examples?
Examples include native-compilation (comp.el), xwidgets (xwidget.el),
and threads (thread.el).
> Anyway, I can provide a function tree-sitter-avaliable-p similar to native-compilation, that way a user knows if he can use tree-sitter features.
Yes, that's generally what optional packages do.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-09-28 16:28 ` Eli Zaretskii
@ 2021-12-13 6:54 ` Yuan Fu
2021-12-13 12:56 ` Eli Zaretskii
2021-12-18 13:39 ` Daniel Martín
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2021-12-13 6:54 UTC (permalink / raw)
To: Eli Zaretskii
Cc: ubolonton, theo, cpitclaudel, emacs-devel, Stefan Monnier,
stephen_leake, john
It’s been a while and no one has provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allows tree-sitter to change its malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter, but he didn’t reply to my request.)
As before, the code is at https://github.com/casouri/emacs on ts branch.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-13 6:54 ` Yuan Fu
@ 2021-12-13 12:56 ` Eli Zaretskii
2021-12-14 7:19 ` Yuan Fu
2021-12-18 14:45 ` Philipp
2021-12-18 13:39 ` Daniel Martín
1 sibling, 2 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-13 12:56 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake,
john
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 12 Dec 2021 22:54:59 -0800
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> ubolonton@gmail.com,
> theo@thornhill.no,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org,
> stephen_leake@stephe-leake.org,
> john@yates-sheets.org
>
> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allow tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
Would you please ping the authors and tell them that this single issue
prevents us from integrating TS into Emacs? Maybe that would change
their priorities. I cannot imagine that the feature we are asking is
hard to implement.
> As before, the code is at https://github.com/casouri/emacs on ts branch.
Thanks. Perhaps people could try testing the branch and providing
feedback?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-13 12:56 ` Eli Zaretskii
@ 2021-12-14 7:19 ` Yuan Fu
2021-12-17 0:14 ` Yuan Fu
2021-12-18 14:45 ` Philipp
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-14 7:19 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
Stephen Leake, john
>>
>> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allow tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
>
> Would you please ping the authors and tell them that this single issue
> prevents us from integrating TS into Emacs? Maybe that would change
> their priorities. I cannot imagine that the feature we are asking is
> hard to implement.
Done.
>> As before, the code is at https://github.com/casouri/emacs on ts branch.
>
> Thanks. Perhaps people could try testing the branch and providing
> feedback?
Yes. Now that the manual is complete, people are welcome to try it out and see what they like and don’t like. It would be even better if someone wants to implement some major modes with the new tree-sitter features.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-14 7:19 ` Yuan Fu
@ 2021-12-17 0:14 ` Yuan Fu
2021-12-17 7:15 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-17 0:14 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
Stephen Leake, john
> On Dec 13, 2021, at 11:19 PM, Yuan Fu <casouri@gmail.com> wrote:
>
>>>
>>> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allow tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
>>
>> Would you please ping the authors and tell them that this single issue
>> prevents us from integrating TS into Emacs? Maybe that would change
>> their priorities. I cannot imagine that the feature we are asking is
>> hard to implement.
>
> Done.
Someone commented on my request saying
> Had this issue as well, but thought was too niche to open an issue. The standard way to change the allocator at runtime is with the LD_PRELOAD envvar (see mimalloc or any allocator doc).
IIUC it is more of a user feature, right? That is, you would run LD_PRELOAD=xxx program, rather than change the environment programmatically from within the program? Could Emacs do this if tree-sitter doesn’t want to change?
BTW the conversation is at https://github.com/tree-sitter/tree-sitter/issues/1535
The author suggested implementing the runtime change of malloc on top of the current macros, but I think he missed the point (we don’t want to maintain our own version of tree-sitter).
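To make the request concrete, here is a minimal sketch of what a runtime allocator hook could look like inside a library. All names (lib_set_allocator, lib_alloc, the host_* functions) are hypothetical and are not part of tree-sitter's API; the sketch only shows the general technique of routing allocations through replaceable function pointers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; illustrative only, not tree-sitter's API.  */
typedef void *(*lib_malloc_fn) (size_t);
typedef void (*lib_free_fn) (void *);

/* The library starts out using the libc allocator.  */
static lib_malloc_fn lib_malloc_hook = malloc;
static lib_free_fn lib_free_hook = free;

/* Install replacement allocation functions at run time.  Passing NULL
   for a slot restores the libc default for that slot.  */
void
lib_set_allocator (lib_malloc_fn new_malloc, lib_free_fn new_free)
{
  lib_malloc_hook = new_malloc ? new_malloc : malloc;
  lib_free_hook = new_free ? new_free : free;
}

/* Every internal allocation in the library goes through these.  */
void *
lib_alloc (size_t size)
{
  return lib_malloc_hook (size);
}

void
lib_release (void *ptr)
{
  lib_free_hook (ptr);
}

/* Example host-side allocator that counts live blocks, standing in
   for whatever the embedding program wants to do on allocation.  */
static long host_live_blocks = 0;

void *
host_counting_malloc (size_t size)
{
  void *p = malloc (size);
  if (p)
    host_live_blocks++;
  return p;
}

void
host_counting_free (void *ptr)
{
  if (ptr)
    host_live_blocks--;
  free (ptr);
}

long
host_live_block_count (void)
{
  return host_live_blocks;
}
```

With such a hook, the embedding program calls lib_set_allocator once at startup and needs neither a patched copy of the library nor LD_PRELOAD.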
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-17 0:14 ` Yuan Fu
@ 2021-12-17 7:15 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-17 7:15 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, monnier, stephen_leake,
john
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 16 Dec 2021 16:14:52 -0800
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
> Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> john@yates-sheets.org
>
> Someone commented on my request saying
>
> > Had this issue as well, but thought was too niche to open an issue. The standard way to change the allocator at runtime is with the LD_PRELOAD envvar (see mimalloc or any allocator doc).
>
> IIUC it is more of a user-feature right? Like you will use LD_PRELOAD=xxx program but not change the environment programmatically in the program? Could Emacs do this should tree-sitter doesn’t want to change?
I don't think we want to use LD_PRELOAD for this, for several good
reasons. It's non-portable, for starters.
> The author suggested to implement runtime change of malloc on top of current macros, but I think he missed the point (we don’t want to maintain our own version of tree-sitter).
Yes.
I hope we get a better response from the developers of Tree-sitter.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-13 6:54 ` Yuan Fu
2021-12-13 12:56 ` Eli Zaretskii
@ 2021-12-18 13:39 ` Daniel Martín
2021-12-19 2:48 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Daniel Martín @ 2021-12-18 13:39 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, ubolonton, theo, cpitclaudel, emacs-devel,
Stefan Monnier, stephen_leake, john
Yuan Fu <casouri@gmail.com> writes:
> It’s been a while and no one provided further comments on the indent
> and font-lock integration of tree-sitter, so I finished the manuals
> for indent and font-lock integration. They are under 24.6 Font Lock
> Mode and 24.7 Automatic Indentation of code. Once the author of
> tree-sitter allow tree-sitter to change malloc implementation at
> runtime, tree-sitter integration will be ready. (Though I suspect that
> won’t come soon. The author is still actively developing tree-sitter
> but he didn’t reply to my request.)
>
> As before, the code is at https://github.com/casouri/emacs on ts branch.
>
> Yuan
Thank you for your work. I had some trouble getting the latest code to
compile, so I've sent you a pull request with potential fixes:
https://github.com/casouri/emacs/pull/4 I have signed the FSF papers.
I have a general question about the major modes. I see there's a couple
of sample major modes, one for C and another for JSON, but they are in
tree-sitter.el. How would those new major modes be included with Emacs?
Will there be a c-ts mode, separate from cc-mode, which will implement
font lock and indentation in terms of tree-sitter (when Emacs is
compiled with tree-sitter support)? Or is the plan to extend the core
language modes to offer an option to support tree-sitter? (I'm not sure
how complicated and clean that would be.)
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-13 12:56 ` Eli Zaretskii
2021-12-14 7:19 ` Yuan Fu
@ 2021-12-18 14:45 ` Philipp
2021-12-18 14:57 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Philipp @ 2021-12-18 14:45 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, theo, ubolonton, emacs-devel, cpitclaudel, monnier,
stephen_leake, john
> Am 13.12.2021 um 13:56 schrieb Eli Zaretskii <eliz@gnu.org>:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 12 Dec 2021 22:54:59 -0800
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> ubolonton@gmail.com,
>> theo@thornhill.no,
>> cpitclaudel@gmail.com,
>> emacs-devel@gnu.org,
>> stephen_leake@stephe-leake.org,
>> john@yates-sheets.org
>>
>> It’s been a while and no one provided further comments on the indent and font-lock integration of tree-sitter, so I finished the manuals for indent and font-lock integration. They are under 24.6 Font Lock Mode and 24.7 Automatic Indentation of code. Once the author of tree-sitter allow tree-sitter to change malloc implementation at runtime, tree-sitter integration will be ready. (Though I suspect that won’t come soon. The author is still actively developing tree-sitter but he didn’t reply to my request.)
>
> Would you please ping the authors and tell them that this single issue
> prevents us from integrating TS into Emacs? Maybe that would change
> their priorities. I cannot imagine that the feature we are asking is
> hard to implement.
That feature in itself won't be enough. Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior. What's needed is a rewrite of the TreeSitter code so that it handles allocation failure properly and gracefully by returning an error to the caller.
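As an illustration of the pattern described above, the following sketch shows a routine that reports allocation failure to its caller through a status code instead of aborting or jumping out non-locally. The names (parse_status, copy_input, failing_alloc) are hypothetical and do not come from tree-sitter or Emacs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { PARSE_OK = 0, PARSE_ERR_NOMEM = 1 } parse_status;

/* Hypothetical allocator hook that is allowed to return NULL.  */
typedef void *(*alloc_fn) (size_t);

/* Copy LEN bytes of SRC into a freshly allocated, NUL-terminated
   buffer stored in *OUT.  On allocation failure, leave *OUT as NULL
   and return PARSE_ERR_NOMEM so the caller can unwind normally.  */
parse_status
copy_input (alloc_fn alloc, const char *src, size_t len, char **out)
{
  *out = NULL;
  char *buf = alloc (len + 1);
  if (buf == NULL)
    return PARSE_ERR_NOMEM;
  memcpy (buf, src, len);
  buf[len] = '\0';
  *out = buf;
  return PARSE_OK;
}

/* Allocator that always fails, used to exercise the error path.  */
void *
failing_alloc (size_t size)
{
  (void) size;
  return NULL;
}
```

Because every failure surfaces as a return value, the host's allocator never has to longjmp through library frames.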
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-18 14:45 ` Philipp
@ 2021-12-18 14:57 ` Eli Zaretskii
2021-12-19 2:51 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-18 14:57 UTC (permalink / raw)
To: Philipp
Cc: casouri, theo, ubolonton, emacs-devel, cpitclaudel, monnier,
stephen_leake, john
> From: Philipp <p.stephani2@gmail.com>
> Date: Sat, 18 Dec 2021 15:45:18 +0100
> Cc: Yuan Fu <casouri@gmail.com>,
> ubolonton@gmail.com,
> theo@thornhill.no,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org,
> monnier@iro.umontreal.ca,
> stephen_leake@stephe-leake.org,
> john@yates-sheets.org
>
> > Would you please ping the authors and tell them that this single issue
> > prevents us from integrating TS into Emacs? Maybe that would change
> > their priorities. I cannot imagine that the feature we are asking is
> > hard to implement.
>
> That feature in itself won't be enough. Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
It may not be enough to satisfy purists, but it's enough to allow the
user to save the session and shut down Emacs in an orderly fashion,
instead of abruptly exiting and losing all the edits.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-18 13:39 ` Daniel Martín
@ 2021-12-19 2:48 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2021-12-19 2:48 UTC (permalink / raw)
To: Daniel Martín
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Stefan Monnier,
Eli Zaretskii, Stephen Leake, john
> On Dec 18, 2021, at 5:39 AM, Daniel Martín <mardani29@yahoo.es> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> It’s been a while and no one provided further comments on the indent
>> and font-lock integration of tree-sitter, so I finished the manuals
>> for indent and font-lock integration. They are under 24.6 Font Lock
>> Mode and 24.7 Automatic Indentation of code. Once the author of
>> tree-sitter allow tree-sitter to change malloc implementation at
>> runtime, tree-sitter integration will be ready. (Though I suspect that
>> won’t come soon. The author is still actively developing tree-sitter
>> but he didn’t reply to my request.)
>>
>> As before, the code is at https://github.com/casouri/emacs on ts branch.
>>
>> Yuan
>
> Thank you for your work. I had some troubles getting the latest code to
> compile, so I've sent you a pull request with potential fixes:
> https://github.com/casouri/emacs/pull/4 I have signed the FSF papers.
Yes, sorry, I forgot to push fixes after merging. I included your fix for the switch case. Thanks.
>
> I have a general question about the major modes. I see there's a couple
> of sample major modes, one for C and another for JSON, but they are in
> tree-sitter.el. How would those new major modes be included with Emacs?
> Will there be a c-ts mode, separate from cc-mode, which will implement
> font lock and indentation in terms of tree-sitter (when Emacs is
> compiled with tree-sitter support)? Or the plan is to extend the core
> language modes to offer an option to support tree-sitter? (I'm not sure
> how complicated and clean that would be.)
They are just my experiments, and I included them in tree-sitter.el as examples for anyone who wants to try out tree-sitter. They will be removed when the tree-sitter integration is merged into master. Each major mode should optionally take advantage of tree-sitter features according to tree-sitter-enable-p. (At least that’s my plan; no one has objected to this approach so far.)
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-18 14:57 ` Eli Zaretskii
@ 2021-12-19 2:51 ` Yuan Fu
2021-12-19 7:11 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-19 2:51 UTC (permalink / raw)
To: Eli Zaretskii
Cc: ubolonton, theo, cpitclaudel, emacs-devel, Philipp, monnier,
stephen_leake, john
>>
>>> Would you please ping the authors and tell them that this single issue
>>> prevents us from integrating TS into Emacs? Maybe that would change
>>> their priorities. I cannot imagine that the feature we are asking is
>>> hard to implement.
>>
>> That feature in itself won't be enough. Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
>
> It may not be enough to satisfy purists, but it's enough to allow the
> user to save the session and shut down Emacs in an orderly fashion,
> instead of abruptly exiting and losing all the edits.
Users can set tree-sitter-maximum-size to limit the memory usage of tree-sitter. Buffers larger than that size cannot enable tree-sitter. That doesn’t solve the problem directly, but it should let users avoid allocation failures most of the time.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-19 2:51 ` Yuan Fu
@ 2021-12-19 7:11 ` Eli Zaretskii
2021-12-19 7:52 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-19 7:11 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, p.stephani2, monnier,
stephen_leake, john
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 18 Dec 2021 18:51:25 -0800
> Cc: Philipp <p.stephani2@gmail.com>,
> ubolonton@gmail.com,
> theo@thornhill.no,
> cpitclaudel@gmail.com,
> emacs-devel@gnu.org,
> monnier@iro.umontreal.ca,
> stephen_leake@stephe-leake.org,
> john@yates-sheets.org
>
> >> That feature in itself won't be enough. Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
> >
> > It may not be enough to satisfy purists, but it's enough to allow the
> > user to save the session and shut down Emacs in an orderly fashion,
> > instead of abruptly exiting and losing all the edits.
>
> Uses can set tree-sitter-maximum-size to limit memory usage of tree-sitter. Buffers with size larger than that cannot enable tree-sitter. That doesn’t solve the problem directly but should let users avoid allocation failing most of the time.
Btw, we should have a good idea how frequent this out-of-memory
problem could be with tree-sitter. Did someone try to scroll through
all of xdisp.c, using tree-sitter for C Mode fontifications, and
measure the memory footprint that produces? If not, I think it would
be a good idea to try.
If the OOM problem happens frequently with large source files, it may
indeed be the case that we will need to disable tree-sitter up front
based on some size criteria.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-19 7:11 ` Eli Zaretskii
@ 2021-12-19 7:52 ` Yuan Fu
2021-12-24 10:04 ` Yoav Marco
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-19 7:52 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Philipp,
Stefan Monnier, Stephen Leake, john
> On Dec 18, 2021, at 11:11 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 18 Dec 2021 18:51:25 -0800
>> Cc: Philipp <p.stephani2@gmail.com>,
>> ubolonton@gmail.com,
>> theo@thornhill.no,
>> cpitclaudel@gmail.com,
>> emacs-devel@gnu.org,
>> monnier@iro.umontreal.ca,
>> stephen_leake@stephe-leake.org,
>> john@yates-sheets.org
>>
>>>> That feature in itself won't be enough. Even with it, TreeSitter will have the same problem as GMP: allocation isn't allowed to fail, and longjmp'ing out of it isn't allowed and generally causes undefined behavior.
>>>
>>> It may not be enough to satisfy purists, but it's enough to allow the
>>> user to save the session and shut down Emacs in an orderly fashion,
>>> instead of abruptly exiting and losing all the edits.
>>
>> Uses can set tree-sitter-maximum-size to limit memory usage of tree-sitter. Buffers with size larger than that cannot enable tree-sitter. That doesn’t solve the problem directly but should let users avoid allocation failing most of the time.
>
> Btw, we should have a good idea how frequent this out-of-memory
> problem could be with tree-sitter. Did someone try to scroll through
> all of xdisp.c, using tree-sitter for C Mode fontifications, and
> measured the memory footprint that produces? If not, I think it would
> be a good idea to try.
>
> If the OOM problem happens frequently with large source files, it may
> indeed be the case that we will need to disable tree-sitter up front
> based on some size criteria.
From the author’s quote and my experiments, tree-sitter uses memory amounting to about 10–20x the buffer size. So xdisp.c is fine. Also, you don’t need to scroll through the buffer: tree-sitter parses the whole buffer up front.
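The arithmetic behind that estimate and the size cutoff can be sketched as below. The function names are made up for illustration; only the 10–20x factor and the "larger than the maximum cannot enable tree-sitter" rule come from this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rough estimate of parser memory for a buffer of BUFFER_BYTES, where
   FACTOR is the rule-of-thumb multiplier (10 to 20 per the thread).
   Returns SIZE_MAX if the product would overflow.  */
size_t
estimate_parser_bytes (size_t buffer_bytes, size_t factor)
{
  if (factor != 0 && buffer_bytes > SIZE_MAX / factor)
    return SIZE_MAX;
  return buffer_bytes * factor;
}

/* Return true if a buffer of BUFFER_BYTES may enable the parser, i.e.
   it is not larger than MAXIMUM_SIZE.  */
bool
parser_allowed_p (size_t buffer_bytes, size_t maximum_size)
{
  return buffer_bytes <= maximum_size;
}
```

For example, a 1 MB source file at the pessimistic factor of 20 gives an estimate of about 20 MB.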
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-19 7:52 ` Yuan Fu
@ 2021-12-24 10:04 ` Yoav Marco
2021-12-24 10:21 ` Yoav Marco
0 siblings, 1 reply; 370+ messages in thread
From: Yoav Marco @ 2021-12-24 10:04 UTC (permalink / raw)
To: casouri
Cc: cpitclaudel, theo, ubolonton, emacs-devel, p.stephani2, monnier,
eliz, stephen_leake, john
Hi, Yuan and I had a discussion on github
https://github.com/casouri/emacs/issues/5
and he suggested we move here. I'm quoting our comments for convenience.
Yoav Marco <yoavm448@gmail.com> writes:
> Hi! My question is about the lines:
> https://github.com/casouri/emacs/blob/a4f90c5f95476914fb8789c67652af1025644af8/src/tree-sitter.c#L1375-L1380
>
> /* TODO: We could cache the query object, so that repeatedly
> querying with the same query can reuse the query object. It also
> saves us from expanding the sexp query into a string. I don't
> know how much time that could save though. */
> TSQuery *ts_query = ts_query_new (lang, source, strlen (source),
> &error_offset, &error_type);
>
>
> Regarding error handling mostly.
>
> In this branch queries are saved as *strings* and compiled in the internals on
> each use. In elisp-tree-sitter, you call `tsc-make-query` and use the object
> it returns for calls to tsc-query-captures which is the analog for
> tree-sitter-query-capture.
>
> What happens if your query is deformed, or simply has a typo in a node name?
> We call `tree-sitter-query-capture` on each keystroke in
> `tree-sitter-font-lock-fontify-region`. With the compilation occurring
> ahead-of-time it would fail once, but here wouldn't it barrage you with
> errors?
>
> Especially with patterns that aren't set in stone and can be modified like
> font-lock keywords, I think compiling the query when the pattern is added is
> better than on each execution.
>
> One nice thing though about compiling queries only when queried is that you
> can call `ts_query_delete` straight away. With users compiling queries it
> would need to be up to garbage collection, I think.
Yuan Fu <notifications@github.com> writes:
>> What happens if your query is deformed, or simply has a typo in a node name?
>> We call tree-sitter-query-capture on each keystroke in
>> tree-sitter-font-lock-fontify-region. With the compilation occurring
>> ahead-of-time it would fail once, but here wouldn't it barrage you with
>> errors?
>
> Not quite barraging, jit-lock will just silently fail and leave a bunch of
> logs in Messages. I don't think error out when calling
> tree-sitter-query-capture is a grave problem, since 1) it doesn't barrage as
> you worried and 2) I don't expect queries in major modes to ship wrong code:
> it's not like a bug that could go undiscovered, if the query has a typo, the
> major mode writer will certainly find out when he/she tries to fontify a
> buffer.
>
> I can see some advantages to compile the query ahead of time. 1) It would be
> helpful to know there is an error before calling
> tree-sitter-font-lock-fontify-region and see an unfontified buffer, not
> knowing what went wrong. I can add a function, say, tree-sitter-compile-query
> that checks a query (as in query pattern) and passes it on if it's correct. 2)
> It could potentially save recompilation of the query. But computing the query
> most probably takes negligible time.
>
> On the other hand, compiling the query has downsides: I don't know what
> tsc-make-query returns; I assume an internal object? I try to minimize the
> number of new object types I introduce to Emacs, for hygiene. So far I've
> managed to add only parser object and node object. If there aren't good
> reasons I'm inclined to not add a query object. So far the advantages that I
> see aren't very convincing.
>
> If you want to continue the discussion, I suggest we continue at emacs-devel,
> that way others who are more knowledgeable than I can join and offer their
> opinion.
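For readers following the thread, the "compile once and reuse" alternative debated above can be sketched in C. This is an illustration only, not code from the branch: compiled_query, compile_query and get_query are hypothetical names, and compile_query merely checks for balanced parentheses as a stand-in for the real tree-sitter compile step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque compiled query (TSQuery in the
   real library).  It only remembers the pattern it was built from.  */
typedef struct
{
  char *source;
} compiled_query;

/* Number of successful compilations, kept only for demonstration.  */
int compile_count = 0;

/* Hypothetical stand-in for the real compile step.  Returns NULL for a
   malformed pattern; "malformed" here just means unbalanced parens.  */
compiled_query *
compile_query (const char *source)
{
  assert (source != NULL);
  int depth = 0;
  for (const char *p = source; *p != '\0'; p++)
    {
      if (*p == '(')
        depth++;
      else if (*p == ')' && --depth < 0)
        return NULL;
    }
  if (depth != 0)
    return NULL;

  compiled_query *query = malloc (sizeof *query);
  if (query == NULL)
    return NULL;
  query->source = malloc (strlen (source) + 1);
  if (query->source == NULL)
    {
      free (query);
      return NULL;
    }
  strcpy (query->source, source);
  compile_count++;
  return query;
}

/* One-slot cache: the most recently compiled query.  */
compiled_query *cached_query = NULL;

/* Return a compiled query for SOURCE, reusing the cached one when the
   pattern is unchanged.  Returns NULL (and keeps the old cache entry)
   if SOURCE does not compile.  */
compiled_query *
get_query (const char *source)
{
  if (cached_query != NULL && strcmp (cached_query->source, source) == 0)
    return cached_query;

  compiled_query *fresh = compile_query (source);
  if (fresh == NULL)
    return NULL;

  if (cached_query != NULL)
    {
      free (cached_query->source);
      free (cached_query);
    }
  cached_query = fresh;
  return cached_query;
}
```

Calling get_query repeatedly with the same pattern compiles it a single time in this sketch; a cache like this also has to decide when to free the stored object, which is the lifetime question raised above.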
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-24 10:04 ` Yoav Marco
@ 2021-12-24 10:21 ` Yoav Marco
2021-12-25 8:31 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yoav Marco @ 2021-12-24 10:21 UTC (permalink / raw)
To: casouri
Cc: cpitclaudel, theo, ubolonton, emacs-devel, p.stephani2, monnier,
eliz, stephen_leake, john
> Yuan Fu <notifications@github.com> writes:
>> I can see some advantages to compile the query ahead of time. 1) It would be
>> helpful to know there is an error before calling
>> tree-sitter-font-lock-fontify-region and see an unfontified buffer, not
>> knowing what went wrong. I can add a function, say, tree-sitter-compile-query
>> that checks a query (as in query pattern) and passes it on if it's correct. 2)
>> It could potentially save recompilation of the query. But computing the query
>> most probably takes negligible time.
I'll try to benchmark it. Would be great if it really is nothing.
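A rough way to time a C-level operation in isolation, as a sketch only: work_under_test below is a hypothetical placeholder (it just hashes the pattern), and a real measurement would put the query compilation call in its place. The timing uses the standard clock function, so it reports CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical placeholder for the operation being measured.  In a real
   measurement this would be the query compilation call; here it just
   hashes the pattern so that the sketch is self-contained.  */
unsigned long
work_under_test (const char *source)
{
  unsigned long hash = 5381;
  for (const char *p = source; *p != '\0'; p++)
    hash = hash * 33 + (unsigned char) *p;
  return hash;
}

/* Return the average CPU time, in seconds, of one call to
   work_under_test over ITERATIONS calls.  */
double
seconds_per_call (const char *source, long iterations)
{
  assert (iterations > 0);
  /* Accumulate into a volatile so the loop is not optimized away.  */
  volatile unsigned long sink = 0;
  clock_t start = clock ();
  for (long i = 0; i < iterations; i++)
    sink += work_under_test (source);
  clock_t end = clock ();
  return (double) (end - start) / CLOCKS_PER_SEC / (double) iterations;
}
```

Averaging over many iterations matters here because a single call is likely to be shorter than the resolution of clock.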
>> On the other hand, compiling the query has downsides: I don't know what
>> tsc-make-query returns; I assume an internal object? I try to minimize the
>> number of new object types I introduce to Emacs, for hygiene. So far I've
>> managed to add only parser object and node object. If there aren't good
>> reasons I'm inclined to not add a query object. So far the advantages that I
>> see aren't very convincing.
Yeah, it returns a user-pointer.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-24 10:21 ` Yoav Marco
@ 2021-12-25 8:31 ` Yuan Fu
2021-12-25 10:13 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-25 8:31 UTC (permalink / raw)
To: Yoav Marco
Cc: Clément Pit-Claudel, Theodor Thornhill, ubolonton,
Emacs developers, Philipp, Stefan Monnier, Eli Zaretskii,
Stephen Leake, John Yates
> On Dec 24, 2021, at 2:21 AM, Yoav Marco <yoavm448@gmail.com> wrote:
>
>
>
>> Yuan Fu <notifications@github.com> writes:
>>> I can see some advantages to compile the query ahead of time. 1) It would be
>>> helpful to know there is an error before calling
>>> tree-sitter-font-lock-fontify-region and see an unfontified buffer, not
>>> knowing what went wrong. I can add a function, say, tree-sitter-compile-query
>>> that checks a query (as in query pattern) and passes it on if it's correct. 2)
>>> It could potentially save recompilation of the query. But computing the query
>>> most probably takes negligible time.
>
> I'll try to benchmark it. Would be great if it really is nothing.
Sounds good, thanks. Eli, are there any profiling primitives in Emacs? How do you usually profile Emacs?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-25 8:31 ` Yuan Fu
@ 2021-12-25 10:13 ` Eli Zaretskii
2021-12-26 9:50 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-25 10:13 UTC (permalink / raw)
To: Yuan Fu
Cc: cpitclaudel, theo, ubolonton, emacs-devel, p.stephani2, monnier,
yoavm448, stephen_leake, john
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 25 Dec 2021 00:31:26 -0800
> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Eli Zaretskii <eliz@gnu.org>,
> Emacs developers <emacs-devel@gnu.org>,
> John Yates <john@yates-sheets.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> Philipp <p.stephani2@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Theodor Thornhill <theo@thornhill.no>,
> ubolonton@gmail.com
>
> Sounds good, thanks. Eli, are there any profiling primitives in Emacs? How do you usually profile Emacs?
If it's mainly Lisp code, then "M-x profiler-start RET RET" followed
by "M-x profiler-report RET".
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-25 10:13 ` Eli Zaretskii
@ 2021-12-26 9:50 ` Yuan Fu
2021-12-26 10:23 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-26 9:50 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Theodor Thornhill, ubolonton,
Emacs developers, Philipp, Stefan Monnier, Yoav Marco,
Stephen Leake, John Yates
> On Dec 25, 2021, at 2:13 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sat, 25 Dec 2021 00:31:26 -0800
>> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Eli Zaretskii <eliz@gnu.org>,
>> Emacs developers <emacs-devel@gnu.org>,
>> John Yates <john@yates-sheets.org>,
>> Stefan Monnier <monnier@iro.umontreal.ca>,
>> Philipp <p.stephani2@gmail.com>,
>> Stephen Leake <stephen_leake@stephe-leake.org>,
>> Theodor Thornhill <theo@thornhill.no>,
>> ubolonton@gmail.com
>>
>> Sounds good, thanks. Eli, are there any profiling primitives in Emacs? How do you usually profile Emacs?
>
> If it's mainly Lisp code, then "M-x profiler-start RET RET" followed
> by "M-x profiler-report RET".
Thanks. I mean C functions.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-26 9:50 ` Yuan Fu
@ 2021-12-26 10:23 ` Eli Zaretskii
2021-12-30 0:59 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-26 10:23 UTC (permalink / raw)
To: Yuan Fu
Cc: cpitclaudel, theo, ubolonton, emacs-devel, p.stephani2, monnier,
yoavm448, stephen_leake, john
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 26 Dec 2021 01:50:58 -0800
> Cc: Yoav Marco <yoavm448@gmail.com>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> John Yates <john@yates-sheets.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> Philipp <p.stephani2@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Theodor Thornhill <theo@thornhill.no>,
> ubolonton@gmail.com
>
>
>
> > On Dec 25, 2021, at 2:13 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> >> From: Yuan Fu <casouri@gmail.com>
> >> Date: Sat, 25 Dec 2021 00:31:26 -0800
> >> Cc: Clément Pit-Claudel <cpitclaudel@gmail.com>,
> >> Eli Zaretskii <eliz@gnu.org>,
> >> Emacs developers <emacs-devel@gnu.org>,
> >> John Yates <john@yates-sheets.org>,
> >> Stefan Monnier <monnier@iro.umontreal.ca>,
> >> Philipp <p.stephani2@gmail.com>,
> >> Stephen Leake <stephen_leake@stephe-leake.org>,
> >> Theodor Thornhill <theo@thornhill.no>,
> >> ubolonton@gmail.com
> >>
> >> Sounds good, thanks. Eli, are there any profiling primitives in Emacs? How do you usually profile Emacs?
> >
> > If it's mainly Lisp code, then "M-x profiler-start RET RET" followed
> > by "M-x profiler-report RET".
>
> Thanks. I mean C functions.
Emacs can be built with profiling support, see --enable-profiling.
Then you can run gprof on the output of an Emacs session.
Another option is to use perf on GNU/Linux.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-26 10:23 ` Eli Zaretskii
@ 2021-12-30 0:59 ` Yuan Fu
2021-12-30 6:35 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2021-12-30 0:59 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Theodor Thornhill, ubolonton,
Emacs developers, Philipp, Stefan Monnier, Yoav Marco,
Stephen Leake, John Yates
>
> Emacs can be built with profiling support, see --enable-profiling.
> Then you can run gprof on the output of an Emacs session.
>
> Another option is to use perf on GNU/Linux.
Thanks.
BTW, I can’t seem to compile Emacs after merging master. (But master alone compiles fine.) Neither make bootstrap nor git clean -xf worked. I’m getting:
./temacs --batch -l loadup --temacs=pbootstrap \
--bin-dest /Users/yuan/emacs/nextstep/Emacs.app/Contents/MacOS/ --eln-dest /Users/yuan/emacs/nextstep/Emacs.app/Contents/Frameworks/
Loading loadup.el (source)...
Symbol's function definition is void: internal-timer-start-idle
make[2]: *** [bootstrap-emacs.pdmp] Error 255
make[1]: *** [src] Error 2
make: *** [bootstrap] Error 2
How should I go about debugging this?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-30 0:59 ` Yuan Fu
@ 2021-12-30 6:35 ` Eli Zaretskii
2022-01-04 18:31 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2021-12-30 6:35 UTC (permalink / raw)
To: Yuan Fu
Cc: cpitclaudel, theo, ubolonton, emacs-devel, p.stephani2, monnier,
yoavm448, stephen_leake, john
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 29 Dec 2021 16:59:44 -0800
> Cc: Yoav Marco <yoavm448@gmail.com>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> John Yates <john@yates-sheets.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> Philipp <p.stephani2@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Theodor Thornhill <theo@thornhill.no>,
> ubolonton@gmail.com
>
> BTW, I can’t seem to compile Emacs after merging master. (But master alone compiles fine.) Neither make bootstrap nor git clean -xf worked. I’m getting:
>
> ./temacs --batch -l loadup --temacs=pbootstrap \
> --bin-dest /Users/yuan/emacs/nextstep/Emacs.app/Contents/MacOS/ --eln-dest /Users/yuan/emacs/nextstep/Emacs.app/Contents/Frameworks/
> Loading loadup.el (source)...
> Symbol's function definition is void: internal-timer-start-idle
> make[2]: *** [bootstrap-emacs.pdmp] Error 255
> make[1]: *** [src] Error 2
> make: *** [bootstrap] Error 2
>
> How should I go about debugging this?
Run the offending command under a debugger, and try to find out which
code in loadup.el causes this. On macOS, this is a bit tough, since
GDB doesn't work, so you cannot easily examine Lisp data using the
commands in src/.gdbinit.
You could also try bisecting to find the offending commit.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2021-12-30 6:35 ` Eli Zaretskii
@ 2022-01-04 18:31 ` Yuan Fu
2022-03-13 6:22 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-01-04 18:31 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Theodor Thornhill, ubolonton,
Emacs developers, Philipp, Stefan Monnier, Yoav Marco,
Stephen Leake, John Yates
[-- Attachment #1: Type: text/plain, Size: 496 bytes --]
>>
>> How should I go about debugging this?
>
> Run the offending command under a debugger, and try to find out which
> code in loadup.el causes this. On macOS, this is a bit tough, since
> GDB doesn't work, so you cannot easily examine Lisp data using the
> commands in src/.gdbinit.
>
> You could also try bisecting to find the offending commit.
Thanks. I figured it out.
Now that the tree-sitter integration is mostly done, would anyone like to look at the patch?
Yuan
[-- Attachment #2: tree-sitter.patch --]
[-- Type: application/octet-stream, Size: 193153 bytes --]
diff --git a/configure.ac b/configure.ac
index dabc2b425f..46a93435ab 100644
--- a/configure.ac
+++ b/configure.ac
@@ -457,6 +457,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([imagemagick],[compile with ImageMagick image support])
OPTION_DEFAULT_ON([native-image-api], [don't use native image APIs (GDI+ on Windows)])
OPTION_DEFAULT_IFAVAILABLE([json], [compile with native JSON support])
+OPTION_DEFAULT_IFAVAILABLE([tree-sitter], [compile with tree-sitter])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([harfbuzz],[don't use HarfBuzz for text shaping])
@@ -3117,6 +3118,23 @@ AC_DEFUN
AC_SUBST(JSON_CFLAGS)
AC_SUBST(JSON_OBJ)
+HAVE_TREE_SITTER=no
+TREE_SITTER_OBJ=
+
+if test "${with_tree_sitter}" != "no"; then
+ EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
+ [HAVE_TREE_SITTER=yes], [HAVE_TREE_SITTER=no])
+ if test "${HAVE_TREE_SITTER}" = yes; then
+ AC_DEFINE(HAVE_TREE_SITTER, 1, [Define if using tree-sitter.])
+ TREE_SITTER_LIBS=-ltree-sitter
+ TREE_SITTER_OBJ="tree-sitter.o"
+ fi
+fi
+
+AC_SUBST(TREE_SITTER_LIBS)
+AC_SUBST(TREE_SITTER_CFLAGS)
+AC_SUBST(TREE_SITTER_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -3935,20 +3953,31 @@ AC_DEFUN
fi
AC_SUBST(LIBZ)
+### Dynamic library support
+case $opsys in
+ cygwin|mingw32) DYNAMIC_LIB_SUFFIX=".dll" ;;
+ darwin) DYNAMIC_LIB_SUFFIX=".dylib" ;;
+ *) DYNAMIC_LIB_SUFFIX=".so" ;;
+esac
+case "${opsys}" in
+ darwin) DYNAMIC_LIB_SECONDARY_SUFFIX='.so' ;;
+ *) DYNAMIC_LIB_SECONDARY_SUFFIX='' ;;
+esac
+AC_DEFINE_UNQUOTED(DYNAMIC_LIB_SUFFIX, "$DYNAMIC_LIB_SUFFIX",
+ [System extension for dynamic libraries])
+AC_DEFINE_UNQUOTED(DYNAMIC_LIB_SECONDARY_SUFFIX, "$DYNAMIC_LIB_SECONDARY_SUFFIX",
+ [Alternative system extension for dynamic libraries.])
+
+AC_SUBST(DYNAMIC_LIB_SUFFIX)
+AC_SUBST(DYNAMIC_LIB_SECONDARY_SUFFIX)
+
### Dynamic modules support
LIBMODULES=
HAVE_MODULES=no
MODULES_OBJ=
NEED_DYNLIB=no
-case $opsys in
- cygwin|mingw32) MODULES_SUFFIX=".dll" ;;
- darwin) MODULES_SUFFIX=".dylib" ;;
- *) MODULES_SUFFIX=".so" ;;
-esac
-case "${opsys}" in
- darwin) MODULES_SECONDARY_SUFFIX='.so' ;;
- *) MODULES_SECONDARY_SUFFIX='' ;;
-esac
+MODULES_SUFFIX="${DYNAMIC_LIB_SUFFIX}"
+MODULES_SECONDARY_SUFFIX="${DYNAMIC_LIB_SECONDARY_SUFFIX}"
if test "${with_modules}" != "no"; then
case $opsys in
gnu|gnu-linux)
@@ -3979,10 +4008,10 @@ AC_DEFUN
NEED_DYNLIB=yes
AC_DEFINE(HAVE_MODULES, 1, [Define to 1 if dynamic modules are enabled])
AC_DEFINE_UNQUOTED(MODULES_SUFFIX, "$MODULES_SUFFIX",
- [System extension for dynamic libraries])
+ [System extension for dynamic modules])
if test -n "${MODULES_SECONDARY_SUFFIX}"; then
AC_DEFINE_UNQUOTED(MODULES_SECONDARY_SUFFIX, "$MODULES_SECONDARY_SUFFIX",
- [Alternative system extension for dynamic libraries.])
+ [Alternative system extension for dynamic modules.])
fi
fi
AC_SUBST(MODULES_OBJ)
@@ -4342,6 +4371,12 @@ AC_DEFUN
*) MISSING="$MISSING json"
WITH_IFAVAILABLE="$WITH_IFAVAILABLE --with-json=ifavailable";;
esac
+case $with_tree_sitter,$HAVE_TREE_SITTER in
+ no,* | ifavailable,* | *,yes) ;;
+ *) MISSING="$MISSING tree-sitter"
+ WITH_IFAVAILABLE="$WITH_IFAVAILABLE --with-tree-sitter=ifavailable";;
+esac
+
if test "X${MISSING}" != X; then
# If we have a missing library, and we don't have pkg-config installed,
# the missing pkg-config may be the reason. Give the user a hint.
@@ -6256,7 +6291,7 @@ AC_DEFUN
optsep=
emacs_config_features=
for opt in ACL BE_APP CAIRO DBUS FREETYPE GCONF GIF GLIB GMP GNUTLS GPM GSETTINGS \
- HARFBUZZ IMAGEMAGICK JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 \
+ HARFBUZZ IMAGEMAGICK JPEG JSON TREE_SITTER LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 \
M17N_FLT MODULES NATIVE_COMP NOTIFY NS OLDXMENU PDUMPER PGTK PNG RSVG SECCOMP \
SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS \
UNEXEC WEBP X11 XAW3D XDBE XFT XIM XINPUT2 XPM XWIDGETS X_TOOLKIT \
@@ -6327,6 +6362,7 @@ AC_DEFUN
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
Does Emacs use -ljansson? ${HAVE_JSON}
+ Does Emacs use -ltree-sitter? ${HAVE_TREE_SITTER}
Does Emacs use the GMP library? ${HAVE_GMP}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index 3254a4dba8..435df33efc 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
@@ -222,6 +222,7 @@ Top
* Non-ASCII Characters:: Non-ASCII text in buffers and strings.
* Searching and Matching:: Searching buffers for strings or regexps.
* Syntax Tables:: The syntax table controls word and list parsing.
+* Parsing Program Source:: Generate syntax tree for program sources.
* Abbrevs:: How Abbrev mode works, and its data structures.
* Threads:: Concurrency in Emacs Lisp.
@@ -1353,6 +1354,16 @@ Top
* Syntax Table Internals:: How syntax table information is stored.
* Categories:: Another way of classifying character syntax.
+Parsing Program Source
+
+* Language Definitions:: Loading tree-sitter language definitions.
+* Using Parser:: Introduction to parsers.
+* Retrieving Node:: Retrieving node from syntax tree.
+* Accessing Node:: Accessing node information.
+* Pattern Matching:: Pattern matching with query patterns.
+* Multiple Languages:: Parse text written in multiple languages.
+* Tree-sitter C API:: Compare the C API and the ELisp API.
+
Syntax Descriptors
* Syntax Class Table:: Table of syntax classes.
@@ -1697,6 +1708,7 @@ Top
@include searching.texi
@include syntax.texi
+@include parsing.texi
@include abbrevs.texi
@include threads.texi
@include processes.texi
diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi
index 5fc831536e..10026221af 100644
--- a/doc/lispref/modes.texi
+++ b/doc/lispref/modes.texi
@@ -2795,11 +2795,13 @@ Font Lock Mode
in which contexts. This section explains how to customize Font Lock for
a particular major mode.
- Font Lock mode finds text to highlight in two ways: through
-syntactic parsing based on the syntax table, and through searching
-(usually for regular expressions). Syntactic fontification happens
-first; it finds comments and string constants and highlights them.
-Search-based fontification happens second.
+ Font Lock mode finds text to highlight in three ways: through
+syntactic parsing based on the syntax table, through searching
+(usually for regular expressions), and through parsing based on a
+full-blown parser. Syntactic fontification happens first; it finds
+comments and string constants and highlights them. Search-based
+fontification happens second. Parser-based fontification can be
+optionally enabled and it will precede the other two fontifications.
@menu
* Font Lock Basics:: Overview of customizing Font Lock.
@@ -2814,6 +2816,7 @@ Font Lock Mode
* Syntactic Font Lock:: Fontification based on syntax tables.
* Multiline Font Lock:: How to coerce Font Lock into properly
highlighting multiline constructs.
+* Parser-based Font Lock:: Use a parser for fontification.
@end menu
@node Font Lock Basics
@@ -3704,6 +3707,89 @@ Region to Refontify
reasonably fast.
@end defvar
+@node Parser-based Font Lock
+@subsection Parser-based Font Lock
+
+@c This node is written when the only parser Emacs has is tree-sitter,
+@c if in the future more parsers are supported, feel free to reorganize
+@c and rewrite this node to describe multiple parsers in parallel.
+
+Besides simple syntactic font lock and search-based font lock, Emacs
+also provides complete syntactic font lock with the help of a parser,
+currently provided by the tree-sitter library (@pxref{Parsing Program
+Source}). Because it is an optional feature, parser-based font lock
+is less integrated with Emacs. Most variables introduced in previous
+sections only apply to search-based font lock, except for
+@var{font-lock-maximum-decoration}.
+
+@defun tree-sitter-font-lock-enable
+This function enables parser-based font lock in the current buffer.
+@end defun
+
+Parser-based font lock and other font lock mechanisms are not mutually
+exclusive. By default, if enabled, parser-based font lock runs first,
+then the simple syntactic font lock (if enabled), then search-based
+font lock.
+
+Although parser-based font lock doesn't share the same customization
+variables with search-based font lock, parser-based font lock uses
+similar customization schemes. Just like @var{font-lock-keywords} and
+@var{font-lock-defaults}, parser-based font lock has
+@var{tree-sitter-font-lock-settings} and
+@var{tree-sitter-font-lock-defaults}.
+
+@defvar tree-sitter-font-lock-settings
+A list of @var{setting}s for tree-sitter font lock.
+
+Each @var{setting} should look like
+
+@example
+(@var{language} @var{query})
+@end example
+
+Each @var{setting} controls one parser (often of different language).
+And @var{language} is the language symbol (@pxref{Language
+Definitions}); @var{query} is either a string query or a sexp query
+(@pxref{Pattern Matching}).
+
+Capture names in @var{query} should be face names like
+@code{font-lock-keyword-face}. The captured node will be fontified
+with that face. Capture names can also be function names, in which
+case the function is called with (@var{start} @var{end} @var{node}),
+where @var{start} and @var{end} are the start and end position of the
+node in buffer, and @var{node} is the tree-sitter node object. If a
+capture name is both a face and a function, face takes priority.
+
+Generally, major modes should set @var{tree-sitter-font-lock-defaults},
+and let Emacs automatically populate this variable.
+@end defvar
+
+@defvar tree-sitter-font-lock-defaults
+This variable stores defaults for tree-sitter font lock. It is a list
+of
+
+@example
+(@var{default} @var{:keyword} @var{value}...)
+@end example
+
+A @var{default} may be a symbol or a list of symbols (for different
+levels of fontification). The symbol(s) can be a variable or a
+function. If a symbol is both a variable and a function, it is used
+as a function. Different levels of fontification can be controlled by
+@var{font-lock-maximum-decoration}.
+
+The symbol(s) in @var{default} should contain or return a
+@var{setting} as described in @var{tree-sitter-font-lock-settings}.
+
+The remaining @var{keyword}s and @var{value}s are additional settings that
+could be used to alter the fontification behavior. Currently there
+aren't any.
+@end defvar
+
+Multi-language major modes should provide range functions in
+@var{tree-sitter-range-functions}, and Emacs will set the ranges
+accordingly before fontifying a region (@pxref{Multiple Languages}).
+
@node Auto-Indentation
@section Automatic Indentation of code
@@ -3760,10 +3846,12 @@ Auto-Indentation
so if your language seems somewhat similar to one of those languages,
you might try to use that engine. @c FIXME: documentation?
Another one is SMIE which takes an approach in the spirit
-of Lisp sexps and adapts it to non-Lisp languages.
+of Lisp sexps and adapts it to non-Lisp languages. Yet another one is
+to rely on a full-blown parser, for example, the tree-sitter library.
@menu
* SMIE:: A simple minded indentation engine.
+* Parser-based Indentation:: Parser-based indentation engine.
@end menu
@node SMIE
@@ -4423,6 +4511,169 @@ SMIE Customization
@code{eval: (smie-config-local '(@var{rules}))}.
@end defun
+@node Parser-based Indentation
+@subsection Parser-based Indentation
+
+@c This node is written when the only parser Emacs has is tree-sitter,
+@c if in the future more parsers are supported, feel free to reorganize
+@c and rewrite this node to describe multiple parsers in parallel.
+
+When built with the tree-sitter library (@pxref{Parsing Program
+Source}), Emacs can parse program source and produce a syntax tree.
+This syntax tree can be used for indentation. For maximum
+flexibility, we could write a custom indent function that queries the
+syntax tree and indents accordingly for each language, but that would
+be a lot of work. It is more convenient to use the simple indentation
+engine described below: we only need to write some indentation rules
+and the engine takes care of the rest.
+
+To enable the indentation engine, set the value of
+@var{indent-line-function} to @code{tree-sitter-indent}.
+
+@defvar tree-sitter-indent-function
+This variable stores the actual function called by
+@code{tree-sitter-indent}. By default, its value is
+@code{tree-sitter-simple-indent}. In the future we might add other
+more complex indentation engines, if @code{tree-sitter-simple-indent}
+proves to be insufficient.
+@end defvar
+
+@heading Writing indentation rules
+
+@defvar tree-sitter-simple-indent-rules
+This local variable stores indentation rules for every language. It is
+a list of
+
+@example
+(@var{language} . @var{rules})
+@end example
+
+where @var{language} is a language symbol, @var{rules} is a list of
+
+@example
+(@var{matcher} @var{anchor} @var{offset})
+@end example
+
+The @var{matcher} determines whether this rule applies, @var{anchor}
+and @var{offset} together determine which column to indent to.
+
+A @var{matcher} is a function that takes three arguments (@var{node}
+@var{parent} @var{bol}). Argument @var{bol} is the position where we
+are indenting: the position of the first non-whitespace character from
+the beginning of line; @var{node} is the largest (highest-in-tree)
+node that starts at that point; @var{parent} is the parent of
+@var{node}.
+
+If @var{matcher} returns non-nil, meaning the rule matches, Emacs then
+uses @var{anchor} to find an anchor; it should be a function that
+takes the same arguments (@var{node} @var{parent} @var{bol}) and
+returns a point.
+
+Finally Emacs computes the column of that point returned by
+@var{anchor} and adds @var{offset} to it, and indents to that column.
+
+For @var{matcher} and @var{anchor}, Emacs provides some convenient
+presets to spare us from writing these functions ourselves. They are
+stored in @var{tree-sitter-simple-indent-presets}, see below.
+@end defvar
+
+@defvar tree-sitter-simple-indent-presets
+This is a list of presets for @var{matcher}s and @var{anchor}s in
+@var{tree-sitter-simple-indent-rules}. Each of them represents a
+function that takes @var{node}, @var{parent} and @var{bol} as
+arguments.
+
+@example
+(match @var{node-type} @var{parent-type}
+ @var{node-field} @var{node-index-min} @var{node-index-max})
+@end example
+
+This matcher checks if @var{node}'s type is @var{node-type},
+@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
+@var{parent} is @var{node-field}, and @var{node}'s index among its
+siblings is between @var{node-index-min} and @var{node-index-max}. If
+the value of a constraint is nil, this matcher doesn't check for that
+constraint. For example, to match the first child where parent is
+@code{argument_list}, use
+
+@example
+(match nil "argument_list" nil 0 0)
+@end example
+
+@example
+no-node
+@end example
+
+This matcher matches the case where @var{node} is nil, i.e., there is
+no node that starts at @var{bol}. This is the case when @var{bol} is
+at an empty line or inside a multi-line string, etc.
+
+@example
+(parent-is @var{type})
+@end example
+
+This matcher matches if @var{parent}'s type is @var{type}.
+
+@example
+(node-is @var{type})
+@end example
+
+This matcher matches if @var{node}'s type is @var{type}.
+
+@example
+(query @var{query})
+@end example
+
+This matcher matches if querying @var{parent} with @var{query}
+captures @var{node}. The capture name does not matter.
+
+@example
+first-sibling
+@end example
+
+This anchor returns the start of the first child of @var{parent}.
+
+@example
+parent
+@end example
+
+This anchor returns the start of @var{parent}.
+
+@example
+prev-sibling
+@end example
+
+This anchor returns the start of the previous sibling of @var{node}.
+
+@example
+no-indent
+@end example
+
+This anchor returns the start of @var{node}, i.e., do not indent.
+
+@example
+prev-line
+@end example
+
+This anchor returns the start of the first named node on the previous
+line. This can be used for indenting an empty line.
+@end defvar
+
+@heading Indentation utilities
+
+Here are some utility functions that can help writing indentation
+rules.
+
+@defun tree-sitter-check-indent mode
+This function checks the current buffer's indentation against major mode
+@var{mode}. It indents the current line in @var{mode} and compares
+the indentation with the current indentation. Then it pops up a diff
+buffer showing the difference. Correct indentation (target) is in
+green, current indentation is in red.
+@end defun
+
+It is also helpful to use @code{tree-sitter-inspect-mode} when writing
+indentation rules.
@node Desktop Save Mode
@section Desktop Save Mode
diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
new file mode 100644
index 0000000000..4ebb13ac5e
--- /dev/null
+++ b/doc/lispref/parsing.texi
@@ -0,0 +1,1416 @@
+@c -*- mode: texinfo; coding: utf-8 -*-
+@c This is part of the GNU Emacs Lisp Reference Manual.
+@c Copyright (C) 2021 Free Software Foundation, Inc.
+@c See the file elisp.texi for copying conditions.
+@node Parsing Program Source
+@chapter Parsing Program Source
+
+Emacs provides various ways to parse program source text and produce a
+@dfn{syntax tree}. In a syntax tree, text is no longer a
+one-dimensional stream but a structured tree of nodes, where each node
+represents a piece of text. Thus a syntax tree can enable
+interesting features like precise fontification, indentation,
+navigation, structured editing, etc.
+
+Emacs has a simple facility for parsing balanced expressions
+(@pxref{Parsing Expressions}). There is also the SMIE library for generic
+navigation and indentation (@pxref{SMIE}).
+
+Emacs also provides integration with the tree-sitter library
+(@uref{https://tree-sitter.github.io/tree-sitter}) if compiled with
+it. The tree-sitter library implements an incremental parser and has
+support for a wide range of programming languages.
+
+@defun tree-sitter-available-p
+This function returns non-nil if tree-sitter features are available
+for this Emacs instance.
+@end defun
+
+For using tree-sitter features in font-lock and indentation,
+@pxref{Parser-based Font Lock}, @pxref{Parser-based Indentation}.
+
+To access the syntax tree of the text in a buffer, we need to first
+load a language definition and create a parser with it. Next, we can
+query the parser for specific nodes in the syntax tree. Then, we can
+access various information about the node, and we can pattern-match a
+node with a powerful syntax. Finally, we explain how to work with
+source files that mix multiple languages. The following sections
+explain how to do each of the tasks in detail.
+
+@menu
+* Language Definitions:: Loading tree-sitter language definitions.
+* Using Parser:: Introduction to parsers.
+* Retrieving Node:: Retrieving node from syntax tree.
+* Accessing Node:: Accessing node information.
+* Pattern Matching:: Pattern matching with query patterns.
+* Multiple Languages:: Parse text written in multiple languages.
+* Tree-sitter C API:: Compare the C API and the ELisp API.
+@end menu
+
+@node Language Definitions
+@section Tree-sitter Language Definitions
+
+@heading Loading a language definition
+
+Tree-sitter relies on language definitions to parse text in that
+language. In Emacs, a language definition is represented by a symbol
+@code{tree-sitter-<language>}. For example, C language definition is
+represented as @code{tree-sitter-c}, and @code{tree-sitter-c} can be
+passed to tree-sitter functions as the @var{language} argument.
+
+@vindex tree-sitter-load-language-error
+Tree-sitter language definitions are distributed as dynamic
+libraries. In order to use a language definition in Emacs, you need to
+make sure that the dynamic library is installed on the system, either
+in standard locations or in @code{LD_LIBRARY_PATH} (on some systems,
+it is @code{DYLD_LIBRARY_PATH}). If Emacs cannot find the library or
+has problems loading it, Emacs signals
+@var{tree-sitter-load-language-error}. The signal data is a list of
+specific error messages.
+
+@defun tree-sitter-language-available-p language
+This function checks whether the dynamic library for @var{language} is
+present on the system, and returns non-nil if it is.
+@end defun
+
+@vindex tree-sitter-load-name-override-list
+By convention, the dynamic library for @code{tree-sitter-<language>}
+is @code{libtree-sitter-<language>.@var{ext}}, where @var{ext} is the
+system-specific extension for dynamic libraries. Also by convention,
+the function provided by that library is named
+@code{tree_sitter_<language>}. If a language definition doesn't
+follow this convention, you should add an entry
+
+@example
+(@var{language-symbol} @var{library-base-name} @var{function-name})
+@end example
+
+to @code{tree-sitter-load-name-override-list}, where
+@var{library-base-name} is the base filename for the dynamic library
+(conventionally @code{libtree-sitter-<language>}), and
+@var{function-name} is the function provided by the library
+(conventionally @code{tree_sitter_<language>}). For example,
+
+@example
+(tree-sitter-cool-lang "libtree-sitter-coool" "tree_sitter_coool")
+@end example
+
+for a language too cool to abide by the rules.
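+
+As a rough sketch of how the pieces above fit together, a user's init
+file might contain something like the following. The entry is the
+same illustrative one as above, and using @code{add-to-list} is only
+an assumption about how the list is meant to be extended.
+
+@example
+@group
+;; Register the non-conventional library and function names.
+(add-to-list 'tree-sitter-load-name-override-list
+             '(tree-sitter-cool-lang
+               "libtree-sitter-coool" "tree_sitter_coool"))
+
+;; Only proceed if the dynamic library can actually be found.
+(when (tree-sitter-language-available-p 'tree-sitter-cool-lang)
+  (message "tree-sitter-cool-lang is available"))
+@end group
+@end example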
+
+@heading Concrete syntax tree
+
+A syntax tree is what a language definition defines (more or less) and
+what a parser generates. In a syntax tree, each node represents a
+piece of text, and nodes are connected to each other by parent-child
+relationships. For example, if the source text is
+
+@example
+1 + 2
+@end example
+
+@noindent
+its syntax tree could be
+
+@example
+@group
+ +--------------+
+ | root "1 + 2" |
+ +--------------+
+ |
+ +--------------------------------+
+ | expression "1 + 2" |
+ +--------------------------------+
+ | | |
++------------+ +--------------+ +------------+
+| number "1" | | operator "+" | | number "2" |
++------------+ +--------------+ +------------+
+@end group
+@end example
+
+We can also represent it as an s-expression:
+
+@example
+(root (expression (number) (operator) (number)))
+@end example
+
+@subheading Node types
+
+@cindex tree-sitter node type
+@anchor{tree-sitter node type}
+@cindex tree-sitter named node
+@anchor{tree-sitter named node}
+@cindex tree-sitter anonymous node
+Names like @code{root}, @code{expression}, @code{number}, and
+@code{operator} are the nodes' @dfn{types}. However, not all nodes in a
+syntax tree have a type. Nodes that don't are @dfn{anonymous nodes},
+and nodes with a type are @dfn{named nodes}. Anonymous nodes are
+tokens with fixed spellings, including punctuation characters like
+bracket @samp{]}, and keywords like @code{return}.
+
+@subheading Field names
+
+@cindex tree-sitter node field name
+@anchor{tree-sitter node field name} To make the syntax tree easier to
+analyze, many language definitions assign @dfn{field names} to child
+nodes. For example, a @code{function_definition} node could have a
+@code{declarator} and a @code{body}:
+
+@example
+@group
+(function_definition
+ declarator: (declaration)
+ body: (compound_statement))
+@end group
+@end example
+
+@deffn Command tree-sitter-inspect-mode
+This minor mode displays the node that @emph{starts} at point in the
+mode-line. The mode-line will display
+
+@example
+@var{parent} @var{field-name}: (@var{child} (@var{grand-child} (...)))
+@end example
+
+@var{child}, @var{grand-child}, @var{grand-grand-child}, etc., are
+nodes that begin at point, and @var{parent} is the
+parent of @var{child}.
+
+If there is no node that starts at point, i.e., point is in the middle
+of a node, then the mode-line only displays the smallest node that
+spans point, and its immediate parent.
+
+This minor mode doesn't create parsers on its own. It simply uses the
+first parser in @code{tree-sitter-parser-list} (@pxref{Using Parser}).
+@end deffn
+
+@heading Reading the grammar definition
+
+Authors of language definitions define the @dfn{grammar} of a
+language, and this grammar determines how a parser constructs a
+concrete syntax tree out of the text. In order to use the syntax
+tree effectively, we need to read the @dfn{grammar file}.
+
+The grammar file is usually @code{grammar.js} in a language
+definition’s project repository. The link to a language definition’s
+home page can be found in tree-sitter’s homepage
+(@uref{https://tree-sitter.github.io/tree-sitter}).
+
+The grammar is written in JavaScript syntax. For example, the rule
+matching a @code{function_definition} node looks like
+
+@example
+@group
+function_definition: $ => seq(
+ $.declaration_specifiers,
+ field('declarator', $.declaration),
+ field('body', $.compound_statement)
+)
+@end group
+@end example
+
+The rule is represented by a function that takes a single argument
+@var{$}, representing the whole grammar. The function itself is
+constructed by other functions: the @code{seq} function puts together a
+sequence of children; the @code{field} function annotates a child with
+a field name. If we write the above definition in BNF syntax, it
+would look like
+
+@example
+@group
+function_definition :=
+ <declaration_specifiers> <declaration> <compound_statement>
+@end group
+@end example
+
+@noindent
+and the node returned by the parser would look like
+
+@example
+@group
+(function_definition
+ (declaration_specifiers)
+ declarator: (declaration)
+ body: (compound_statement))
+@end group
+@end example
+
+Below is a list of functions that one will see in a grammar
+definition. Each function takes other rules as arguments and returns
+a new rule.
+
+@itemize @bullet
+@item
+@code{seq(rule1, rule2, ...)} matches each rule one after another.
+
+@item
+@code{choice(rule1, rule2, ...)} matches one of the rules in its
+arguments.
+
+@item
+@code{repeat(rule)} matches @var{rule} @emph{zero or more} times.
+This is like the @samp{*} operator in regular expressions.
+
+@item
+@code{repeat1(rule)} matches @var{rule} @emph{one or more} times.
+This is like the @samp{+} operator in regular expressions.
+
+@item
+@code{optional(rule)} matches @var{rule} @emph{zero or one} time.
+This is like the @samp{?} operator in regular expressions.
+
+@item
+@code{field(name, rule)} assigns field name @var{name} to the child
+node matched by @var{rule}.
+
+@item
+@code{alias(rule, alias)} makes nodes matched by @var{rule} appear as
+@var{alias} in the syntax tree generated by the parser. For example,
+
+@example
+alias(preprocessor_call_exp, call_expression)
+@end example
+
+makes any node matched by @code{preprocessor_call_exp} appear as
+@code{call_expression}.
+@end itemize
+
+Below are grammar functions less interesting for a reader of a
+language definition.
+
+@itemize
+@item
+@code{token(rule)} marks @var{rule} to produce a single leaf node.
+That is, instead of generating a parent node with individual child
+nodes under it, everything is combined into a single leaf node.
+
+@item
+Normally, grammar rules ignore preceding whitespace;
+@code{token.immediate(rule)} changes @var{rule} to match only when
+there is no preceding whitespace.
+
+@item
+@code{prec(n, rule)} gives @var{rule} a level @var{n} precedence.
+
+@item
+@code{prec.left([n,] rule)} marks @var{rule} as left-associative,
+optionally with level @var{n}.
+
+@item
+@code{prec.right([n,] rule)} marks @var{rule} as right-associative,
+optionally with level @var{n}.
+
+@item
+@code{prec.dynamic(n, rule)} is like @code{prec}, but the precedence
+is applied at runtime instead.
+@end itemize
+
+The tree-sitter project talks about writing a grammar in more detail:
+@uref{https://tree-sitter.github.io/tree-sitter/creating-parsers}.
+Read especially ``The Grammar DSL'' section.
+
+@node Using Parser
+@section Using Tree-sitter Parser
+@cindex Tree-sitter parser
+
+This section describes how to create and configure a tree-sitter
+parser. In Emacs, each tree-sitter parser is associated with a
+buffer. As we edit the buffer, the associated parser is automatically
+kept up-to-date.
+
+@defvar tree-sitter-disabled-modes
+Before creating a parser, it is perhaps good to check whether we
+should use tree-sitter at all. Sometimes a user doesn't want to use
+tree-sitter features for a major mode. To turn off tree-sitter for a
+mode, they add that mode to this variable.
+@end defvar
+
+@defvar tree-sitter-maximum-size
+If users want to turn off tree-sitter for buffers larger than a
+particular size (because tree-sitter consumes memory ~10 times the
+buffer size for storing the syntax tree), they set this variable to
+that size.
+@end defvar
+
+@defun tree-sitter-should-enable-p &optional mode
+This function returns non-nil if @var{mode} (which defaults to the
+current major mode) should activate tree-sitter features. The result
+depends on the values of @code{tree-sitter-disabled-modes} and
+@code{tree-sitter-maximum-size} described above. The result also
+depends, of course, on the result of @code{tree-sitter-available-p}.
+
+Writers of major modes or other packages are responsible for calling
+this function and determining whether to activate tree-sitter features.
+@end defun
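+
+For instance, a major mode might guard its tree-sitter setup roughly
+as follows. This is only a sketch: @code{my-c-mode-setup} is a
+hypothetical function name, and it uses
+@code{tree-sitter-get-parser-create}, which is described below.
+
+@example
+@group
+(defun my-c-mode-setup ()
+  "Hypothetical setup function for a C major mode."
+  (when (tree-sitter-should-enable-p)
+    ;; Reuse an existing C parser for this buffer or create one.
+    (tree-sitter-get-parser-create 'tree-sitter-c)))
+@end group
+@end example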
+
+
+@cindex Creating tree-sitter parsers
+To create a parser, we provide a buffer to parse and the language to
+use (@pxref{Language Definitions}). Emacs provides several creation
+functions for different use cases.
+
+@defun tree-sitter-get-parser-create language
+This function is the most convenient one. It gives you a parser that
+recognizes @var{language} for the current buffer. The function
+checks if there already exists a parser suiting the need, and only
+creates a new one when it can't find one.
+
+@example
+@group
+;; Create a parser for C programming language.
+(tree-sitter-get-parser-create 'tree-sitter-c)
+ @c @result{} #<tree-sitter-parser for tree-sitter-c in *scratch*>
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-get-parser language
+This function is like @code{tree-sitter-get-parser-create}, but it
+always creates a new parser.
+@end defun
+
+@defun tree-sitter-parser-create buffer language
+This function is the most primitive, requiring both the buffer to
+associate with and the language to use. If @var{buffer} is nil, the
+current buffer is used.
+@end defun
+
+Given a parser, we can query information about it:
+
+@defun tree-sitter-parser-buffer parser
+Returns the buffer associated with @var{parser}.
+@end defun
+
+@defun tree-sitter-parser-language parser
+Returns the language that @var{parser} uses.
+@end defun
+
+@defun tree-sitter-parser-p object
+Checks if @var{object} is a tree-sitter parser. Return non-nil if it
+is, return nil otherwise.
+@end defun
+
+There is no need to explicitly parse a buffer, because parsing is done
+automatically and lazily. A parser only parses when we query for a
+node in its syntax tree. Therefore, when a parser is first created,
+it doesn't parse the buffer; instead, it waits until we query for a
+node for the first time. Similarly, when some change is made in the
+buffer, a parser doesn't re-parse immediately and only records some
+necessary information to later re-parse when necessary.
+
+@vindex tree-sitter-buffer-too-large
+When a parser does parse, it checks the size of the buffer.
+Tree-sitter can only handle buffers no larger than about 4GB. If the
+size exceeds that, Emacs signals @code{tree-sitter-buffer-too-large}
+with signal data being the buffer size.
+
+@vindex tree-sitter-parser-list
+Once a parser is created, Emacs automatically adds it to the
+buffer-local variable @code{tree-sitter-parser-list}. Every time a
+change is made to the buffer, Emacs updates parsers in this list so
+they can update their syntax tree incrementally. Therefore, one must
+not remove parsers from this list and put the parser back in: if any
+change is made when that parser is absent, the parser will be
+permanently out-of-sync with the buffer content, and shouldn't be used
+anymore.
+
+@cindex tree-sitter narrowing
+@anchor{tree-sitter narrowing} Normally, a parser ``sees'' the whole
+buffer, but when the buffer is narrowed (@pxref{Narrowing}), the
+parser will only see the visible region. As far as the parser can
+tell, the hidden region is deleted. And when the buffer is later
+widened, the parser thinks text is inserted at the beginning and at
+the end. Although parsers respect narrowing, narrowing shouldn't be
+the means of handling a multi-language buffer; instead, set the ranges
+in which a parser should operate. @xref{Multiple Languages}.
+
+Because a parser parses lazily, when we narrow the buffer, the parser
+doesn't act immediately; as long as we don't query for a node while
+the buffer is narrowed, narrowing does not affect the parser.
+
+@cindex tree-sitter parse string
+@defun tree-sitter-parse-string string language
+Besides creating a parser for a buffer, we can also just parse a
+string. Unlike a buffer, parsing a string is a one-time deal, and
+there is no way to update the result.
+
+This function parses @var{string} with @var{language}, and returns the
+root node of the generated syntax tree.
+@end defun
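+
+As an illustration, the following sketch parses a short C snippet and
+lists the types of the root node's named children. It assumes that
+the @code{tree-sitter-c} language definition is installed; the
+functions @code{tree-sitter-node-children} and
+@code{tree-sitter-node-type} are described in later sections.
+
+@example
+@group
+(let ((root (tree-sitter-parse-string "int x = 1;" 'tree-sitter-c)))
+  ;; Collect the type of each named child of the root node.
+  (mapcar #'tree-sitter-node-type
+          (tree-sitter-node-children root t)))
+@end group
+@end example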
+
+@node Retrieving Node
+@section Retrieving Node
+
+@cindex tree-sitter find node
+@cindex tree-sitter get node
+There are two ways to retrieve a node: directly from the syntax tree,
+or by traveling from other nodes. But before we continue, let's go
+over some conventions of tree-sitter functions.
+
+We talk about a node being ``smaller'' or ``larger'', and ``lower'' or
+``higher''. A smaller and lower node is lower in the syntax tree and
+therefore spans a smaller piece of text; a larger and higher node is
+higher up in the syntax tree, containing many smaller nodes as its
+children, and therefore spans a larger piece of text.
+
+When a function cannot find a node, it returns nil. And for
+convenience in chaining function calls, all functions that take a node
+as an argument and return a node also accept nil as the node; in that
+case, the function just returns nil.
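+
+For example, a call chain like the following sketch needs no
+intermediate checks, because a nil anywhere in the chain simply
+propagates to the end. The functions used here are described later
+in this section.
+
+@example
+@group
+;; Grandparent of the node at point, or nil if any step fails.
+(tree-sitter-node-parent
+ (tree-sitter-node-parent
+  (tree-sitter-node-at (point))))
+@end group
+@end example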
+
+@vindex tree-sitter-node-outdated
+Nodes are not automatically updated when the associated buffer is
+modified. In fact, there is no way to update a node once it is
+retrieved. It is best to use a node and then throw it away rather than
+save it. A node is @dfn{outdated} if the buffer has changed since the
+node was retrieved. Using an outdated node signals a
+@code{tree-sitter-node-outdated} error.
+
+@heading Retrieving node from syntax tree
+
+@defun tree-sitter-node-at beg &optional end parser-or-lang named
+This function returns the @emph{smallest} node that covers the span
+from @var{beg} to @var{end}. In other words, the start of the node
+@code{<=} @var{beg}, and the end of the node @code{>=} @var{end}. If
+@var{end} is omitted, it defaults to the value of @var{beg}.
+
+When @var{parser-or-lang} is nil, this function uses the first parser
+in @code{tree-sitter-parser-list} in the current buffer. If
+@var{parser-or-lang} is a parser object, it uses that parser; if
+@var{parser-or-lang} is a language, it finds the first parser using
+that language in @code{tree-sitter-parser-list} and uses that.
+
+If @var{named} is non-nil, this function looks for a named node
+instead (@pxref{tree-sitter named node, named node}).
+
+@example
+@group
+;; Find the node at point in a C parser's syntax tree.
+(tree-sitter-node-at (point) (point) 'tree-sitter-c)
+ @c @result{} #<tree-sitter-node from 1 to 4 in *scratch*>
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-parser-root-node parser
+This function returns the root node of the syntax tree generated by
+@var{parser}.
+@end defun
+
+@defun tree-sitter-buffer-root-node &optional language
+This function finds the first parser that uses @var{language} in
+@code{tree-sitter-parser-list} in the current buffer, and returns the
+root node of that parser. If it cannot find an appropriate parser, it
+returns nil.
+@end defun
+
+Once we have a node, we can retrieve other nodes from it, or query for
+information about this node.
+
+@heading Retrieving node from other nodes
+
+@subheading By kinship
+
+@defun tree-sitter-node-parent node
+This function returns the immediate parent of @var{node}.
+@end defun
+
+@defun tree-sitter-node-child node n &optional named
+This function returns the @var{n}'th child of @var{node}. If
+@var{named} is non-nil, then it only counts named nodes
+(@pxref{tree-sitter named node, named node}). For example, in a node
+that represents a string: @code{"text"}, there are three child
+nodes: the opening quote @code{"}, the string content @code{text}, and
+the closing quote @code{"}. Among these nodes, the first child is
+the opening quote @code{"}, the first named child is the string
+content @code{text}.
+@end defun
+
+@defun tree-sitter-node-children node &optional named
+This function returns all of @var{node}'s children in a list. If
+@var{named} is non-nil, then it only retrieves named nodes
+(@pxref{tree-sitter named node, named node}).
+@end defun
+
+@defun tree-sitter-next-sibling node &optional named
+This function finds the next sibling of @var{node}. If @var{named} is
+non-nil, it finds the next named sibling (@pxref{tree-sitter named
+node, named node}).
+@end defun
+
+@defun tree-sitter-prev-sibling node &optional named
+This function finds the previous sibling of @var{node}. If
+@var{named} is non-nil, it finds the previous named sibling
+(@pxref{tree-sitter named node, named node}).
+@end defun
+
+@subheading By field name
+
+To make the syntax tree easier to analyze, many language definitions
+assign @dfn{field names} to child nodes (@pxref{tree-sitter node field
+name, field name}). For example, a @code{function_definition} node
+could have a @code{declarator} and a @code{body}.
+
+@defun tree-sitter-child-by-field-name node field-name
+This function finds the child of @var{node} that has @var{field-name}
+as its field name.
+
+@example
+@group
+;; Get the child that has "body" as its field name.
+(tree-sitter-child-by-field-name node "body")
+ @c @result{} #<tree-sitter-node from 3 to 11 in *scratch*>
+@end group
+@end example
+@end defun
+
+@subheading By position
+
+@defun tree-sitter-first-child-for-pos node pos &optional named
+This function finds the first child of @var{node} that extends beyond
+@var{pos}. ``Extend beyond'' means the end of the child node
+@code{>=} @var{pos}. This function only looks at immediate children of
+@var{node}, and doesn't look at its grandchildren. If @var{named} is
+non-nil, it only looks for named children (@pxref{tree-sitter named
+node, named node}).
+@end defun
+
+@defun tree-sitter-node-descendant-for-range node beg end &optional named
+This function finds the @emph{smallest} (grand)child of @var{node}
+that spans the range from @var{beg} to @var{end}. It is similar to
+@code{tree-sitter-node-at}. If @var{named} is non-nil, it only looks
+for named nodes (@pxref{tree-sitter named node, named node}).
+@end defun
+
+@heading More convenient functions
+
+@defun tree-sitter-filter-child node pred &optional named
+This function finds children of @var{node} that satisfy @var{pred}.
+
+Function @var{pred} takes the child node as the argument and should
+return non-nil to indicate that the child should be kept. If
+@var{named} is non-nil, this function only searches for named nodes.
+@end defun
+
+@defun tree-sitter-parent-until node pred
+This function repeatedly finds the parent of @var{node}, and returns
+the parent if it satisfies @var{pred} (which takes the parent as the
+argument). If no parent satisfies @var{pred}, this function returns
+nil.
+@end defun
+
+@defun tree-sitter-parent-while node pred
+This function repeatedly finds the parent of @var{node}, and keeps
+doing so as long as the parent satisfies @var{pred} (which takes the
+parent as the single argument). I.e., this function returns the
+farthest parent that still satisfies @var{pred}.
+@end defun
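+
+As a sketch of how these helpers might be used, the following finds
+the function definition enclosing point in a C buffer. The node type
+@code{function_definition} comes from the C grammar used in earlier
+examples, and @code{tree-sitter-node-type} is described in the next
+section.
+
+@example
+@group
+(tree-sitter-parent-until
+ (tree-sitter-node-at (point))
+ (lambda (parent)
+   ;; Node types are returned as strings.
+   (equal (tree-sitter-node-type parent)
+          "function_definition")))
+@end group
+@end example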
+
+@node Accessing Node
+@section Accessing Node Information
+
+Before going further, make sure you have read the basic conventions
+about tree-sitter nodes in the previous section (@pxref{Retrieving Node}).
+
+@heading Basic information
+
+Every node is associated with a parser, and that parser is associated
+with a buffer. The following functions let you retrieve them.
+
+@defun tree-sitter-node-parser node
+This function returns @var{node}'s associated parser.
+@end defun
+
+@defun tree-sitter-node-buffer node
+This function returns @var{node}'s parser's associated buffer.
+@end defun
+
+@defun tree-sitter-node-language node
+This function returns @var{node}'s parser's associated language.
+@end defun
+
+Each node represents a piece of text in the buffer. The functions
+below find relevant information about that text.
+
+@defun tree-sitter-node-start node
+Return the start position of @var{node}.
+@end defun
+
+@defun tree-sitter-node-end node
+Return the end position of @var{node}.
+@end defun
+
+@defun tree-sitter-node-text node &optional object
+Returns the buffer text that @var{node} represents. (If @var{node} is
+retrieved from parsing a string, it will be the text from that
+string.)
+@end defun
+
+Here are some basic checks on tree-sitter nodes.
+
+@defun tree-sitter-node-p object
+Checks if @var{object} is a tree-sitter syntax node.
+@end defun
+
+@defun tree-sitter-node-eq node1 node2
+Checks if @var{node1} and @var{node2} are the same node in a syntax
+tree.
+@end defun
+
+@heading Property information
+
+In general, nodes in a concrete syntax tree fall into two categories:
+@dfn{named nodes} and @dfn{anonymous nodes}. Whether a node is named
+or anonymous is determined by the language definition
+(@pxref{tree-sitter named node, named node}).
+
+@cindex tree-sitter missing node
+Apart from being named/anonymous, a node can have other properties. A
+node can be ``missing'': missing nodes are inserted by the parser in
+order to recover from certain kinds of syntax errors, i.e., something
+should probably be there according to the grammar, but is not there.
+
+@cindex tree-sitter extra node
+A node can be ``extra'': extra nodes represent things like comments,
+which can appear anywhere in the text.
+
+@cindex tree-sitter node that has changes
+A node ``has changes'' if the buffer has changed since the node was
+retrieved. In this case, the node's start and end positions would be
+off and we had better throw it away and retrieve a new one.
+
+@cindex tree-sitter node that has error
+A node ``has error'' if the text it spans contains a syntax error.
+Either the node itself has an error, or one of its (grand)children has
+an error.
+
+@defun tree-sitter-node-check node property
+This function checks if @var{node} has @var{property}. @var{property}
+can be @code{'named}, @code{'missing}, @code{'extra},
+@code{'has-changes}, or @code{'has-error}.
+@end defun
+
+Named nodes have ``types'' (@pxref{tree-sitter node type, node type}).
+For example, a named node can be a @code{string_literal} node, where
+@code{string_literal} is its type.
+
+@defun tree-sitter-node-type node
+Return @var{node}'s type as a string.
+@end defun
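+
+Putting several of the accessors above together, a small helper for
+interactive exploration could look like this sketch
+(@code{my-describe-node} is a hypothetical name):
+
+@example
+@group
+(defun my-describe-node (node)
+  "Return a list describing NODE: type, start, end, and text."
+  (list (tree-sitter-node-type node)
+        (tree-sitter-node-start node)
+        (tree-sitter-node-end node)
+        (tree-sitter-node-text node)))
+@end group
+@end example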
+
+@heading Information as a child or parent
+
+@defun tree-sitter-node-index node &optional named
+This function returns the index of @var{node} as a child node of its
+parent. If @var{named} is non-nil, it only counts named nodes
+(@pxref{tree-sitter named node, named node}).
+@end defun
+
+@defun tree-sitter-node-field-name node
+A child of a parent node could have a field name (@pxref{tree-sitter
+node field name, field name}). This function returns the field name
+of @var{node} as a child of its parent.
+@end defun
+
+@defun tree-sitter-node-field-name-for-child node n
+This is a more primitive function that returns the field name of the
+@var{n}'th child of @var{node}.
+@end defun
+
+@defun tree-sitter-child-count node &optional named
+This function finds the number of children of @var{node}. If
+@var{named} is non-nil, it only counts named children (@pxref{tree-sitter
+named node, named node}).
+@end defun
+
+@node Pattern Matching
+@section Pattern Matching Tree-sitter Nodes
+
+Tree-sitter lets us pattern match with a small declarative language.
+Pattern matching consists of two steps: first tree-sitter matches a
+@dfn{pattern} against nodes in the syntax tree, then it @dfn{captures}
+specific nodes in that pattern and returns the captured nodes.
+
+We describe first how to write the most basic query pattern and how to
+capture nodes in a pattern, then the pattern-match function, and
+finally more advanced pattern syntax.
+
+@heading Basic query syntax
+
+@cindex Tree-sitter query syntax
+@cindex Tree-sitter query pattern
+A @dfn{query} consists of multiple @dfn{patterns}. Each pattern is an
+s-expression that matches a certain node in the syntax tree. A
+pattern has the following shape:
+
+@example
+(@var{type} @var{child}...)
+@end example
+
+@noindent
+For example, a pattern that matches a @code{binary_expression} node that
+contains @code{number_literal} child nodes would look like
+
+@example
+(binary_expression (number_literal))
+@end example
+
+To @dfn{capture} a node in the query pattern above, append
+@code{@@capture-name} after the node pattern you want to capture. For
+example,
+
+@example
+(binary_expression (number_literal) @@number-in-exp)
+@end example
+
+@noindent
+captures @code{number_literal} nodes that are inside a
+@code{binary_expression} node with capture name @code{number-in-exp}.
+
+We can capture the @code{binary_expression} node too, with capture
+name @code{biexp}:
+
+@example
+(binary_expression
+ (number_literal) @@number-in-exp) @@biexp
+@end example
+
+@heading Query function
+
+Now we can introduce the query functions.
+
+@defun tree-sitter-query-capture node query &optional beg end
+This function matches patterns in @var{query} in @var{node}.
+Argument @var{query} can be either a string or an s-expression. For
+now, we focus on the string syntax; s-expression syntax is described
+at the end of the section.
+
+The function returns all captured nodes in a list of
+@code{(@var{capture_name} . @var{node})}. If @var{beg} and @var{end}
+are both non-nil, it only pattern matches nodes in that range.
+
+@vindex tree-sitter-query-error
+This function signals a @code{tree-sitter-query-error} if @var{query} is
+malformed. The signal data contains a description of the specific
+error.
+@end defun
+
+@defun tree-sitter-query-in source query &optional beg end
+This function matches patterns in @var{query} in @var{source}, and
+returns all captured nodes in a list of @code{(@var{capture_name}
+. @var{node})}. If @var{beg} and @var{end} are both non-nil, it only
+pattern matches nodes in that range.
+
+Argument @var{source} designates a node; it can be a language symbol,
+a parser, or simply a node. If a language symbol, @var{source}
+represents the root node of the first parser for that language in the
+current buffer; if a parser, @var{source} represents the root node of
+that parser.
+
+This function also signals @code{tree-sitter-query-error}.
+@end defun
+
+For example, suppose @var{node}'s content is @code{1 + 2}, and
+@var{query} is
+
+@example
+@group
+(setq query
+ "(binary_expression
+ (number_literal) @@number-in-exp) @@biexp")
+@end group
+@end example
+
+@noindent
+Running that query would return
+
+@example
+@group
+(tree-sitter-query-capture node query)
+ @result{} ((biexp . @var{<node for "1 + 2">})
+ (number-in-exp . @var{<node for "1">})
+ (number-in-exp . @var{<node for "2">}))
+@end group
+@end example
+
+As we mentioned earlier, a @var{query} could contain multiple
+patterns. For example, it could have two top-level patterns:
+
+@example
+@group
+(setq query
+ "(binary_expression) @@biexp
+ (number_literal) @@number @@biexp")
+@end group
+@end example
+
+@defun tree-sitter-query-string string query language
+This function parses @var{string} with @var{language}, pattern matches
+its root node with @var{query}, and returns the result.
+@end defun
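+
+Since the result is an ordinary list of cons cells, it can be
+processed with the usual list functions. As a sketch, reusing
+@var{node} and @var{query} from the examples above:
+
+@example
+@group
+(dolist (capture (tree-sitter-query-capture node query))
+  ;; Each element is (CAPTURE-NAME . NODE).
+  (message "%s: %s"
+           (car capture)
+           (tree-sitter-node-text (cdr capture))))
+@end group
+@end example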
+
+@heading More query syntax
+
+Besides node type and capture, tree-sitter's query syntax can express
+anonymous node, field name, wildcard, quantification, grouping,
+alternation, anchor, and predicate.
+
+@subheading Anonymous node
+
+An anonymous node is written verbatim, surrounded by quotes. A
+pattern matching (and capturing) the keyword @code{return} would be
+
+@example
+"return" @@keyword
+@end example
+
+@subheading Wild card
+
+In a query pattern, @samp{(_)} matches any named node, and @samp{_}
+matches any named and anonymous node. For example, to capture any
+named child of a @code{binary_expression} node, the pattern would be
+
+@example
+(binary_expression (_) @@in_biexp)
+@end example
+
+@subheading Field name
+
+We can capture child nodes that have specific field names:
+
+@example
+@group
+(function_definition
+ declarator: (_) @@func-declarator
+ body: (_) @@func-body)
+@end group
+@end example
+
+We can also capture a node that doesn't have a certain field, say, a
+@code{function_definition} without a @code{body} field.
+
+@example
+(function_definition !body) @@func-no-body
+@end example
+
+@subheading Quantify node
+
+Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
+@samp{?}. Their meanings are the same as in regular expressions:
+@samp{*} matches the preceding pattern zero or more times, @samp{+}
+matches one or more times, and @samp{?} matches zero or one time.
+
+For example, this pattern matches @code{type_declaration} nodes
+that have @emph{zero or more} @code{long} keywords.
+
+@example
+(type_declaration "long"* @@long-in-type)
+@end example
+
+@noindent
+And this pattern matches a type declaration that has zero or one
+@code{long} keyword:
+
+@example
+(type_declaration "long"?) @@type-decl
+@end example
+
+@subheading Grouping
+
+Similar to groups in regular expressions, we can bundle patterns into a
+group and apply quantification operators to it. For example, to
+express a comma-separated list of identifiers, one could write
+
+@example
+(identifier) ("," (identifier))*
+@end example
+
+@subheading Alternation
+
+Again, similar to regular expressions, we can express ``match any one
+of this group of patterns'' in the query pattern. The syntax is a
+list of patterns enclosed in square brackets. For example, to capture
+some keywords in C, the query pattern would be
+
+@example
+@group
+[
+ "return"
+ "break"
+ "if"
+ "else"
+] @@keyword
+@end group
+@end example
+
+@subheading Anchor
+
+The anchor operator @samp{.} can be used to enforce juxtaposition,
+i.e., to enforce two things to be directly next to each other. The
+two ``things'' can be two nodes, or a child and the end of its parent.
+For example, to capture the first child, the last child, or two
+adjacent children:
+
+@example
+@group
+;; Anchor the child with the end of its parent.
+(compound_expression (_) @@last-child .)
+
+;; Anchor the child with the beginning of its parent.
+(compound_expression . (_) @@first-child)
+
+;; Anchor two adjacent children.
+(compound_expression
+ (_) @@prev-child
+ .
+ (_) @@next-child)
+@end group
+@end example
+
+Note that the enforcement of juxtaposition ignores any anonymous
+nodes.
+
+@subheading Predicate
+
+We can add predicate constraints to a pattern. For example, if we use
+the following query pattern
+
+@example
+@group
+(
+ (array . (_) @@first (_) @@last .)
+ (#equal @@first @@last)
+)
+@end group
+@end example
+
+@noindent
+then tree-sitter only matches arrays where the first element equals
+the last element. To attach a predicate to a pattern, we need to
+group them together. A predicate always starts with a @samp{#}.
+Currently there are two predicates, @code{#equal} and @code{#match}.
+
+@deffn Predicate equal arg1 arg2
+Matches if @var{arg1} equals @var{arg2}. Each argument can be either a
+string or a capture name. Capture names represent the text that the
+captured node spans in the buffer.
+@end deffn
+
+@deffn Predicate match regexp capture-name
+Matches if the text that @var{capture-name}’s node spans in the buffer
+matches regular expression @var{regexp}. Matching is case-sensitive.
+@end deffn
+
+Note that a predicate can only refer to capture names that appear in the
+same pattern. Indeed, it makes little sense to refer to capture names
+in other patterns anyway.
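+
+As a sketch of a predicate inside a string query, the following
+captures identifiers written in all capitals. The node type
+@code{identifier} is an assumption about the grammar in use, and
+@var{node} stands for any node obtained earlier. Note that quotes
+inside the query string must be escaped.
+
+@example
+@group
+(tree-sitter-query-capture
+ node "((identifier) @@constant
+        (#match \"^[A-Z_]+$\" @@constant))")
+@end group
+@end example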
+
+@heading S-expression patterns
+
+Besides strings, Emacs provides an s-expression based syntax for query
+patterns. It largely resembles the string-based syntax. For example,
+the following pattern
+
+@example
+@group
+(tree-sitter-query-capture
+ node "(addition_expression
+ left: (_) @@left
+ \"+\" @@plus-sign
+ right: (_) @@right) @@addition
+
+ [\"return\" \"break\"] @@keyword")
+@end group
+@end example
+
+@noindent
+is equivalent to
+
+@example
+@group
+(tree-sitter-query-capture
+ node '((addition_expression
+ left: (_) @@left
+ "+" @@plus-sign
+ right: (_) @@right) @@addition
+
+ ["return" "break"] @@keyword))
+@end group
+@end example
+
+Most pattern syntax can be written directly as strange but
+nevertheless valid s-expressions. Only a few constructs need
+modification:
+
+@itemize
+@item
+Anchor @samp{.} is written as @code{:anchor}.
+@item
+@samp{?} is written as @samp{:?}.
+@item
+@samp{*} is written as @samp{:*}.
+@item
+@samp{+} is written as @samp{:+}.
+@item
+@code{#equal} is written as @code{:equal}. In general, predicates
+change their @samp{#} to @samp{:}.
+@end itemize
+
+For example,
+
+@example
+@group
+"(
+ (compound_expression . (_) @@first (_)* @@rest)
+ (#match \"love\" @@first)
+ )"
+@end group
+@end example
+
+is written in s-expression form as
+
+@example
+@group
+'((
+ (compound_expression :anchor (_) @@first (_) :* @@rest)
+ (:match "love" @@first)
+ ))
+@end group
+@end example
+
+@defun tree-sitter-expand-query query
+This function expands the s-expression @var{query} into a string
+query. It is usually a good idea to expand s-expression patterns
+into strings for font-lock queries, since such queries run repeatedly.
+@end defun
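+
+As a minimal sketch of that advice (the variable name
+@code{my-font-lock-query} is hypothetical, and the query itself is
+only an illustration), a major mode could expand a pattern once when
+it is loaded and reuse the resulting string afterwards:
+
+@example
+@group
+;; Hypothetical variable; the query is expanded once, at load time.
+(defvar my-font-lock-query
+ (tree-sitter-expand-query
+ '(["return" "break"] @@font-lock-keyword-face))
+ "String form of an s-expression query.")
+@end group
+@end example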
+
+The tree-sitter project's documentation on pattern matching can be
+found at
+@uref{https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries}.
+
+@node Multiple Languages
+@section Parsing Text in Multiple Languages
+
+Sometimes the source of one programming language contains text
+written in other languages; HTML with embedded CSS and JavaScript is
+one example. In that case, we need to assign individual parsers to
+text segments written in different languages. Traditionally this is
+achieved by using narrowing. While tree-sitter works with narrowing
+(@pxref{tree-sitter narrowing, narrowing}), the recommended way is to
+set ranges in which a parser will operate.
+
+@defun tree-sitter-parser-set-included-ranges parser ranges
+This function sets the range of @var{parser} to @var{ranges}. Then
+@var{parser} will only read the text covered by each range.
+@var{ranges} is a list of cons cells @code{(@var{beg}
+. @var{end})}.
+
+The ranges in @var{ranges} must come in order and must not overlap.
+That is, the following form must return non-@code{nil}:
+
+@example
+@group
+(cl-loop for idx from 1 to (1- (length ranges))
+ for prev = (nth (1- idx) ranges)
+ for next = (nth idx ranges)
+ always (<= (car prev) (cdr prev)
+ (car next) (cdr next)))
+@end group
+@end example
+
+@vindex tree-sitter-range-invalid
+If @var{ranges} violates this constraint, or something else goes
+wrong, this function signals a @code{tree-sitter-range-invalid} error.
+The signal data contains a specific error message and the ranges we
+are trying to set.
+
+This function can also be used for disabling ranges. If @var{ranges}
+is @code{nil}, the parser is set to parse the whole buffer.
+
+Example:
+
+@example
+@group
+(tree-sitter-parser-set-included-ranges
+ parser '((1 . 9) (16 . 24) (24 . 25)))
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-parser-included-ranges parser
+This function returns the ranges set for @var{parser}. The return
+value has the same form as the @var{ranges} argument of
+@code{tree-sitter-parser-set-included-ranges}: a list of cons cells
+@code{(@var{beg} . @var{end})}. If @var{parser} doesn't have any
+ranges, the return value is @code{nil}.
+
+@example
+@group
+(tree-sitter-parser-included-ranges parser)
+ @result{} ((1 . 9) (16 . 24) (24 . 25))
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-set-ranges parser-or-lang ranges
+Like @code{tree-sitter-parser-set-included-ranges}, this function sets
+the ranges of @var{parser-or-lang} to @var{ranges}. Conveniently,
+@var{parser-or-lang} can be either a parser or a language. If it is
+a language, this function looks for the first parser in
+@code{tree-sitter-parser-list} for that language in the current buffer,
+and sets the ranges for it.
+@end defun
+
+@defun tree-sitter-get-ranges parser-or-lang
+This function returns the ranges of @var{parser-or-lang}, like
+@code{tree-sitter-parser-included-ranges}. And like
+@code{tree-sitter-set-ranges}, @var{parser-or-lang} can be a parser or
+a language symbol.
+@end defun
+
+@defun tree-sitter-query-range source pattern &optional beg end
+This function matches @var{source} with @var{pattern} and returns the
+ranges of captured nodes. The return value has the same shape as that
+of the other range functions: a list of @code{(@var{beg} . @var{end})}.
+
+For convenience, @var{source} can be a language symbol, a parser, or a
+node. If a language symbol, this function matches in the root node of
+the first parser using that language; if a parser, this function
+matches in the root node of that parser; if a node, this function
+matches in that node.
+
+Parameter @var{pattern} is the query pattern used to capture nodes
+(@pxref{Pattern Matching}). The capture names don't matter. Parameters
+@var{beg} and @var{end}, if both non-@code{nil}, limit the range in
+which this function queries.
+
+Like other query functions, this function signals a
+@code{tree-sitter-query-error} error if @var{pattern} is malformed.
+@end defun
+
+@defun tree-sitter-language-at point
+This function tries to figure out which language is responsible for
+the text at @var{point}. It goes over each parser in
+@code{tree-sitter-parser-list} and sees whether that parser's range
+covers @var{point}.
+@end defun
+
+@defvar tree-sitter-range-functions
+A list of range functions. Font-locking and indenting code uses
+functions in this list to set correct ranges for a language parser
+before using it.
+
+The signature of each function should be
+
+@example
+(@var{start} @var{end} &rest @var{_})
+@end example
+
+where @var{start} and @var{end} mark the region that is about to be
+used. A range function only needs to update ranges in that region,
+but may update more.
+
+Each function in the list is called in order.
+@end defvar
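+
+As a sketch of what a range function might look like (the function
+name @code{my-html-update-css-ranges} is hypothetical, and the query
+assumes an HTML grammar that exposes @code{style_element} and
+@code{raw_text} nodes), the following recomputes all ranges of the
+CSS parser from the HTML parse tree. It ignores the region and
+updates every range, which the contract above permits, and it assumes
+that parsers for both languages already exist in the buffer:
+
+@example
+@group
+;; Hypothetical range function for CSS embedded in HTML.
+(defun my-html-update-css-ranges (_start _end &rest _)
+ "Set the CSS parser's ranges from the HTML parse tree."
+ (tree-sitter-set-ranges
+ 'tree-sitter-css
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ "(style_element (raw_text) @@capture)")))
+
+;; Typically done in the major mode's setup code.
+(setq-local tree-sitter-range-functions
+ (list #'my-html-update-css-ranges))
+@end group
+@end example
+
+Note that if the query captures nothing, the ranges are set to
+@code{nil}, and the CSS parser then parses the whole buffer.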
+
+@defun tree-sitter-update-ranges &optional start end
+This function is used by font-lock and indent to update ranges before
+using any parser. Each range function in
+@code{tree-sitter-range-functions} is called in order. Arguments
+@var{start} and @var{end} are passed to each range function.
+@end defun
+
+@heading An example
+
+Normally, in a set of languages that can be mixed together, there is a
+major language and several embedded languages. The major language
+parses the whole document, and skips the embedded languages. Then the
+parser for the major language knows the ranges of the embedded
+languages. So we first parse the whole document with the major
+language’s parser, set ranges for the embedded languages, then parse
+the embedded languages.
+
+Suppose we want to parse a very simple document that mixes HTML, CSS
+and JavaScript:
+
+@example
+@group
+<html>
+ <script>1 + 2</script>
+ <style>body @{ color: "blue"; @}</style>
+</html>
+@end group
+@end example
+
+We first parse with HTML, then set ranges for CSS and JavaScript:
+
+@example
+@group
+;; Create parsers.
+(setq html (tree-sitter-get-parser-create 'tree-sitter-html))
+(setq css (tree-sitter-get-parser-create 'tree-sitter-css))
+(setq js (tree-sitter-get-parser-create 'tree-sitter-javascript))
+
+;; Set CSS ranges.
+(setq css-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ "(style_element (raw_text) @@capture)"))
+(tree-sitter-parser-set-included-ranges css css-range)
+
+;; Set JavaScript ranges.
+(setq js-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ "(script_element (raw_text) @@capture)"))
+(tree-sitter-parser-set-included-ranges js js-range)
+@end group
+@end example
+
+We use a query pattern @code{(style_element (raw_text) @@capture)} to
+find CSS nodes in the HTML parse tree. @xref{Pattern Matching}, for
+how to write query patterns.
+
+@node Tree-sitter C API
+@section Tree-sitter C API Correspondence
+
+Emacs' tree-sitter integration doesn't expose every feature
+tree-sitter's C API provides. Missing features include:
+
+@itemize
+@item
+Creating a tree cursor and navigating the syntax tree with it.
+@item
+Setting timeout and cancellation flag for a parser.
+@item
+Setting the logger for a parser.
+@item
+Printing a DOT graph of the syntax tree to a file.
+@item
+Copying and modifying a syntax tree. (Emacs doesn't expose a tree
+object.)
+@item
+Using (row, column) coordinates as position.
+@item
+Updating a node with changes. (In Emacs, retrieve a new node instead
+of updating the existing one.)
+@item
+Querying statistics of a language definition.
+@end itemize
+
+In addition, Emacs makes some changes to the C API to make the API more
+convenient and idiomatic:
+
+@itemize
+@item
+Instead of using byte positions, the ELisp API uses character
+positions.
+@item
+Null nodes are converted to nil.
+@end itemize
+
+Below is the correspondence between all C API functions and their
+ELisp counterparts. Sometimes one ELisp function corresponds to
+multiple C functions, and many C functions don't have an ELisp
+counterpart.
+
+@example
+ts_parser_new tree-sitter-parser-create
+ts_parser_delete
+ts_parser_set_language
+ts_parser_language tree-sitter-parser-language
+ts_parser_set_included_ranges tree-sitter-parser-set-included-ranges
+ts_parser_included_ranges tree-sitter-parser-included-ranges
+ts_parser_parse
+ts_parser_parse_string tree-sitter-parse-string
+ts_parser_parse_string_encoding
+ts_parser_reset
+ts_parser_set_timeout_micros
+ts_parser_timeout_micros
+ts_parser_set_cancellation_flag
+ts_parser_cancellation_flag
+ts_parser_set_logger
+ts_parser_logger
+ts_parser_print_dot_graphs
+ts_tree_copy
+ts_tree_delete
+ts_tree_root_node
+ts_tree_language
+ts_tree_edit
+ts_tree_get_changed_ranges
+ts_tree_print_dot_graph
+ts_node_type tree-sitter-node-type
+ts_node_symbol
+ts_node_start_byte tree-sitter-node-start
+ts_node_start_point
+ts_node_end_byte tree-sitter-node-end
+ts_node_end_point
+ts_node_string tree-sitter-node-string
+ts_node_is_null
+ts_node_is_named tree-sitter-node-check
+ts_node_is_missing tree-sitter-node-check
+ts_node_is_extra tree-sitter-node-check
+ts_node_has_changes tree-sitter-node-check
+ts_node_has_error tree-sitter-node-check
+ts_node_parent tree-sitter-node-parent
+ts_node_child tree-sitter-node-child
+ts_node_field_name_for_child tree-sitter-node-field-name-for-child
+ts_node_child_count tree-sitter-node-child-count
+ts_node_named_child tree-sitter-node-child
+ts_node_named_child_count tree-sitter-node-child-count
+ts_node_child_by_field_name tree-sitter-node-by-field-name
+ts_node_child_by_field_id
+ts_node_next_sibling tree-sitter-node-next-sibling
+ts_node_prev_sibling tree-sitter-node-prev-sibling
+ts_node_next_named_sibling tree-sitter-node-next-sibling
+ts_node_prev_named_sibling tree-sitter-node-prev-sibling
+ts_node_first_child_for_byte tree-sitter-first-child-for-pos
+ts_node_first_named_child_for_byte tree-sitter-first-child-for-pos
+ts_node_descendant_for_byte_range tree-sitter-node-descendant-for-range
+ts_node_descendant_for_point_range
+ts_node_named_descendant_for_byte_range tree-sitter-node-descendant-for-range
+ts_node_named_descendant_for_point_range
+ts_node_edit
+ts_node_eq tree-sitter-node-eq
+ts_tree_cursor_new
+ts_tree_cursor_delete
+ts_tree_cursor_reset
+ts_tree_cursor_current_node
+ts_tree_cursor_current_field_name
+ts_tree_cursor_current_field_id
+ts_tree_cursor_goto_parent
+ts_tree_cursor_goto_next_sibling
+ts_tree_cursor_goto_first_child
+ts_tree_cursor_goto_first_child_for_byte
+ts_tree_cursor_goto_first_child_for_point
+ts_tree_cursor_copy
+ts_query_new
+ts_query_delete
+ts_query_pattern_count
+ts_query_capture_count
+ts_query_string_count
+ts_query_start_byte_for_pattern
+ts_query_predicates_for_pattern
+ts_query_step_is_definite
+ts_query_capture_name_for_id
+ts_query_string_value_for_id
+ts_query_disable_capture
+ts_query_disable_pattern
+ts_query_cursor_new
+ts_query_cursor_delete
+ts_query_cursor_exec tree-sitter-query-capture
+ts_query_cursor_did_exceed_match_limit
+ts_query_cursor_match_limit
+ts_query_cursor_set_match_limit
+ts_query_cursor_set_byte_range
+ts_query_cursor_set_point_range
+ts_query_cursor_next_match
+ts_query_cursor_remove_match
+ts_query_cursor_next_capture
+ts_language_symbol_count
+ts_language_symbol_name
+ts_language_symbol_for_name
+ts_language_field_count
+ts_language_field_name_for_id
+ts_language_field_id_for_name
+ts_language_symbol_type
+ts_language_version
+@end example
diff --git a/lisp/emacs-lisp/cl-preloaded.el b/lisp/emacs-lisp/cl-preloaded.el
index ef60b266f9..9b5c761afd 100644
--- a/lisp/emacs-lisp/cl-preloaded.el
+++ b/lisp/emacs-lisp/cl-preloaded.el
@@ -68,6 +68,8 @@ cl--typeof-types
(font-spec atom) (font-entity atom) (font-object atom)
(vector array sequence atom)
(user-ptr atom)
+ (tree-sitter-parser atom)
+ (tree-sitter-node atom)
;; Plus, really hand made:
(null symbol list sequence atom))
"Alist of supertypes.
diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
new file mode 100644
index 0000000000..25886b393b
--- /dev/null
+++ b/lisp/tree-sitter.el
@@ -0,0 +1,844 @@
+;;; tree-sitter.el --- tree-sitter utilities -*- lexical-binding: t -*-
+
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;;
+;; Note to self: we don't create parsers automatically in any provided
+;; functions.
+
+;;; Code:
+
+(eval-when-compile (require 'cl-lib))
+(require 'cl-seq)
+(require 'font-lock)
+
+;;; Activating tree-sitter
+
+(defgroup tree-sitter
+ nil
+ "Tree-sitter is an incremental parser."
+ :group 'tools)
+
+(defcustom tree-sitter-disabled-modes nil
+ "A list of major-modes for which tree-sitter support is disabled."
+ :type '(repeat symbol))
+
+(defcustom tree-sitter-maximum-size (* 4 1024 1024)
+ "Maximum buffer size for enabling tree-sitter parsing."
+ :type 'integer)
+
+(defun tree-sitter-available-p ()
+ "Return non-nil if tree-sitter features are available."
+ (fboundp 'tree-sitter-parser-create))
+
+(defun tree-sitter-should-enable-p (&optional mode)
+ "Return non-nil if MODE should activate tree-sitter support.
+MODE defaults to the value of `major-mode'. The result depends
+on the value of `tree-sitter-disabled-modes',
+`tree-sitter-maximum-size', and of course, whether tree-sitter is
+available on the system at all."
+ (let* ((mode (or mode major-mode))
+ (disabled (cl-loop
+ for disabled-mode in tree-sitter-disabled-modes
+ if (provided-mode-derived-p mode disabled-mode)
+ return t
+ finally return nil)))
+ (and (tree-sitter-available-p)
+ (not disabled)
+ (< (buffer-size) tree-sitter-maximum-size))))
+
+;;; Parser API supplement
+
+(defun tree-sitter-get-parser (language)
+ "Find the first parser using LANGUAGE in `tree-sitter-parser-list'."
+ (catch 'found
+ (dolist (parser tree-sitter-parser-list)
+ (when (eq language (tree-sitter-parser-language parser))
+ (throw 'found parser)))))
+
+(defun tree-sitter-get-parser-create (language)
+ "Find the first parser using LANGUAGE in `tree-sitter-parser-list'.
+If none exists, create one and return it."
+ (or (tree-sitter-get-parser language)
+ (tree-sitter-parser-create
+ (current-buffer) language)))
+
+(defun tree-sitter-parse-string (string language)
+ "Parse STRING using a parser for LANGUAGE.
+Return the root node of the syntax tree."
+ (with-temp-buffer
+ (insert string)
+ (tree-sitter-parser-root-node
+ (tree-sitter-parser-create (current-buffer) language))))
+
+(defun tree-sitter-language-at (point)
+ "Return the language used at POINT."
+ (cl-loop for parser in tree-sitter-parser-list
+ if (tree-sitter-node-at point nil parser)
+ return (tree-sitter-parser-language parser)))
+
+(defun tree-sitter-set-ranges (parser-or-lang ranges)
+ "Set the ranges of PARSER-OR-LANG to RANGES."
+ (tree-sitter-parser-set-included-ranges
+ (cond ((symbolp parser-or-lang)
+ (or (tree-sitter-get-parser parser-or-lang)
+ (error "Cannot find a parser for %s" parser-or-lang)))
+ ((tree-sitter-parser-p parser-or-lang)
+ parser-or-lang)
+ (t (error "Expecting a parser or language, but got %s"
+ parser-or-lang)))
+ ranges))
+
+(defun tree-sitter-get-ranges (parser-or-lang)
+ "Get the ranges of PARSER-OR-LANG."
+ (tree-sitter-parser-included-ranges
+ (cond ((symbolp parser-or-lang)
+ (or (tree-sitter-get-parser parser-or-lang)
+ (error "Cannot find a parser for %s" parser-or-lang)))
+ ((tree-sitter-parser-p parser-or-lang)
+ parser-or-lang)
+ (t (error "Expecting a parser or language, but got %s"
+ parser-or-lang)))))
+
+;;; Node API supplement
+
+(defun tree-sitter-node-buffer (node)
+ "Return the buffer that NODE belongs to."
+ (tree-sitter-parser-buffer
+ (tree-sitter-node-parser node)))
+
+(defun tree-sitter-node-language (node)
+ "Return the language symbol that NODE's parser uses."
+ (tree-sitter-parser-language
+ (tree-sitter-node-parser node)))
+
+(defun tree-sitter-node-at (beg &optional end parser-or-lang named)
+ "Return the smallest node covering BEG to END.
+
+If omitted, END defaults to BEG. Return nil if none is found. If
+NAMED is non-nil, only look for named nodes. NAMED defaults to nil.
+
+If PARSER-OR-LANG is nil, use the first parser in
+`tree-sitter-parser-list'; if PARSER-OR-LANG is a parser, use
+that parser; if PARSER-OR-LANG is a language, find a parser using
+that language in the current buffer, and use that."
+ (let ((root (if (tree-sitter-parser-p parser-or-lang)
+ (tree-sitter-parser-root-node parser-or-lang)
+ (tree-sitter-buffer-root-node parser-or-lang))))
+ (tree-sitter-node-descendant-for-range root beg (or end beg) named)))
+
+(defun tree-sitter-buffer-root-node (&optional language)
+ "Return the root node of the current buffer.
+Use the first parser in `tree-sitter-parser-list'; if LANGUAGE is
+non-nil, use the first parser for LANGUAGE instead."
+ (if-let ((parser
+ (or (if language
+ (or (tree-sitter-get-parser language)
+ (error "Cannot find a parser for %s" language))
+ (or (car tree-sitter-parser-list)
+ (error "Buffer has no parser"))))))
+ (tree-sitter-parser-root-node parser)))
+
+(defun tree-sitter-filter-child (node pred &optional named)
+ "Return children of NODE that satisfy PRED.
+PRED is a function that takes one argument, the child node. If
+NAMED is non-nil, only search named nodes."
+ (let ((child (tree-sitter-node-child node 0 named))
+ result)
+ (while child
+ (when (funcall pred child)
+ (push child result))
+ (setq child (tree-sitter-node-next-sibling child named)))
+ (reverse result)))
+
+(defun tree-sitter-node-text (node &optional no-property)
+ "Return the buffer (or string) content corresponding to NODE.
+If NO-PROPERTY is non-nil, remove text properties."
+ (with-current-buffer (tree-sitter-node-buffer node)
+ (if no-property
+ (buffer-substring-no-properties
+ (tree-sitter-node-start node)
+ (tree-sitter-node-end node))
+ (buffer-substring
+ (tree-sitter-node-start node)
+ (tree-sitter-node-end node)))))
+
+(defun tree-sitter-parent-until (node pred)
+ "Return the closest parent of NODE that satisfies PRED.
+Return nil if none found. PRED should be a function that takes
+one argument, the parent node."
+ (let ((node (tree-sitter-node-parent node)))
+ (while (and node (not (funcall pred node)))
+ (setq node (tree-sitter-node-parent node)))
+ node))
+
+(defun tree-sitter-parent-while (node pred)
+ "Return the furthest parent of NODE that satisfies PRED.
+Return nil if none found. PRED should be a function that takes
+one argument, the parent node."
+ (let ((last nil))
+ (while (and node (funcall pred node))
+ (setq last node
+ node (tree-sitter-node-parent node)))
+ last))
+
+(defun tree-sitter-node-children (node &optional named)
+ "Return a list of NODE's children.
+If NAMED is non-nil, collect named children only."
+ (mapcar (lambda (idx)
+ (tree-sitter-node-child node idx named))
+ (number-sequence
+ 0 (1- (tree-sitter-node-child-count node named)))))
+
+(defun tree-sitter-node-index (node &optional named)
+ "Return the index of NODE in its parent.
+If NAMED is non-nil, count named children only."
+ (let ((count 0))
+ (while (setq node (tree-sitter-node-prev-sibling node named))
+ (cl-incf count))
+ count))
+
+(defun tree-sitter-node-field-name (node)
+ "Return the field name of NODE as a child of its parent."
+ (when-let ((parent (tree-sitter-node-parent node))
+ (idx (tree-sitter-node-index node)))
+ (tree-sitter-node-field-name-for-child parent idx)))
+
+;;; Query API supplement
+
+(defun tree-sitter-query-in (source query &optional beg end)
+ "Query the current buffer with QUERY.
+
+SOURCE can be a language symbol, a parser, or a node. If a
+language symbol, use the root node of the first parser for that
+language; if a parser, use the root node of that parser; if a
+node, use that node.
+
+QUERY is either a string query or a sexp query. See Info node
+`(elisp)Pattern Matching' for how to write a query pattern in either
+string or s-expression form.
+
+BEG and END, if _both_ non-nil, specify the range in which the query
+is executed.
+
+Signal a `tree-sitter-query-error' if QUERY is malformed."
+ (tree-sitter-query-capture
+ (cond ((symbolp source) (tree-sitter-buffer-root-node source))
+ ((tree-sitter-parser-p source)
+ (tree-sitter-parser-root-node source))
+ ((tree-sitter-node-p source) source))
+ query
+ beg end))
+
+(defun tree-sitter-query-string (string query language)
+ "Query STRING with QUERY in LANGUAGE.
+See `tree-sitter-query-capture' for QUERY."
+ (with-temp-buffer
+ (insert string)
+ (let ((parser (tree-sitter-parser-create (current-buffer) language)))
+ (tree-sitter-query-capture
+ (tree-sitter-parser-root-node parser)
+ query))))
+
+(defun tree-sitter-query-range (source query &optional beg end)
+ "Query the current buffer and return ranges of captured nodes.
+
+QUERY, SOURCE, BEG, END are the same as in
+`tree-sitter-query-in'. This function returns a list
+of (START . END), where START and END specify the range of each
+captured node. Capture names don't matter."
+ (cl-loop for capture
+ in (tree-sitter-query-in source query beg end)
+ for node = (cdr capture)
+ collect (cons (tree-sitter-node-start node)
+ (tree-sitter-node-end node))))
+
+;;; Range API supplement
+
+(defvar-local tree-sitter-range-functions nil
+ "A list of range functions.
+Font-locking and indenting code uses functions in this list to
+set correct ranges for a language parser before using it.
+
+The signature of each function should be
+
+ (start end &rest _)
+
+where START and END mark the region that is about to be used. A
+range function only needs to update ranges in that region, but
+may update more.
+
+Each function in the list is called in order.")
+
+(defun tree-sitter-update-ranges (&optional start end)
+ "Update the ranges for each language in the current buffer.
+Call each range function in `tree-sitter-range-functions'
+in order. START and END are passed to each range function."
+ (dolist (range-fn tree-sitter-range-functions)
+ (funcall range-fn (or start (point-min)) (or end (point-max)))))
+
+;;; Font-lock
+
+(defvar-local tree-sitter-font-lock-settings nil
+ "A list of SETTINGs for tree-sitter-based fontification.
+
+Each SETTING should look like
+
+ (LANGUAGE QUERY)
+
+Each SETTING controls one parser (often for a different language).
+LANGUAGE is the language symbol. See Info node `(elisp)Language
+Definitions'.
+
+QUERY is either a string query or a sexp query.
+See Info node `(elisp)Pattern Matching' for writing queries.
+
+Capture names in QUERY should be face names like
+`font-lock-keyword-face'. The captured node will be fontified
+with that face. Capture names can also be function names, in
+which case the function is called with (START END NODE), where
+START and END are the start and end position of the node in
+buffer, and NODE is the tree-sitter node object. If a capture
+name is both a face and a function, face takes priority.
+
+Generally, major modes should set
+`tree-sitter-font-lock-defaults', and let Emacs automatically
+populate this variable.")
+
+(defvar-local tree-sitter-font-lock-defaults nil
+ "Defaults for tree-sitter Font Lock specified by the major mode.
+
+This variable should be a list of
+
+ (DEFAULT :KEYWORD VALUE...)
+
+A DEFAULT may be a symbol or a list of symbols (specifying
+different levels of fontification). The symbol(s) can be of a
+variable or a function. If a symbol is both a variable and a
+function, it is used as a function. Different levels of
+fontification can be controlled by
+`font-lock-maximum-decoration'.
+
+The symbol(s) in DEFAULT should contain or return a SETTING as
+explained in `tree-sitter-font-lock-settings', which looks like
+
+ (LANGUAGE QUERY)
+
+KEYWORD and VALUE are additional settings that could be used to
+alter fontification behavior. Currently there aren't any.
+
+Multi-language major modes should provide a range function for
+each language they support in `tree-sitter-range-functions', and
+Emacs will set the ranges accordingly before fontifying a region.
+See Info node `(elisp)Multiple Languages' for what it means to
+set ranges for a parser.")
+
+(defun tree-sitter-font-lock-fontify-region (start end &optional loudly)
+ "Fontify the region between START and END.
+If LOUDLY is non-nil, message some debugging information."
+ (tree-sitter-update-ranges start end)
+ (font-lock-unfontify-region start end)
+ (dolist (setting tree-sitter-font-lock-settings)
+ (when-let* ((language (nth 0 setting))
+ (match-pattern (nth 1 setting))
+ (parser (tree-sitter-get-parser-create language)))
+ (when-let ((node (tree-sitter-node-at start end parser)))
+ (let ((captures (tree-sitter-query-capture
+ node match-pattern
+ ;; Specifying the range is important. More
+ ;; often than not, NODE will be the root
+ ;; node, and if we don't specify the range,
+ ;; we are basically querying the whole file.
+ start end)))
+ (with-silent-modifications
+ (dolist (capture captures)
+ (let* ((face (car capture))
+ (node (cdr capture))
+ (start (tree-sitter-node-start node))
+ (end (tree-sitter-node-end node)))
+ (cond ((facep face)
+ (put-text-property start end 'face face))
+ ((functionp face)
+ (funcall face start end node))
+ (t (error "Capture name %s is neither a face nor a function" face)))
+ (when loudly
+ (message "Fontifying text from %d to %d, Face: %s Language: %s"
+ start end face language)))))))))
+ ;; Call regexp font-lock after tree-sitter, as it is usually used
+ ;; for custom fontification.
+ (let ((font-lock-unfontify-region-function #'ignore))
+ (funcall #'font-lock-default-fontify-region start end loudly)))
+
+(defun tree-sitter-font-lock-enable ()
+ "Enable tree-sitter font-locking for the current buffer."
+ (let ((default (car tree-sitter-font-lock-defaults))
+ (attributes (cdr tree-sitter-font-lock-defaults)))
+ (ignore attributes)
+ (setq-local tree-sitter-font-lock-settings
+ (font-lock-eval-keywords
+ (font-lock-choose-keywords
+ default
+ (font-lock-value-in-major-mode
+ font-lock-maximum-decoration)))))
+ (setq-local font-lock-fontify-region-function
+ #'tree-sitter-font-lock-fontify-region)
+ ;; If we don't set `font-lock-defaults' to some non-nil value,
+ ;; font-lock doesn't enable properly (the font-lock-mode-internal
+ ;; doesn't run). See `font-lock-add-keywords'.
+ (when (and font-lock-mode
+ (null font-lock-keywords)
+ (null font-lock-defaults))
+ (font-lock-mode -1)
+ (setq-local font-lock-defaults '(nil t))
+ (font-lock-mode 1)))
+
+;;; Indent
+
+(defvar tree-sitter--indent-verbose nil
+ "If non-nil, log progress when indenting.")
+
+;; This is not bound locally like we normally do with major-mode
+;; stuff, because for tree-sitter, a buffer could contain more than
+;; one language.
+(defvar tree-sitter-simple-indent-rules nil
+ "A list of indent rule settings.
+Each indent rule setting should be (LANGUAGE . RULES),
+where LANGUAGE is a language symbol, and RULES is a list of
+
+ (MATCHER ANCHOR OFFSET).
+
+MATCHER determines whether this rule applies; ANCHOR and OFFSET
+together determine which column to indent to.
+
+A MATCHER is a function that takes three arguments (NODE PARENT
+BOL). BOL is the point where we are indenting: the beginning of
+line content, the position of the first non-whitespace character.
+NODE is the largest (highest-in-tree) node starting at that
+point. PARENT is the parent of NODE.
+
+If MATCHER returns non-nil, meaning the rule matches, Emacs then
+uses ANCHOR to find an anchor. ANCHOR should be a function that
+takes the same arguments (NODE PARENT BOL) and returns a point.
+
+Finally Emacs computes the column of that point returned by ANCHOR
+and adds OFFSET to it, and indents to that column.
+
+For MATCHER and ANCHOR, Emacs provides some convenient presets.
+See `tree-sitter-simple-indent-presets'.")
+
+(defvar tree-sitter-simple-indent-presets
+ '((match . (lambda
+ (&optional node-type parent-type node-field
+ node-index-min node-index-max)
+ `(lambda (node parent bol &rest _)
+ (and (or (null ,node-type)
+ (equal (tree-sitter-node-type node)
+ ,node-type))
+ (or (null ,parent-type)
+ (equal (tree-sitter-node-type parent)
+ ,parent-type))
+ (or (null ,node-field)
+ (equal (tree-sitter-node-field-name node)
+ ,node-field))
+ (or (null ,node-index-min)
+ (>= (tree-sitter-node-index node t)
+ ,node-index-min))
+ (or (null ,node-index-max)
+ (<= (tree-sitter-node-index node t)
+ ,node-index-max))))))
+ (no-node . (lambda (node parent bol &rest _) (null node)))
+ (parent-is . (lambda (type)
+ `(lambda (node parent bol &rest _)
+ (equal ,type (tree-sitter-node-type parent)))))
+
+ (node-is . (lambda (type)
+ `(lambda (node parent bol &rest _)
+ (equal ,type (tree-sitter-node-type node)))))
+
+ (query . (lambda (pattern)
+ `(lambda (node parent bol &rest _)
+ (cl-loop for capture
+ in (tree-sitter-query-capture
+ parent ,pattern)
+ if (tree-sitter-node-eq node (cdr capture))
+ return t
+ finally return nil))))
+ (first-sibling . (lambda (node parent bol &rest _)
+ (tree-sitter-node-start
+ (tree-sitter-node-child parent 0 t))))
+
+ (parent . (lambda (node parent bol &rest _)
+ (tree-sitter-node-start
+ (tree-sitter-node-parent node))))
+ (prev-sibling . (lambda (node parent bol &rest _)
+ (tree-sitter-node-start
+ (tree-sitter-node-prev-sibling node))))
+ (no-indent . (lambda (node parent bol &rest _) bol))
+ (prev-line . (lambda (node parent bol &rest _)
+ (save-excursion
+ (goto-char bol)
+ (forward-line -1)
+ (skip-chars-forward " \t")
+ (tree-sitter-node-start
+ (tree-sitter-node-at (point) nil nil t))))))
+ "A list of presets.
+These presets can be used as MATCHER and ANCHOR in
+`tree-sitter-simple-indent-rules'.
+
+MATCHER:
+
+\(match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
+
+ NODE-TYPE checks for the node's type, PARENT-TYPE checks for the
+ parent's type, NODE-FIELD checks for the field name of the node
+ in the parent, NODE-INDEX-MIN and NODE-INDEX-MAX check for
+ the node's index in the parent. Therefore, to match the
+ first child where parent is \"argument_list\", use
+
+ (match nil \"argument_list\" nil 0 0).
+
+no-node
+
+ Matches the case where node is nil, i.e., there is no node
+ that starts at point. This is the case when indenting an
+ empty line.
+
+\(parent-is TYPE)
+
+ Checks that the parent has type TYPE.
+
+\(node-is TYPE)
+
+ Checks that the node has type TYPE.
+
+\(query QUERY)
+
+ Queries the parent node with QUERY, and checks if the node
+ is captured (by any capture name).
+
+ANCHOR:
+
+first-sibling
+
+ Find the first child of the parent.
+
+parent
+
+ Find the parent.
+
+prev-sibling
+
+ Find node's previous sibling.
+
+no-indent
+
+ Do nothing.
+
+prev-line
+
+ Find the named node on the previous line. This can be used when
+ indenting an empty line: just indent like the previous node.")
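+
+;; A minimal sketch of how the presets above might be combined into
+;; rules, assuming that `tree-sitter-simple-indent-rules' is an alist
+;; of (LANGUAGE . RULES) where each rule is (MATCHER ANCHOR OFFSET),
+;; which is how `tree-sitter-simple-indent' reads it. The language
+;; symbol and the node type are only illustrative.
+;;
+;; (setq-local tree-sitter-simple-indent-rules
+;; '((tree-sitter-c
+;; ;; Empty line: align with the node on the previous line.
+;; (no-node prev-line 0)
+;; ;; Inside a block: indent two columns from the parent.
+;; ((parent-is "compound_statement") parent 2))))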
+
+(defun tree-sitter--simple-apply (fn args)
+ "Apply ARGS to FN.
+
+If FN is a key in `tree-sitter-simple-indent-presets', use the
+corresponding value as the function."
+ ;; We don't want to match uncompiled lambdas, so make sure this cons
+ ;; is not a function. We could move the condition functionp
+ ;; forward, but better be explicit.
+ (cond ((and (consp fn) (not (functionp fn)))
+ (apply (tree-sitter--simple-apply (car fn) (cdr fn))
+ ;; We don't evaluate ARGS with `simple-apply', i.e.,
+ ;; no composing, better keep it simple.
+ args))
+ ((and (symbolp fn)
+ (alist-get fn tree-sitter-simple-indent-presets))
+ (apply (alist-get fn tree-sitter-simple-indent-presets)
+ args))
+ ((functionp fn) (apply fn args))
+ (t (error "Couldn't find the function corresponding to %s" fn))))
+
+;; This variable might seem unnecessary: why split
+;; `tree-sitter-indent' and `tree-sitter-simple-indent' into two
+;; functions? We add this variable in between because later we might
+;; add more powerful indentation engines, and a new engine can
+;; probably share `tree-sitter-indent'. It is also useful, as suggested
+;; by Stefan M, to have a function that figures out how much to indent
+;; but doesn't actually perform the indentation, because we might
+;; want to know where a node would indent to if we put it at some
+;; other location, and use that information to calculate the actual
+;; indentation. And `tree-sitter-simple-indent' is that function. I
+;; forgot the example Stefan gave, but it makes a lot of sense.
+(defvar tree-sitter-indent-function #'tree-sitter-simple-indent
+ "Function used by `tree-sitter-indent' to do some of the work.
+
+This function is called with
+
+ (NODE PARENT BOL &rest _)
+
+and returns
+
+ (ANCHOR . OFFSET).
+
+BOL is the position of the beginning of the line; NODE is the
+\"largest\" node that starts at BOL; PARENT is its parent; ANCHOR
+is a point (not a node), and OFFSET is a number. Emacs finds the
+column of ANCHOR and adds OFFSET to it as the final indentation
+of the current line.")
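+
+;; A minimal sketch of a custom value for this variable, following
+;; the calling convention in the docstring above. The function name
+;; is hypothetical. It anchors every line at the start of PARENT and
+;; adds two columns; returning nil means no anchor was found.
+;;
+;; (defun my-indent-function (_node parent _bol &rest _)
+;; (when parent
+;; (cons (tree-sitter-node-start parent) 2)))
+;;
+;; (setq-local tree-sitter-indent-function #'my-indent-function)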
+
+(defun tree-sitter-indent ()
+ "Indent according to the result of `tree-sitter-indent-function'."
+ (tree-sitter-update-ranges)
+ (let* ((orig-pos (point))
+ (bol (save-excursion
+ (forward-line 0)
+ (skip-chars-forward " \t")
+ (point)))
+ (smallest-node
+ (cl-loop for parser in tree-sitter-parser-list
+ for node = (tree-sitter-node-at
+ bol nil parser)
+ if node return node))
+ (node (tree-sitter-parent-while
+ smallest-node
+ (lambda (node)
+ (eq bol (tree-sitter-node-start node))))))
+ (pcase-let*
+ ((parser (if smallest-node
+ (tree-sitter-node-parser smallest-node)
+ nil))
+ ;; NODE would be nil if BOL is on whitespace. In that case
+ ;; we set PARENT to the "node at point", which would
+ ;; encompass the whitespace.
+ (parent (cond ((and node parser)
+ (tree-sitter-node-parent node))
+ (parser
+ (tree-sitter-node-at bol nil parser))
+ (t nil)))
+ (`(,anchor . ,offset)
+ (funcall tree-sitter-indent-function node parent bol)))
+ (if (null anchor)
+ (when tree-sitter--indent-verbose
+ (message "Failed to find the anchor"))
+ (let ((col (+ (save-excursion
+ (goto-char anchor)
+ (current-column))
+ offset)))
+ (if (< bol orig-pos)
+ (save-excursion
+ (indent-line-to col))
+ (indent-line-to col)))))))
+
+(defun tree-sitter-simple-indent (node parent bol)
+ "Calculate indentation according to `tree-sitter-simple-indent-rules'.
+
+BOL is the position of the first non-whitespace character on the
+current line. NODE is the largest node that starts at BOL,
+PARENT is NODE's parent.
+
+Return (ANCHOR . OFFSET) where ANCHOR is a position and OFFSET is
+the indentation offset, meaning indent to align with ANCHOR and
+add OFFSET."
+ (if (null parent)
+ (when tree-sitter--indent-verbose
+ (message "PARENT is nil, not indenting"))
+ (let* ((language (tree-sitter-node-language parent))
+ (rules (alist-get language
+ tree-sitter-simple-indent-rules)))
+ (cl-loop for rule in rules
+ for pred = (nth 0 rule)
+ for anchor = (nth 1 rule)
+ for offset = (nth 2 rule)
+ if (tree-sitter--simple-apply
+ pred (list node parent bol))
+ do (when tree-sitter--indent-verbose
+ (message "Matched rule: %S" rule))
+ and
+ return (cons (tree-sitter--simple-apply
+ anchor (list node parent bol))
+ offset)))))
+
+(defun tree-sitter-check-indent (mode)
+ "Check current buffer's indentation against a major mode MODE.
+
+Pop up a diff buffer showing the difference. Correct
+indentation (target) is in green, current indentation is in red."
+ (interactive "CTarget major mode: ")
+ (let ((source-buf (current-buffer)))
+ (with-temp-buffer
+ (insert-buffer-substring source-buf)
+ (funcall mode)
+ (indent-region (point-min) (point-max))
+ (diff-buffers source-buf (current-buffer)))))
+
+;;; Debugging
+
+(defvar-local tree-sitter--inspect-name nil
+ "Tree-sitter-inspect-mode uses this to show node name in mode-line.")
+
+(defun tree-sitter-inspect-node-at-point (&optional arg)
+ "Show information of the node at point.
+If called interactively, show in echo area, otherwise set
+`tree-sitter--inspect-name' (which will appear in the mode-line
+if `tree-sitter-inspect-mode' is enabled). Uses the first parser
+in `tree-sitter-parser-list'."
+ (interactive "p")
+ ;; NODE-LIST contains all the nodes that start at point.
+ (let* ((node-list
+ (cl-loop for node = (tree-sitter-node-at (point))
+ then (tree-sitter-node-parent node)
+ while node
+ if (eq (tree-sitter-node-start node)
+ (point))
+ collect node))
+ (largest-node (car (last node-list)))
+ (parent (tree-sitter-node-parent largest-node))
+ ;; node-list-ascending contains all the nodes bottom-up, then
+ ;; the parent.
+ (node-list-ascending
+ (if (null largest-node)
+ ;; If there are no nodes that start at point, just show
+ ;; the node at point and its parent.
+ (list (tree-sitter-node-at (point))
+ (tree-sitter-node-parent
+ (tree-sitter-node-at (point))))
+ (append node-list (list parent))))
+ (name ""))
+ ;; We draw nodes like (parent field-name: (node)) recursively,
+ ;; so it could be (node1 field-name: (node2 field-name: (node3))).
+ (dolist (node node-list-ascending)
+ (setq
+ name
+ (concat
+ (if (tree-sitter-node-field-name node)
+ (format " %s: " (tree-sitter-node-field-name node))
+ " ")
+ (if (tree-sitter-node-check node 'named) "(" "\"")
+ (or (tree-sitter-node-type node)
+ "N/A")
+ name
+ (if (tree-sitter-node-check node 'named) ")" "\""))))
+ (setq tree-sitter--inspect-name name)
+ (force-mode-line-update)
+ (when arg
+ (if node-list
+ (message "%s" tree-sitter--inspect-name)
+ (message "No node at point")))))
+
+(define-minor-mode tree-sitter-inspect-mode
+ "Shows the node that _starts_ at point in the mode-line.
+
+The mode-line displays
+
+ PARENT FIELD-NAME: (CHILD (GRAND-CHILD (...)))
+
+CHILD, GRAND-CHILD, and GRAND-GRAND-CHILD, etc, are nodes that
+have their beginning at point. And PARENT is the parent of
+CHILD.
+
+If no node starts at point, i.e., point is in the middle of a
+node, then we just display the smallest node that spans point and
+its immediate parent.
+
+This minor mode doesn't create parsers on its own. It simply
+uses the first parser in `tree-sitter-parser-list'."
+ :lighter nil
+ (if tree-sitter-inspect-mode
+ (progn
+ (add-hook 'post-command-hook
+ #'tree-sitter-inspect-node-at-point 0 t)
+ (add-to-list 'mode-line-misc-info
+ '(:eval tree-sitter--inspect-name)))
+ (remove-hook 'post-command-hook
+ #'tree-sitter-inspect-node-at-point t)
+ (setq mode-line-misc-info
+ (remove '(:eval tree-sitter--inspect-name)
+ mode-line-misc-info))))
+
+(defun tree-sitter-check-query (query language)
+ "Check if QUERY is valid for LANGUAGE.
+If QUERY is invalid, display the query in a popup buffer, jump
+to the offending pattern and highlight the pattern."
+ (let ((buf (get-buffer-create "*tree-sitter check query*")))
+ (with-temp-buffer
+ (tree-sitter-get-parser-create language)
+ (condition-case err
+ (progn (tree-sitter-query-in language query)
+ (message "QUERY is valid"))
+ (tree-sitter-query-error
+ (with-current-buffer buf
+ (let* ((data (cdr err))
+ (message (nth 0 data))
+ (start (nth 1 data)))
+ (erase-buffer)
+ (insert query)
+ (goto-char start)
+ (search-forward " " nil t)
+ (put-text-property start (point) 'face 'error)
+ (message "%s" (buffer-substring start (point)))
+ (goto-char (point-min))
+ (insert (format "%s: %d\n" message start))
+ (forward-char start)))
+ (pop-to-buffer buf))))))
+
+;;; Etc
+
+(declare-function find-library-name "find-func.el")
+(defun tree-sitter--check-manual-coverage ()
+ "Print tree-sitter functions missing from the manual in message buffer."
+ (interactive)
+ (require 'find-func)
+ (let ((functions-in-source
+ (with-temp-buffer
+ (insert-file-contents (find-library-name "tree-sitter"))
+ (cl-remove-if
+ (lambda (name) (string-match "tree-sitter--" name))
+ (cl-sort
+ (save-excursion
+ (goto-char (point-min))
+ (cl-loop while (re-search-forward
+ "^(defun \\([^ ]+\\)" nil t)
+ collect (match-string-no-properties 1)))
+ #'string<))))
+ (functions-in-manual
+ (with-temp-buffer
+ (insert-file-contents (expand-file-name
+ "doc/lispref/parsing.texi"
+ source-directory))
+ (insert-file-contents (expand-file-name
+ "doc/lispref/modes.texi"
+ source-directory))
+ (cl-sort
+ (save-excursion
+ (goto-char (point-min))
+ (cl-loop while (re-search-forward
+ "^@defun \\([^ ]+\\)" nil t)
+ collect (match-string-no-properties 1)))
+ #'string<))))
+ (message "Missing: %s"
+ (string-join
+ (cl-remove-if
+ (lambda (name) (member name functions-in-manual))
+ functions-in-source)
+ "\n"))))
+
+(provide 'tree-sitter)
+
+;;; tree-sitter.el ends here
diff --git a/src/Makefile.in b/src/Makefile.in
index 6379582660..0d7fbb666b 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -333,6 +333,10 @@ JSON_LIBS =
JSON_CFLAGS = @JSON_CFLAGS@
JSON_OBJ = @JSON_OBJ@
+TREE_SITTER_LIBS = @TREE_SITTER_LIBS@
+TREE_SITTER_CFLAGS = @TREE_SITTER_CFLAGS@
+TREE_SITTER_OBJ = @TREE_SITTER_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -396,7 +400,7 @@ EMACS_CFLAGS=
$(XINPUT_CFLAGS) $(WEBP_CFLAGS) $(WEBKIT_CFLAGS) $(LCMS2_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(HARFBUZZ_CFLAGS) $(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) $(TREE_SITTER_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS) $(HAIKU_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -435,7 +439,7 @@ base_obj =
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
$(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ) \
- $(HAIKU_OBJ) $(PGTK_OBJ)
+ $(TREE_SITTER_OBJ) $(HAIKU_OBJ) $(PGTK_OBJ)
doc_obj = $(base_obj) $(NS_OBJC_OBJ)
obj = $(doc_obj) $(HAIKU_CXX_OBJ)
@@ -555,7 +559,7 @@ LIBES =
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LCMS2_LIBS) \
$(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
$(JSON_LIBS) $(LIBGMP) $(LIBGCCJIT_LIBS) $(XINPUT_LIBS) $(HAIKU_LIBS) \
- $(SQLITE3_LIBS)
+ $(TREE_SITTER_LIBS) $(SQLITE3_LIBS)
## FORCE it so that admin/unidata can decide whether this file is
## up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/alloc.c b/src/alloc.c
index 7582a42601..0a7e365fc2 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -50,6 +50,10 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2022 Free Software
#include TERM_HEADER
#endif /* HAVE_WINDOW_SYSTEM */
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
#include <flexmember.h>
#include <verify.h>
#include <execinfo.h> /* For backtrace. */
@@ -3153,6 +3157,15 @@ cleanup_vector (struct Lisp_Vector *vector)
if (uptr->finalizer)
uptr->finalizer (uptr->p);
}
+#ifdef HAVE_TREE_SITTER
+ else if (PSEUDOVECTOR_TYPEP (&vector->header, PVEC_TS_PARSER))
+ {
+ struct Lisp_TS_Parser *lisp_parser
+ = PSEUDOVEC_STRUCT (vector, Lisp_TS_Parser);
+ ts_tree_delete (lisp_parser->tree);
+ ts_parser_delete (lisp_parser->parser);
+ }
+#endif
#ifdef HAVE_MODULES
else if (PSEUDOVECTOR_TYPEP (&vector->header, PVEC_MODULE_FUNCTION))
{
diff --git a/src/casefiddle.c b/src/casefiddle.c
index 2ea5f09b4c..ac9e73be0c 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -30,6 +30,10 @@ Copyright (C) 1985, 1994, 1997-1999, 2001-2022 Free Software Foundation,
#include "composite.h"
#include "keymap.h"
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
enum case_action {CASE_UP, CASE_DOWN, CASE_CAPITALIZE, CASE_CAPITALIZE_UP};
/* State for casing individual characters. */
@@ -530,6 +534,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
modify_text (start, end);
prepare_casing_context (&ctx, flag, true);
+#ifdef HAVE_TREE_SITTER
+ ptrdiff_t start_byte = CHAR_TO_BYTE (start);
+ ptrdiff_t old_end_byte = CHAR_TO_BYTE (end);
+#endif
+
ptrdiff_t orig_end = end;
record_delete (start, make_buffer_string (start, end, true), false);
if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
@@ -548,6 +557,9 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
{
signal_after_change (start, end - start - added, end - start);
update_compositions (start, end, CHECK_ALL);
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (start_byte, old_end_byte, CHAR_TO_BYTE (end));
+#endif
}
return orig_end + added;
diff --git a/src/data.c b/src/data.c
index 5d0790692b..9acaf60e07 100644
--- a/src/data.c
+++ b/src/data.c
@@ -259,6 +259,10 @@ DEFUN ("type-of", Ftype_of, Stype_of, 1, 1, 0,
return Qxwidget;
case PVEC_XWIDGET_VIEW:
return Qxwidget_view;
+ case PVEC_TS_PARSER:
+ return Qtree_sitter_parser;
+ case PVEC_TS_NODE:
+ return Qtree_sitter_node;
case PVEC_SQLITE:
return Qsqlite;
/* "Impossible" cases. */
@@ -4069,6 +4073,8 @@ #define PUT_ERROR(sym, tail, msg) \
DEFSYM (Qterminal, "terminal");
DEFSYM (Qxwidget, "xwidget");
DEFSYM (Qxwidget_view, "xwidget-view");
+ DEFSYM (Qtree_sitter_parser, "tree-sitter-parser");
+ DEFSYM (Qtree_sitter_node, "tree-sitter-node");
DEFSYM (Qdefun, "defun");
diff --git a/src/emacs.c b/src/emacs.c
index 3b708db779..a0dc67bc9a 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -85,6 +85,7 @@ #define MAIN_PROGRAM
#include "intervals.h"
#include "character.h"
#include "buffer.h"
+#include "tree-sitter.h"
#include "window.h"
#include "xwidget.h"
#include "atimer.h"
@@ -2139,6 +2140,9 @@ main (int argc, char **argv)
syms_of_floatfns ();
syms_of_buffer ();
+ #ifdef HAVE_TREE_SITTER
+ syms_of_tree_sitter ();
+ #endif
syms_of_bytecode ();
syms_of_callint ();
syms_of_casefiddle ();
diff --git a/src/eval.c b/src/eval.c
index 5514583b6a..76bc1b058c 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1970,6 +1970,19 @@ signal_error (const char *s, Lisp_Object arg)
xsignal (Qerror, Fcons (build_string (s), arg));
}
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
/* Use this for arithmetic overflow, e.g., when an integer result is
too large even for a bignum. */
void
diff --git a/src/insdel.c b/src/insdel.c
index d9ba222b1d..c54a4476f7 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -31,6 +31,10 @@
#include "region-cache.h"
#include "pdumper.h"
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
static void insert_from_string_1 (Lisp_Object, ptrdiff_t, ptrdiff_t, ptrdiff_t,
ptrdiff_t, bool, bool);
static void insert_from_buffer_1 (struct buffer *, ptrdiff_t, ptrdiff_t, bool);
@@ -940,6 +944,12 @@ insert_1_both (const char *string,
set_text_properties (make_fixnum (PT), make_fixnum (PT + nchars),
Qnil, Qnil, Qnil);
+#ifdef HAVE_TREE_SITTER
+ eassert (nbytes >= 0);
+ eassert (PT_BYTE >= 0);
+ ts_record_change (PT_BYTE, PT_BYTE, PT_BYTE + nbytes);
+#endif
+
adjust_point (nchars, nbytes);
check_markers ();
@@ -1071,6 +1081,12 @@ insert_from_string_1 (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
graft_intervals_into_buffer (intervals, PT, nchars,
current_buffer, inherit);
+#ifdef HAVE_TREE_SITTER
+ eassert (nbytes >= 0);
+ eassert (PT_BYTE >= 0);
+ ts_record_change (PT_BYTE, PT_BYTE, PT_BYTE + nbytes);
+#endif
+
adjust_point (nchars, outgoing_nbytes);
check_markers ();
@@ -1137,6 +1153,12 @@ insert_from_gap (ptrdiff_t nchars, ptrdiff_t nbytes, bool text_at_gap_tail)
current_buffer, 0);
}
+#ifdef HAVE_TREE_SITTER
+ eassert (nbytes >= 0);
+ eassert (ins_bytepos >= 0);
+ ts_record_change (ins_bytepos, ins_bytepos, ins_bytepos + nbytes);
+#endif
+
if (ins_charpos < PT)
adjust_point (nchars, nbytes);
@@ -1287,6 +1309,12 @@ insert_from_buffer_1 (struct buffer *buf,
/* Insert those intervals. */
graft_intervals_into_buffer (intervals, PT, nchars, current_buffer, inherit);
+#ifdef HAVE_TREE_SITTER
+ eassert (outgoing_nbytes >= 0);
+ eassert (PT_BYTE >= 0);
+ ts_record_change (PT_BYTE, PT_BYTE, PT_BYTE + outgoing_nbytes);
+#endif
+
adjust_point (nchars, outgoing_nbytes);
}
\f
@@ -1535,6 +1563,13 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
graft_intervals_into_buffer (intervals, from, inschars,
current_buffer, inherit);
+#ifdef HAVE_TREE_SITTER
+ eassert (to_byte >= from_byte);
+ eassert (outgoing_insbytes >= 0);
+ eassert (from_byte >= 0);
+ ts_record_change (from_byte, to_byte, from_byte + outgoing_insbytes);
+#endif
+
/* Relocate point as if it were a marker. */
if (from < PT)
adjust_point ((from + inschars - (PT < to ? PT : to)),
@@ -1569,7 +1604,11 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
If MARKERS, relocate markers.
Unlike most functions at this level, never call
- prepare_to_modify_buffer and never call signal_after_change. */
+ prepare_to_modify_buffer and never call signal_after_change.
+ That is because this function is called in a loop, one character
+ at a time, and the caller of 'replace_range_2' calls these hooks
+ once for the entire region. Apart from signal_after_change, any
+ caller of this function should also call ts_record_change. */
void
replace_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
@@ -1892,6 +1931,12 @@ del_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
evaporate_overlays (from);
+#ifdef HAVE_TREE_SITTER
+ eassert (from_byte <= to_byte);
+ eassert (from_byte >= 0);
+ ts_record_change (from_byte, to_byte, from_byte);
+#endif
+
return deletion;
}
diff --git a/src/json.c b/src/json.c
index 21a6df6785..bfe0f692f6 100644
--- a/src/json.c
+++ b/src/json.c
@@ -1090,22 +1090,6 @@ DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
return unbind_to (count, lisp);
}
-/* Simplified version of 'define-error' that works with pure
- objects. */
-
-static void
-define_error (Lisp_Object name, const char *message, Lisp_Object parent)
-{
- eassert (SYMBOLP (name));
- eassert (SYMBOLP (parent));
- Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
- eassert (CONSP (parent_conditions));
- eassert (!NILP (Fmemq (parent, parent_conditions)));
- eassert (NILP (Fmemq (name, parent_conditions)));
- Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
- Fput (name, Qerror_message, build_pure_c_string (message));
-}
-
void
syms_of_json (void)
{
diff --git a/src/lisp.h b/src/lisp.h
index f8fe2a6906..03d545c640 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -563,6 +563,8 @@ #define ENUM_BF(TYPE) enum TYPE
your object -- this way, the same object could be used to represent
several disparate C structures.
+ In addition, you need to add switch branches in data.c for Ftype_of.
+
You also need to add the new type to the constant
`cl--typeof-types' in lisp/emacs-lisp/cl-preloaded.el. */
@@ -1083,6 +1085,8 @@ DEFINE_GDB_SYMBOL_END (PSEUDOVECTOR_FLAG)
PVEC_CONDVAR,
PVEC_MODULE_FUNCTION,
PVEC_NATIVE_COMP_UNIT,
+ PVEC_TS_PARSER,
+ PVEC_TS_NODE,
PVEC_SQLITE,
/* These should be last, for internal_equal and sxhash_obj. */
@@ -5208,6 +5212,11 @@ maybe_gc (void)
maybe_garbage_collect ();
}
+/* Simplified version of 'define-error' that works with pure
+ objects. */
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent);
+
INLINE_HEADER_END
#endif /* EMACS_LISP_H */
diff --git a/src/lread.c b/src/lread.c
index 2eff20f15d..fb78536c4c 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -5181,6 +5181,14 @@ syms_of_lread (void)
Fcons (build_pure_c_string (MODULES_SECONDARY_SUFFIX), Vload_suffixes);
#endif
+ DEFVAR_LISP ("dynamic-library-suffixes", Vdynamic_library_suffixes,
+ doc: /* A list of suffixes for loadable dynamic libraries. */);
+ Vdynamic_library_suffixes =
+ Fcons (build_pure_c_string (DYNAMIC_LIB_SECONDARY_SUFFIX), Qnil);
+ Vdynamic_library_suffixes =
+ Fcons (build_pure_c_string (DYNAMIC_LIB_SUFFIX),
+ Vdynamic_library_suffixes);
+
#endif
DEFVAR_LISP ("module-file-suffix", Vmodule_file_suffix,
doc: /* Suffix of loadable module file, or nil if modules are not supported. */);
diff --git a/src/print.c b/src/print.c
index a3c9011215..0b77787b24 100644
--- a/src/print.c
+++ b/src/print.c
@@ -48,6 +48,10 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2022 Free Software
# include <sys/socket.h> /* for F_DUPFD_CLOEXEC */
#endif
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
struct terminal;
/* Avoid actual stack overflow in print. */
@@ -1880,6 +1884,30 @@ print_vectorlike (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag,
}
break;
#endif
+
+#ifdef HAVE_TREE_SITTER
+ case PVEC_TS_PARSER:
+ print_c_string ("#<tree-sitter-parser for ", printcharfun);
+ Lisp_Object language = XTS_PARSER (obj)->language_symbol;
+ print_string (Fsymbol_name (language), printcharfun);
+ print_c_string (" in ", printcharfun);
+ print_object (XTS_PARSER (obj)->buffer, printcharfun, escapeflag);
+ printchar ('>', printcharfun);
+ break;
+ case PVEC_TS_NODE:
+ print_c_string ("#<tree-sitter-node from ", printcharfun);
+ print_object (Ftree_sitter_node_start (obj),
+ printcharfun, escapeflag);
+ print_c_string (" to ", printcharfun);
+ print_object (Ftree_sitter_node_end (obj),
+ printcharfun, escapeflag);
+ print_c_string (" in ", printcharfun);
+ print_object (XTS_PARSER (XTS_NODE (obj)->parser)->buffer,
+ printcharfun, escapeflag);
+ printchar ('>', printcharfun);
+ break;
+#endif
+
case PVEC_SQLITE:
{
print_c_string ("#<sqlite ", printcharfun);
diff --git a/src/tree-sitter.c b/src/tree-sitter.c
new file mode 100644
index 0000000000..0cf27298f0
--- /dev/null
+++ b/src/tree-sitter.c
@@ -0,0 +1,1597 @@
+/* Tree-sitter integration for GNU Emacs.
+
+Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "lisp.h"
+#include "buffer.h"
+#include "tree-sitter.h"
+
+/* Commentary
+
+ The Emacs wrapper of tree-sitter does not expose everything the C
+ API provides, most notably:
+
+ - It doesn't expose a syntax tree. We put the syntax tree in the
+ parser object, and updating the tree is handled at the C level.
+
+ - We don't expose tree cursor either. I think Lisp is slow enough
+ to nullify any performance advantage of using a cursor, though I
+ don't have evidence.
+
+ - Because updating the tree is handled at the C level as each
+ change is made in the buffer, there is no way for Lisp to update
+ a node. But since we can just retrieve a new node, it shouldn't
+ be a limitation.
+
+ - I didn't expose setting timeout and cancellation flag for a
+ parser, mainly because I don't think they are really necessary
+ in Emacs' use cases.
+
+ - Many tree-sitter functions ask for a TSPoint, basically a (row,
+ column) location. Emacs uses a gap buffer and keeps no
+ information about row and column position. According to the
+ author of tree-sitter, tree-sitter only asks for (row, column)
+ position to carry it around and return it to the user later;
+ the real position used is the byte position. He also said
+ that he _thinks_ that it will work to use byte position only.
+ That's why whenever a TSPoint is asked for, we pass a dummy one.
+ Judging by the nature of parsing algorithms, I think it is
+ safe to use only byte position, and I don't think this will
+ change in the future.
+
+ REF: https://github.com/tree-sitter/tree-sitter/issues/445
+
+ tree-sitter.h has some commentary on the two main data structures
+ for the parser and node. ts_ensure_position_synced has some
+ commentary on how we make tree-sitter play well with narrowing
+ (the tree-sitter parser only sees the visible region, so we need to
+ translate positions back and forth). Most of the action happens in
+ ts_ensure_parsed, ts_read_buffer and ts_record_change.
+
+ A complete correspondence list between tree-sitter functions and
+ exposed Lisp functions can be found in the manual (elisp)API
+ Correspondence.
+
+ Placement of CHECK_xxx functions: call CHECK_xxx before using any
+ unchecked Lisp values; these include arguments of Lisp functions,
+ the return value of Fsymbol_value, and the car of a cons.
+
+ Initializing tree-sitter: there are two entry points to tree-sitter
+ functions: 'tree-sitter-parser-create' and
+ 'tree-sitter-language-available-p'. Therefore we only need to call
+ the initialization function in those two functions.
+
+ Tree-sitter offset (0-based) and buffer position (1-based):
+ tree-sitter offset + visible_beg = buffer position
+ buffer position - visible_beg = tree-sitter offset
+ */
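+
+/* A worked example of the offset conversion, as a sketch. The
+ numbers are made up. It assumes the conversion goes through the
+ parser's visible_beg field (a byte position), as in
+ ts_record_change below:
+
+ Widened buffer: visible_beg = 1, so byte position 1 is
+ tree-sitter offset 0, and offset 41 is byte position 42.
+
+ Narrowed buffer: visible_beg = 100, so byte position 100 is
+ tree-sitter offset 0, and offset 41 is byte position 141. */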
+
+/*** Initialization */
+
+bool ts_initialized = false;
+
+static void *
+ts_calloc_wrapper (size_t n, size_t size)
+{
+ return xzalloc (n * size);
+}
+
+void
+ts_initialize ()
+{
+ if (!ts_initialized)
+ {
+ ts_set_allocator (xmalloc, ts_calloc_wrapper, xrealloc, xfree);
+ ts_initialized = true;
+ }
+}
+
+/*** Loading language library */
+
+/* Translates a symbol tree-sitter-<lang> to a C name
+ tree_sitter_<lang>. */
+void
+ts_symbol_to_c_name (char *symbol_name)
+{
+ for (int idx = 0; symbol_name[idx] != '\0'; idx++)
+ {
+ if (symbol_name[idx] == '-')
+ symbol_name[idx] = '_';
+ }
+}
+
+bool
+ts_find_override_name
+(Lisp_Object language_symbol, Lisp_Object *name, Lisp_Object *c_symbol)
+{
+ for (Lisp_Object list = Vtree_sitter_load_name_override_list;
+ !NILP (list); list = XCDR (list))
+ {
+ Lisp_Object lang = XCAR (XCAR (list));
+ CHECK_SYMBOL (lang);
+ if (EQ (lang, language_symbol))
+ {
+ *name = Fnth (make_fixnum (1), XCAR (list));
+ CHECK_STRING (*name);
+ *c_symbol = Fnth (make_fixnum (2), XCAR (list));
+ CHECK_STRING (*c_symbol);
+ return true;
+ }
+ }
+ return false;
+}
+
+/* Load the dynamic library of LANGUAGE_SYMBOL and return the pointer
+ to the language definition. Signals
+ Qtree_sitter_load_language_error if something goes wrong.
+ Qtree_sitter_load_language_error carries the error message from
+ trying to load the library with each extension.
+
+ If SIGNAL is true, signal an error if loading LANGUAGE fails; if
+ false, return NULL on failure. */
+TSLanguage *
+ts_load_language (Lisp_Object language_symbol, bool signal)
+{
+ Lisp_Object symbol_name = Fsymbol_name (language_symbol);
+
+ /* Figure out the library name and C name. */
+ Lisp_Object lib_base_name =
+ (concat2 (build_pure_c_string ("lib"), symbol_name));
+ char *c_name = strdup (SSDATA (symbol_name));
+ ts_symbol_to_c_name (c_name);
+
+ /* Override the library name and C name, if appropriate. */
+ Lisp_Object override_name;
+ Lisp_Object override_c_name;
+ bool found_override = ts_find_override_name
+ (language_symbol, &override_name, &override_c_name);
+ if (found_override)
+ {
+ lib_base_name = override_name;
+ c_name = SSDATA (override_c_name);
+ }
+
+ dynlib_handle_ptr handle;
+ char const *error;
+ Lisp_Object error_list = Qnil;
+ /* Try loading the dynamic library with each suffix in
+ 'dynamic-library-suffixes'. Stop on success; on failure, record
+ the error message and try the next one. */
+ for (Lisp_Object suffixes = Vdynamic_library_suffixes;
+ !NILP (suffixes); suffixes = XCDR (suffixes))
+ {
+ char *library_name =
+ SSDATA (concat2 (lib_base_name, XCAR (suffixes)));
+ dynlib_error ();
+ handle = dynlib_open (library_name);
+ error = dynlib_error ();
+ if (error == NULL)
+ break;
+ else
+ error_list = Fcons (build_string (error), error_list);
+ }
+ if (error != NULL)
+ {
+ if (signal)
+ xsignal2 (Qtree_sitter_load_language_error,
+ symbol_name, Fnreverse (error_list));
+ else
+ return NULL;
+ }
+
+ /* Load TSLanguage. */
+ dynlib_error ();
+ TSLanguage *(*langfn) ();
+ langfn = dynlib_sym (handle, c_name);
+ error = dynlib_error ();
+ if (error != NULL)
+ {
+ if (signal)
+ xsignal1 (Qtree_sitter_load_language_error,
+ build_string (error));
+ else
+ return NULL;
+ }
+ TSLanguage *lang = (*langfn) ();
+
+ /* Check if language version matches tree-sitter version. */
+ TSParser *parser = ts_parser_new ();
+ bool success = ts_parser_set_language (parser, lang);
+ ts_parser_delete (parser);
+ if (!success)
+ {
+ if (signal)
+ xsignal2 (Qtree_sitter_load_language_error,
+ build_pure_c_string ("Language version doesn't match tree-sitter version, language version:"),
+ make_fixnum (ts_language_version (lang)));
+ else
+ return NULL;
+ }
+ return lang;
+}
+
+DEFUN ("tree-sitter-language-available-p",
+ Ftree_sitter_langauge_available_p,
+ Stree_sitter_language_available_p,
+ 1, 1, 0,
+ doc: /* Return non-nil if LANGUAGE exists and is loadable. */)
+ (Lisp_Object language)
+{
+ CHECK_SYMBOL (language);
+ ts_initialize ();
+ if (ts_load_language (language, false) == NULL)
+ return Qnil;
+ else
+ return Qt;
+}
+
+/*** Parsing functions */
+
+/* An auxiliary function that saves a few lines of code. */
+static inline void
+ts_tree_edit_1 (TSTree *tree, ptrdiff_t start_byte,
+ ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
+{
+ TSPoint dummy_point = {0, 0};
+ TSInputEdit edit = {(uint32_t) start_byte,
+ (uint32_t) old_end_byte,
+ (uint32_t) new_end_byte,
+ dummy_point, dummy_point, dummy_point};
+ ts_tree_edit (tree, &edit);
+}
+
+/* Update each parser's tree after the user made an edit. This
+function does not parse the buffer and only updates the tree. (So it
+should be very fast.) */
+void
+ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte)
+{
+ for (Lisp_Object parser_list =
+ Fsymbol_value (Qtree_sitter_parser_list);
+ !NILP (parser_list);
+ parser_list = XCDR (parser_list))
+ {
+ CHECK_CONS (parser_list);
+ Lisp_Object lisp_parser = XCAR (parser_list);
+ CHECK_TS_PARSER (lisp_parser);
+ TSTree *tree = XTS_PARSER (lisp_parser)->tree;
+ if (tree != NULL)
+ {
+ eassert (start_byte <= old_end_byte);
+ eassert (start_byte <= new_end_byte);
+ /* Think of the recorded change as a delete followed by an
+ insert, and think of them as moving unchanged text back
+ and forth. After all, the whole point of updating the
+ tree is to update the position of unchanged text. */
+ ptrdiff_t bytes_del = old_end_byte - start_byte;
+ ptrdiff_t bytes_ins = new_end_byte - start_byte;
+
+ ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
+
+ ptrdiff_t affected_start =
+ max (visible_beg, start_byte) - visible_beg;
+ ptrdiff_t affected_old_end =
+ min (visible_end, affected_start + bytes_del);
+ ptrdiff_t affected_new_end =
+ affected_start + bytes_ins;
+
+ ts_tree_edit_1 (tree, affected_start, affected_old_end,
+ affected_new_end);
+ XTS_PARSER (lisp_parser)->visible_end = affected_new_end;
+ XTS_PARSER (lisp_parser)->need_reparse = true;
+ XTS_PARSER (lisp_parser)->timestamp++;
+ }
+ }
+}
+
+void
+ts_ensure_position_synced (Lisp_Object parser)
+{
+ TSParser *ts_parser = XTS_PARSER (parser)->parser;
+ TSTree *tree = XTS_PARSER (parser)->tree;
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+ /* Before we parse or set ranges, catch up with the narrowing
+ situation. We change visible_beg and visible_end to match
+ BUF_BEGV_BYTE and BUF_ZV_BYTE, and inform tree-sitter of the
+ change. */
+ ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
+ /* Before re-parse, we want to move the visible range of tree-sitter
+ to match the narrowed range. For example,
+ from ________|xxxx|__
+ to |xxxx|__________ */
+
+ /* 1. Make sure visible_beg <= BUF_BEGV_BYTE. */
+ if (visible_beg > BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the beginning. */
+ ts_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer));
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ /* 2. Make sure visible_end = BUF_ZV_BYTE. */
+ if (visible_end < BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the end. */
+ ts_tree_edit_1 (tree, visible_end - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ else if (visible_end > BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the end. */
+ ts_tree_edit_1 (tree, BUF_ZV_BYTE (buffer) - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ /* 3. Make sure visible_beg = BUF_BEGV_BYTE. */
+ if (visible_beg < BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the beginning. */
+ ts_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0);
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ eassert (0 <= visible_beg);
+ eassert (visible_beg <= visible_end);
+
+ XTS_PARSER (parser)->visible_beg = visible_beg;
+ XTS_PARSER (parser)->visible_end = visible_end;
+}
+
+void
+ts_check_buffer_size (struct buffer *buffer)
+{
+ ptrdiff_t buffer_size =
+ (BUF_Z_BYTE (buffer) - BUF_BEG_BYTE (buffer));
+ if (buffer_size > UINT32_MAX)
+ xsignal2 (Qtree_sitter_buffer_too_large,
+ build_pure_c_string ("Buffer size too large, size:"),
+ make_fixnum (buffer_size));
+}
+
+/* Parse the buffer. We don't parse until we have to. When we have
+to, we call this function to parse and update the tree. */
+void
+ts_ensure_parsed (Lisp_Object parser)
+{
+ if (!XTS_PARSER (parser)->need_reparse)
+ return;
+ TSParser *ts_parser = XTS_PARSER (parser)->parser;
+ TSTree *tree = XTS_PARSER (parser)->tree;
+ TSInput input = XTS_PARSER (parser)->input;
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+ ts_check_buffer_size (buffer);
+
+ /* Before we parse, catch up with the narrowing situation. */
+ ts_ensure_position_synced (parser);
+
+ TSTree *new_tree = ts_parser_parse (ts_parser, tree, input);
+ /* This should be very rare (impossible, really): it only happens
+ when 1) language is not set (impossible in Emacs because the user
+ has to supply a language to create a parser), 2) parse canceled
+ due to timeout (impossible because we don't set a timeout), 3)
+ parse canceled due to cancellation flag (impossible because we
+ don't set the flag). (See comments for ts_parser_parse in
+ tree_sitter/api.h.) */
+ if (new_tree == NULL)
+ {
+ Lisp_Object buf;
+ XSETBUFFER (buf, buffer);
+ xsignal1 (Qtree_sitter_parse_error, buf);
+ }
+
+ ts_tree_delete (tree);
+ XTS_PARSER (parser)->tree = new_tree;
+ XTS_PARSER (parser)->need_reparse = false;
+}
+
+/* This is the read function provided to tree-sitter to read from a
+ buffer. It reads one character at a time and automatically skips
+ the gap. */
+const char*
+ts_read_buffer (void *parser, uint32_t byte_index,
+ TSPoint position, uint32_t *bytes_read)
+{
+ struct buffer *buffer =
+ XBUFFER (((struct Lisp_TS_Parser *) parser)->buffer);
+ ptrdiff_t visible_beg = ((struct Lisp_TS_Parser *) parser)->visible_beg;
+ ptrdiff_t visible_end = ((struct Lisp_TS_Parser *) parser)->visible_end;
+ ptrdiff_t byte_pos = byte_index + visible_beg;
+ /* We will make sure visible_beg = BUF_BEGV_BYTE before re-parse (in
+ ts_ensure_parsed), so byte_pos will never be smaller than
+ BUF_BEG_BYTE. */
+ eassert (visible_beg == BUF_BEGV_BYTE (buffer));
+ eassert (visible_end == BUF_ZV_BYTE (buffer));
+
+ /* Read one character. Tree-sitter wants us to set bytes_read to 0
+ if it reads to the end of buffer. It doesn't say what it wants
+ for the return value in that case, so we just give it an empty
+ string. */
+ char *beg;
+ int len;
+ /* This function could run from a user command, so it is better to
+ do nothing instead of raising an error. (The two early-exit
+ branches are written separately rather than as one large
+ condition, to keep them easy to read.) */
+ if (!BUFFER_LIVE_P (buffer))
+ {
+ beg = NULL;
+ len = 0;
+ }
+ /* Reached visible end-of-buffer, tell tree-sitter to read no more. */
+ else if (byte_pos >= visible_end)
+ {
+ beg = NULL;
+ len = 0;
+ }
+ /* Normal case, read a character. */
+ else
+ {
+ beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
+ len = BYTES_BY_CHAR_HEAD ((int) *beg);
+ }
+ *bytes_read = (uint32_t) len;
+ return beg;
+}
+
+/*** Functions for parser and node object */
+
+/* Wrap the parser in a Lisp_Object to be used in the Lisp machine. */
+Lisp_Object
+make_ts_parser (Lisp_Object buffer, TSParser *parser,
+ TSTree *tree, Lisp_Object language_symbol)
+{
+ struct Lisp_TS_Parser *lisp_parser
+ = ALLOCATE_PSEUDOVECTOR
+ (struct Lisp_TS_Parser, buffer, PVEC_TS_PARSER);
+
+ lisp_parser->language_symbol = language_symbol;
+ lisp_parser->buffer = buffer;
+ lisp_parser->parser = parser;
+ lisp_parser->tree = tree;
+ TSInput input = {lisp_parser, ts_read_buffer, TSInputEncodingUTF8};
+ lisp_parser->input = input;
+ lisp_parser->need_reparse = true;
+ lisp_parser->visible_beg = BUF_BEGV_BYTE (XBUFFER (buffer));
+ lisp_parser->visible_end = BUF_ZV_BYTE (XBUFFER (buffer));
+ return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
+}
+
+/* Wrap the node in a Lisp_Object to be used in the Lisp machine. */
+Lisp_Object
+make_ts_node (Lisp_Object parser, TSNode node)
+{
+ struct Lisp_TS_Node *lisp_node
+ = ALLOCATE_PSEUDOVECTOR (struct Lisp_TS_Node, parser, PVEC_TS_NODE);
+ lisp_node->parser = parser;
+ lisp_node->node = node;
+ lisp_node->timestamp = XTS_PARSER (parser)->timestamp;
+ return make_lisp_ptr (lisp_node, Lisp_Vectorlike);
+}
+
+DEFUN ("tree-sitter-parser-p",
+ Ftree_sitter_parser_p, Stree_sitter_parser_p, 1, 1, 0,
+ doc: /* Return t if OBJECT is a tree-sitter parser. */)
+ (Lisp_Object object)
+{
+ if (TS_PARSERP (object))
+ return Qt;
+ else
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-node-p",
+ Ftree_sitter_node_p, Stree_sitter_node_p, 1, 1, 0,
+ doc: /* Return t if OBJECT is a tree-sitter node. */)
+ (Lisp_Object object)
+{
+ if (TS_NODEP (object))
+ return Qt;
+ else
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-node-parser",
+ Ftree_sitter_node_parser, Stree_sitter_node_parser,
+ 1, 1, 0,
+ doc: /* Return the parser to which NODE belongs. */)
+ (Lisp_Object node)
+{
+ CHECK_TS_NODE (node);
+ return XTS_NODE (node)->parser;
+}
+
+DEFUN ("tree-sitter-parser-create",
+ Ftree_sitter_parser_create, Stree_sitter_parser_create,
+ 2, 2, 0,
+ doc: /* Create and return a parser in BUFFER for LANGUAGE.
+
+The parser is automatically added to BUFFER's
+`tree-sitter-parser-list'. LANGUAGE should be the symbol of a
+function provided by a tree-sitter language dynamic module, e.g.,
+'tree-sitter-json. If BUFFER is nil, use the current buffer. */)
+ (Lisp_Object buffer, Lisp_Object language)
+{
+ if (NILP (buffer))
+ buffer = Fcurrent_buffer ();
+
+ CHECK_BUFFER (buffer);
+ CHECK_SYMBOL (language);
+ ts_check_buffer_size (XBUFFER (buffer));
+
+ ts_initialize ();
+
+ TSParser *parser = ts_parser_new ();
+ TSLanguage *lang = ts_load_language (language, true);
+ /* We check language version when loading a language, so this should
+ always succeed. */
+ ts_parser_set_language (parser, lang);
+
+ Lisp_Object lisp_parser
+ = make_ts_parser (buffer, parser, NULL, language);
+
+ struct buffer *old_buffer = current_buffer;
+ set_buffer_internal (XBUFFER (buffer));
+
+ Fset (Qtree_sitter_parser_list,
+ Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));
+
+ set_buffer_internal (old_buffer);
+ return lisp_parser;
+}
+
+DEFUN ("tree-sitter-parser-buffer",
+ Ftree_sitter_parser_buffer, Stree_sitter_parser_buffer,
+ 1, 1, 0,
+ doc: /* Return the buffer of PARSER. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ Lisp_Object buf;
+ XSETBUFFER (buf, XBUFFER (XTS_PARSER (parser)->buffer));
+ return buf;
+}
+
+DEFUN ("tree-sitter-parser-language",
+ Ftree_sitter_parser_language, Stree_sitter_parser_language,
+ 1, 1, 0,
+ doc: /* Return PARSER's language symbol.
+This symbol is the one used to create the parser. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ return XTS_PARSER (parser)->language_symbol;
+}
+
+/*** Parser API */
+
+DEFUN ("tree-sitter-parser-root-node",
+ Ftree_sitter_parser_root_node, Stree_sitter_parser_root_node,
+ 1, 1, 0,
+ doc: /* Return the root node of PARSER. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ ts_ensure_parsed (parser);
+ TSNode root_node = ts_tree_root_node (XTS_PARSER (parser)->tree);
+ return make_ts_node (parser, root_node);
+}
+
+/* Checks that the RANGES argument of
+ tree-sitter-parser-set-included-ranges is valid. */
+void
+ts_check_range_argument (Lisp_Object ranges)
+{
+ EMACS_INT last_point = 1;
+ for (Lisp_Object tail = ranges;
+ !NILP (tail); tail = XCDR (tail))
+ {
+ CHECK_CONS (tail);
+ Lisp_Object range = XCAR (tail);
+ CHECK_CONS (range);
+ CHECK_FIXNUM (XCAR (range));
+ CHECK_FIXNUM (XCDR (range));
+ EMACS_INT beg = XFIXNUM (XCAR (range));
+ EMACS_INT end = XFIXNUM (XCDR (range));
+ /* TODO: Maybe we should check for point-min/max, too? */
+ if (!(last_point <= beg && beg <= end))
+ xsignal2 (Qtree_sitter_range_invalid,
+ build_pure_c_string
+ ("RANGE is either overlapping or out-of-order"),
+ ranges);
+ last_point = end;
+ }
+}
+
+DEFUN ("tree-sitter-parser-set-included-ranges",
+ Ftree_sitter_parser_set_included_ranges,
+ Stree_sitter_parser_set_included_ranges,
+ 2, 2, 0,
+ doc: /* Limit PARSER to RANGES.
+
+RANGES is a list of (BEG . END); each (BEG . END) defines a range in
+which the parser should operate. The ranges must not overlap, and
+they must come in order. Signal `tree-sitter-range-invalid' if the
+argument is invalid, or something else went wrong. If RANGES is nil,
+set PARSER to parse the whole buffer. */)
+ (Lisp_Object parser, Lisp_Object ranges)
+{
+ CHECK_TS_PARSER (parser);
+ CHECK_LIST (ranges);
+ ts_check_range_argument (ranges);
+
+ /* Before we parse, catch up with narrowing/widening. */
+ ts_ensure_position_synced (parser);
+
+ bool success;
+ if (NILP (ranges))
+ {
+ /* If RANGES is nil, make the parser parse the whole document.
+ To do that we give tree-sitter a length of 0; the range is a
+ dummy. */
+ TSRange ts_range = {0, 0, 0, 0};
+ success = ts_parser_set_included_ranges
+ (XTS_PARSER (parser)->parser, &ts_range, 0);
+ }
+ else
+ {
+ /* Set ranges for PARSER. */
+ ptrdiff_t len = list_length (ranges);
+ TSRange *ts_ranges = xmalloc (sizeof (TSRange) * len);
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+
+ for (int idx=0; !NILP (ranges); idx++, ranges = XCDR (ranges))
+ {
+ Lisp_Object range = XCAR (ranges);
+
+ EMACS_INT beg_byte = buf_charpos_to_bytepos
+ (buffer, XFIXNUM (XCAR (range)));
+ EMACS_INT end_byte = buf_charpos_to_bytepos
+ (buffer, XFIXNUM (XCDR (range)));
+ /* We don't care about start and end points, put in dummy
+ value. */
+ TSRange rg = {{0,0}, {0,0},
+ (uint32_t) (beg_byte - BUF_BEGV_BYTE (buffer)),
+ (uint32_t) (end_byte - BUF_BEGV_BYTE (buffer))};
+ ts_ranges[idx] = rg;
+ }
+ success = ts_parser_set_included_ranges
+ (XTS_PARSER (parser)->parser, ts_ranges, (uint32_t) len);
+ /* Although XFIXNUM could signal, it should be impossible
+ because we have checked the input by ts_check_range_argument.
+ So there is no need for unwind-protect. */
+ xfree (ts_ranges);
+ }
+
+ if (!success)
+ xsignal2 (Qtree_sitter_range_invalid,
+ build_pure_c_string
+ ("Something went wrong when setting ranges"),
+ ranges);
+
+ XTS_PARSER (parser)->need_reparse = true;
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-parser-included-ranges",
+ Ftree_sitter_parser_included_ranges,
+ Stree_sitter_parser_included_ranges,
+ 1, 1, 0,
+ doc: /* Return the ranges set for PARSER.
+See `tree-sitter-parser-set-included-ranges'. If no range is set, return
+nil. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ uint32_t len;
+ const TSRange *ranges = ts_parser_included_ranges
+ (XTS_PARSER (parser)->parser, &len);
+ if (len == 0)
+ return Qnil;
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+
+ Lisp_Object list = Qnil;
+ for (int idx=0; idx < len; idx++)
+ {
+ TSRange range = ranges[idx];
+ uint32_t beg_byte = range.start_byte + BUF_BEGV_BYTE (buffer);
+ uint32_t end_byte = range.end_byte + BUF_BEGV_BYTE (buffer);
+
+ Lisp_Object lisp_range =
+ Fcons (make_fixnum (buf_bytepos_to_charpos (buffer, beg_byte)),
+ make_fixnum (buf_bytepos_to_charpos (buffer, end_byte)));
+ list = Fcons (lisp_range, list);
+ }
+ return Fnreverse (list);
+}
+
+/*** Node API */
+
+/* Check that OBJ is a non-negative fixnum and signal an error
+ otherwise. */
+static void
+ts_check_positive_integer (Lisp_Object obj)
+{
+ CHECK_FIXNUM (obj);
+ if (XFIXNUM (obj) < 0)
+ xsignal1 (Qargs_out_of_range, obj);
+}
+
+static void
+ts_check_node (Lisp_Object obj)
+{
+ CHECK_TS_NODE (obj);
+ Lisp_Object lisp_parser = XTS_NODE (obj)->parser;
+ if (XTS_NODE (obj)->timestamp !=
+ XTS_PARSER (lisp_parser)->timestamp)
+ xsignal1 (Qtree_sitter_node_outdated, obj);
+}
+
+DEFUN ("tree-sitter-node-type",
+ Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
+ doc: /* Return the NODE's type as a string.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ const char *type = ts_node_type (ts_node);
+ return build_string (type);
+}
+
+DEFUN ("tree-sitter-node-start",
+ Ftree_sitter_node_start, Stree_sitter_node_start, 1, 1, 0,
+ doc: /* Return the NODE's start position.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ uint32_t start_byte_offset = ts_node_start_byte (ts_node);
+ struct buffer *buffer =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t start_pos = buf_bytepos_to_charpos
+ (buffer, start_byte_offset + visible_beg);
+ return make_fixnum (start_pos);
+}
+
+DEFUN ("tree-sitter-node-end",
+ Ftree_sitter_node_end, Stree_sitter_node_end, 1, 1, 0,
+ doc: /* Return the NODE's end position.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ uint32_t end_byte_offset = ts_node_end_byte (ts_node);
+ struct buffer *buffer =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t end_pos = buf_bytepos_to_charpos
+ (buffer, end_byte_offset + visible_beg);
+ return make_fixnum (end_pos);
+}
+
+DEFUN ("tree-sitter-node-string",
+ Ftree_sitter_node_string, Stree_sitter_node_string, 1, 1, 0,
+ doc: /* Return the string representation of NODE.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
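+ /* ts_node_string returns a malloc'ed string that the caller must
+ free. */
+ char *string = ts_node_string (ts_node);
+ Lisp_Object result = make_string (string, strlen (string));
+ free (string);
+ return result;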
+}
+
+DEFUN ("tree-sitter-node-parent",
+ Ftree_sitter_node_parent, Stree_sitter_node_parent, 1, 1, 0,
+ doc: /* Return the immediate parent of NODE.
+Return nil if there isn't any. If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode parent = ts_node_parent (ts_node);
+
+ if (ts_node_is_null (parent))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, parent);
+}
+
+DEFUN ("tree-sitter-node-child",
+ Ftree_sitter_node_child, Stree_sitter_node_child, 2, 3, 0,
+ doc: /* Return the Nth child of NODE.
+
+Return nil if there isn't any. If NAMED is non-nil, look for named
+child only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object n, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ ts_check_positive_integer (n);
+ EMACS_INT idx = XFIXNUM (n);
+ if (idx > UINT32_MAX) xsignal1 (Qargs_out_of_range, n);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_child (ts_node, (uint32_t) idx);
+ else
+ child = ts_node_named_child (ts_node, (uint32_t) idx);
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-check",
+ Ftree_sitter_node_check, Stree_sitter_node_check, 2, 2, 0,
+ doc: /* Return non-nil if NODE has PROPERTY, nil otherwise.
+
+PROPERTY could be 'named, 'missing, 'extra, 'has-changes, 'has-error.
+Named nodes correspond to named rules in the language definition,
+whereas "anonymous" nodes correspond to string literals in the
+language definition.
+
+Missing nodes are inserted by the parser in order to recover from
+certain kinds of syntax errors, i.e., they should be there but are not.
+
+Extra nodes represent things like comments, which are not required by
+the language definition, but can appear anywhere.
+
+A node "has changes" if the buffer changed since the node was
+created. (Don't forget the "s" at the end of 'has-changes.)
+
+A node "has error" if it is itself a syntax error or contains any
+syntax errors. */)
+ (Lisp_Object node, Lisp_Object property)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ CHECK_SYMBOL (property);
+ TSNode ts_node = XTS_NODE (node)->node;
+ bool result;
+ if (EQ (property, Qnamed))
+ result = ts_node_is_named (ts_node);
+ else if (EQ (property, Qmissing))
+ result = ts_node_is_missing (ts_node);
+ else if (EQ (property, Qextra))
+ result = ts_node_is_extra (ts_node);
+ else if (EQ (property, Qhas_error))
+ result = ts_node_has_error (ts_node);
+ else if (EQ (property, Qhas_changes))
+ result = ts_node_has_changes (ts_node);
+ else
+ signal_error ("Expecting 'named, 'missing, 'extra, 'has-changes or 'has-error, got",
+ property);
+ return result ? Qt : Qnil;
+}
+
+DEFUN ("tree-sitter-node-field-name-for-child",
+ Ftree_sitter_node_field_name_for_child,
+ Stree_sitter_node_field_name_for_child, 2, 2, 0,
+ doc: /* Return the field name of the Nth child of NODE.
+
+Return nil if there isn't any child or no field is found.
+If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object n)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ ts_check_positive_integer (n);
+ EMACS_INT idx = XFIXNUM (n);
+ if (idx > UINT32_MAX) xsignal1 (Qargs_out_of_range, n);
+ TSNode ts_node = XTS_NODE (node)->node;
+ const char *name
+ = ts_node_field_name_for_child (ts_node, (uint32_t) idx);
+
+ if (name == NULL)
+ return Qnil;
+
+ return make_string (name, strlen (name));
+}
+
+DEFUN ("tree-sitter-node-child-count",
+ Ftree_sitter_node_child_count,
+ Stree_sitter_node_child_count, 1, 2, 0,
+ doc: /* Return the number of children of NODE.
+
+If NAMED is non-nil, count only named children. NAMED defaults to
+nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ uint32_t count;
+ if (NILP (named))
+ count = ts_node_child_count (ts_node);
+ else
+ count = ts_node_named_child_count (ts_node);
+ return make_fixnum (count);
+}
+
+DEFUN ("tree-sitter-node-child-by-field-name",
+ Ftree_sitter_node_child_by_field_name,
+ Stree_sitter_node_child_by_field_name, 2, 2, 0,
+ doc: /* Return the child of NODE with FIELD-NAME.
+Return nil if there isn't any. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object field_name)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ CHECK_STRING (field_name);
+ char *name_str = SSDATA (field_name);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child
+ = ts_node_child_by_field_name (ts_node, name_str, strlen (name_str));
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-next-sibling",
+ Ftree_sitter_node_next_sibling,
+ Stree_sitter_node_next_sibling, 1, 2, 0,
+ doc: /* Return the next sibling of NODE.
+
+Return nil if there isn't any. If NAMED is non-nil, look for named
+sibling only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode sibling;
+ if (NILP (named))
+ sibling = ts_node_next_sibling (ts_node);
+ else
+ sibling = ts_node_next_named_sibling (ts_node);
+
+ if (ts_node_is_null (sibling))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, sibling);
+}
+
+DEFUN ("tree-sitter-node-prev-sibling",
+ Ftree_sitter_node_prev_sibling,
+ Stree_sitter_node_prev_sibling, 1, 2, 0,
+ doc: /* Return the previous sibling of NODE.
+
+Return nil if there isn't any. If NAMED is non-nil, look for named
+sibling only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode sibling;
+
+ if (NILP (named))
+ sibling = ts_node_prev_sibling (ts_node);
+ else
+ sibling = ts_node_prev_named_sibling (ts_node);
+
+ if (ts_node_is_null (sibling))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, sibling);
+}
+
+DEFUN ("tree-sitter-node-first-child-for-pos",
+ Ftree_sitter_node_first_child_for_pos,
+ Stree_sitter_node_first_child_for_pos, 2, 3, 0,
+ doc: /* Return the first child of NODE on POS.
+
+Specifically, return the first child that extends beyond POS. POS is
+a position in the buffer. Return nil if there isn't any. If NAMED is
+non-nil, look for named child only. NAMED defaults to nil. Note that
+this function returns an immediate child, not the smallest
+(grand)child. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object pos, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ ts_check_positive_integer (pos);
+
+ struct buffer *buf =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ ptrdiff_t byte_pos = buf_charpos_to_bytepos (buf, XFIXNUM (pos));
+
+ if (byte_pos < BUF_BEGV_BYTE (buf) || byte_pos > BUF_ZV_BYTE (buf))
+ xsignal1 (Qargs_out_of_range, pos);
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_first_child_for_byte
+ (ts_node, byte_pos - visible_beg);
+ else
+ child = ts_node_first_named_child_for_byte
+ (ts_node, byte_pos - visible_beg);
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-descendant-for-range",
+ Ftree_sitter_node_descendant_for_range,
+ Stree_sitter_node_descendant_for_range, 3, 4, 0,
+ doc: /* Return the smallest node that covers BEG to END.
+
+The returned node is a descendant of NODE. BEG and END are buffer
+positions. Return nil if there isn't any. If NAMED is non-nil, look
+for named descendants only. NAMED defaults to nil. If NODE is nil,
+return nil. */)
+ (Lisp_Object node, Lisp_Object beg, Lisp_Object end, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ CHECK_FIXNUM (beg);
+ CHECK_FIXNUM (end);
+
+ struct buffer *buf =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ ptrdiff_t byte_beg = buf_charpos_to_bytepos (buf, XFIXNUM (beg));
+ ptrdiff_t byte_end = buf_charpos_to_bytepos (buf, XFIXNUM (end));
+
+ /* Checks for BUFFER_BEG <= BEG <= END <= BUFFER_END. */
+ if (!(BUF_BEGV_BYTE (buf) <= byte_beg
+ && byte_beg <= byte_end
+ && byte_end <= BUF_ZV_BYTE (buf)))
+ xsignal2 (Qargs_out_of_range, beg, end);
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_descendant_for_byte_range
+ (ts_node, byte_beg - visible_beg, byte_end - visible_beg);
+ else
+ child = ts_node_named_descendant_for_byte_range
+ (ts_node, byte_beg - visible_beg, byte_end - visible_beg);
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-eq",
+ Ftree_sitter_node_eq,
+ Stree_sitter_node_eq, 2, 2, 0,
+ doc: /* Return non-nil if NODE1 and NODE2 are the same node.
+If either NODE1 or NODE2 is nil, return nil. */)
+ (Lisp_Object node1, Lisp_Object node2)
+{
+ if (NILP (node1) || NILP (node2))
+ return Qnil;
+ CHECK_TS_NODE (node1);
+ CHECK_TS_NODE (node2);
+
+ TSNode ts_node_1 = XTS_NODE (node1)->node;
+ TSNode ts_node_2 = XTS_NODE (node2)->node;
+
+ bool same_node = ts_node_eq (ts_node_1, ts_node_2);
+ return same_node ? Qt : Qnil;
+}
+
+/*** Query functions */
+
+/* If we decide to pre-load tree-sitter.el, maybe we can implement
+ this function in Lisp. */
+DEFUN ("tree-sitter-expand-pattern",
+ Ftree_sitter_expand_pattern,
+ Stree_sitter_expand_pattern, 1, 1, 0,
+ doc: /* Expand PATTERN to its string form.
+
+PATTERN can be
+
+ :anchor
+ :?
+ :*
+ :+
+ :equal
+ :match
+ (TYPE PATTERN...)
+ [PATTERN...]
+ FIELD-NAME:
+ @CAPTURE-NAME
+ (_)
+ _
+ \"TYPE\"
+
+Consult Info node `(elisp)Pattern Matching' for a detailed
+explanation. */)
+ (Lisp_Object pattern)
+{
+ if (EQ (pattern, intern_c_string (":anchor")))
+ return build_pure_c_string (".");
+ if (EQ (pattern, intern_c_string (":?")))
+ return build_pure_c_string ("?");
+ if (EQ (pattern, intern_c_string (":*")))
+ return build_pure_c_string ("*");
+ if (EQ (pattern, intern_c_string (":+")))
+ return build_pure_c_string ("+");
+ if (EQ (pattern, intern_c_string (":equal")))
+ return build_pure_c_string ("#equal");
+ if (EQ (pattern, intern_c_string (":match")))
+ return build_pure_c_string ("#match");
+ Lisp_Object opening_delimiter =
+ build_pure_c_string (VECTORP (pattern) ? "[" : "(");
+ Lisp_Object closing_delimiter =
+ build_pure_c_string (VECTORP (pattern) ? "]" : ")");
+ if (VECTORP (pattern) || CONSP (pattern))
+ return concat3 (opening_delimiter,
+ Fmapconcat (intern_c_string
+ ("tree-sitter-expand-pattern"),
+ pattern,
+ build_pure_c_string (" ")),
+ closing_delimiter);
+ return CALLN (Fformat, build_pure_c_string ("%S"), pattern);
+}
+
+DEFUN ("tree-sitter-expand-query",
+ Ftree_sitter_expand_query,
+ Stree_sitter_expand_query, 1, 1, 0,
+ doc: /* Expand sexp QUERY to its string form.
+
+A PATTERN in QUERY can be
+
+ :anchor
+ :?
+ :*
+ :+
+ :equal
+ :match
+ (TYPE PATTERN...)
+ [PATTERN...]
+ FIELD-NAME:
+ @CAPTURE-NAME
+ (_)
+ _
+ \"TYPE\"
+
+Consult Info node `(elisp)Pattern Matching' for a detailed
+explanation. */)
+ (Lisp_Object query)
+{
+ return Fmapconcat (intern_c_string ("tree-sitter-expand-pattern"),
+ query, build_pure_c_string (" "));
+}
+
+char*
+ts_query_error_to_string (TSQueryError error)
+{
+ switch (error)
+ {
+ case TSQueryErrorNone:
+ return "None";
+ case TSQueryErrorSyntax:
+ return "Syntax error at";
+ case TSQueryErrorNodeType:
+ return "Node type error at";
+ case TSQueryErrorField:
+ return "Field error at";
+ case TSQueryErrorCapture:
+ return "Capture error at";
+ case TSQueryErrorStructure:
+ return "Structure error at";
+ default:
+ return "Unknown error";
+ }
+}
+
+/* Collect predicates for this match and return them in a list. Each
+ predicate is a list of strings and symbols. */
+Lisp_Object
+ts_predicates_for_pattern
+(TSQuery *query, uint32_t pattern_index)
+{
+ uint32_t len;
+ const TSQueryPredicateStep *predicate_list =
+ ts_query_predicates_for_pattern (query, pattern_index, &len);
+ Lisp_Object result = Qnil;
+ Lisp_Object predicate = Qnil;
+ for (int idx=0; idx < len; idx++)
+ {
+ TSQueryPredicateStep step = predicate_list[idx];
+ switch (step.type)
+ {
+ case TSQueryPredicateStepTypeCapture:
+ {
+ uint32_t str_len;
+ const char *str = ts_query_capture_name_for_id
+ (query, step.value_id, &str_len);
+ predicate = Fcons (intern_c_string_1 (str, str_len),
+ predicate);
+ break;
+ }
+ case TSQueryPredicateStepTypeString:
+ {
+ uint32_t str_len;
+ const char *str = ts_query_string_value_for_id
+ (query, step.value_id, &str_len);
+ predicate = Fcons (make_string (str, str_len), predicate);
+ break;
+ }
+ case TSQueryPredicateStepTypeDone:
+ result = Fcons (Fnreverse (predicate), result);
+ predicate = Qnil;
+ break;
+ }
+ }
+ return Fnreverse (result);
+}
+
+/* Translate a capture NAME (symbol) to the text of the captured node.
+ Signals tree-sitter-query-error if such node is not captured. */
+Lisp_Object
+ts_predicate_capture_name_to_text (Lisp_Object name, Lisp_Object captures)
+{
+ Lisp_Object node = Qnil;
+ for (Lisp_Object tail = captures; !NILP (tail); tail = XCDR (tail))
+ {
+ if (EQ (XCAR (XCAR (tail)), name))
+ {
+ node = XCDR (XCAR (tail));
+ break;
+ }
+ }
+
+ if (NILP (node))
+ xsignal3 (Qtree_sitter_query_error,
+ build_pure_c_string ("Cannot find captured node"),
+ name, build_pure_c_string ("A predicate can only refer to captured nodes in the same pattern"));
+
+ struct buffer *old_buffer = current_buffer;
+ set_buffer_internal
+ (XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer));
+ Lisp_Object text = Fbuffer_substring
+ (Ftree_sitter_node_start (node), Ftree_sitter_node_end (node));
+ set_buffer_internal (old_buffer);
+ return text;
+}
+
+/* Handles predicate (#equal A B). Return true if A equals B; return
+ false otherwise. A and B can each be either a string or a capture
+ name. A capture name evaluates to the text its captured node spans
+ in the buffer. */
+bool
+ts_predicate_equal (Lisp_Object args, Lisp_Object captures)
+{
+ if (XFIXNUM (Flength (args)) != 2)
+ xsignal2 (Qtree_sitter_query_error, build_pure_c_string ("Predicate `equal' requires two arguments but was given"), Flength (args));
+
+ Lisp_Object arg1 = XCAR (args);
+ Lisp_Object arg2 = XCAR (XCDR (args));
+ Lisp_Object tail = captures;
+ Lisp_Object text1 = STRINGP (arg1) ? arg1 :
+ ts_predicate_capture_name_to_text (arg1, captures);
+ Lisp_Object text2 = STRINGP (arg2) ? arg2 :
+ ts_predicate_capture_name_to_text (arg2, captures);
+
+ if (NILP (Fstring_equal (text1, text2)))
+ return false;
+ else
+ return true;
+}
+
+/* Handles predicate (#match "regexp" @node). Return true if "regexp"
+ matches the text spanned by @node; return false otherwise. Matching
+ is case-sensitive. */
+bool
+ts_predicate_match (Lisp_Object args, Lisp_Object captures)
+{
+ if (XFIXNUM (Flength (args)) != 2)
+ xsignal2 (Qtree_sitter_query_error, build_pure_c_string ("Predicate `match' requires two arguments but was given"), Flength (args));
+
+ Lisp_Object regexp = XCAR (args);
+ Lisp_Object capture_name = XCAR (XCDR (args));
+ Lisp_Object tail = captures;
+
+ /* It's probably common to get the argument order backwards. Catch
+ this mistake early and show a helpful explanation, because Emacs
+ loves you. (We put the regexp first because that's what
+ string-match does.) These checks must run before we look up the
+ captured text, otherwise the lookup signals first. */
+ if (!STRINGP (regexp))
+ xsignal1 (Qtree_sitter_query_error, build_pure_c_string ("The first argument to `match' should be a regexp string, not a capture name"));
+ if (!SYMBOLP (capture_name))
+ xsignal1 (Qtree_sitter_query_error, build_pure_c_string ("The second argument to `match' should be a capture name, not a string"));
+
+ Lisp_Object text = ts_predicate_capture_name_to_text
+ (capture_name, captures);
+
+ if (fast_string_match (regexp, text) >= 0)
+ return true;
+ else
+ return false;
+}
+
+/* About predicates: I decide to hard-code predicates in C instead of
+ implementing an extensible system where predicates are translated
+ to Lisp functions, and new predicates can be added by extending a
+ list of functions, because I really couldn't imagine any useful
+ predicates besides equal and match. If we later found out that
+ such system is indeed useful and necessary, it can be easily
+ added. */
+
+/* If all predicates in PREDICATES pass, return true; otherwise
+ return false. */
+bool
+ts_eval_predicates (Lisp_Object captures, Lisp_Object predicates)
+{
+ bool pass = true;
+ /* Evaluate each predicate. */
+ for (Lisp_Object tail = predicates;
+ !NILP (tail); tail = XCDR (tail))
+ {
+ Lisp_Object predicate = XCAR (tail);
+ Lisp_Object fn = XCAR (predicate);
+ Lisp_Object args = XCDR (predicate);
+ if (!NILP (Fstring_equal (fn, build_pure_c_string ("equal"))))
+ pass &= ts_predicate_equal (args, captures);
+ else if (!NILP (Fstring_equal
+ (fn, build_pure_c_string ("match"))))
+ pass &= ts_predicate_match (args, captures);
+ else
+ xsignal3 (Qtree_sitter_query_error,
+ build_pure_c_string ("Invalid predicate"),
+ fn, build_pure_c_string ("Currently Emacs only supports equal and match predicate"));
+ }
+ /* True only if every predicate passed. */
+ return pass;
+}
+
+DEFUN ("tree-sitter-query-capture",
+ Ftree_sitter_query_capture,
+ Stree_sitter_query_capture, 2, 4, 0,
+ doc: /* Query NODE with patterns in QUERY.
+
+Return a list of (CAPTURE_NAME . NODE). CAPTURE_NAME is the name
+assigned to the node in QUERY. NODE is the captured node.
+
+QUERY is either a string query or a sexp query. See Info node
+`(elisp)Pattern Matching' for how to write a query in either string or
+s-expression form.
+
+BEG and END, if both non-nil, specify the range in which the query
+is executed.
+
+Raise a tree-sitter-query-error if QUERY is malformed, or something
+else goes wrong. */)
+ (Lisp_Object node, Lisp_Object query,
+ Lisp_Object beg, Lisp_Object end)
+{
+ ts_check_node (node);
+ if (!NILP (beg))
+ CHECK_INTEGER (beg);
+ if (!NILP (end))
+ CHECK_INTEGER (end);
+
+ if (CONSP (query))
+ query = Ftree_sitter_expand_query (query);
+ else
+ CHECK_STRING (query);
+
+ /* Extract C values from Lisp objects. */
+ TSNode ts_node = XTS_NODE (node)->node;
+ Lisp_Object lisp_parser = XTS_NODE (node)->parser;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ const TSLanguage *lang = ts_parser_language
+ (XTS_PARSER (lisp_parser)->parser);
+ char *source = SSDATA (query);
+
+ /* Initialize query objects, and execute query. */
+ uint32_t error_offset;
+ TSQueryError error_type;
+ /* TODO: We could cache the query object, so that repeatedly
+ querying with the same query can reuse the query object. It also
+ saves us from expanding the sexp query into a string. I don't
+ know how much time that could save though. */
+ TSQuery *ts_query = ts_query_new (lang, source, strlen (source),
+ &error_offset, &error_type);
+
+ if (ts_query == NULL)
+ xsignal2 (Qtree_sitter_query_error,
+ build_string (ts_query_error_to_string (error_type)),
+ make_fixnum (error_offset + 1));
+ /* Create the cursor only after the query is known to be valid, so
+ the error path above does not leak it. */
+ TSQueryCursor *cursor = ts_query_cursor_new ();
+ if (!NILP (beg) && !NILP (end))
+ {
+ EMACS_INT beg_byte = XFIXNUM (beg);
+ EMACS_INT end_byte = XFIXNUM (end);
+ ts_query_cursor_set_byte_range
+ (cursor, (uint32_t) beg_byte - visible_beg,
+ (uint32_t) end_byte - visible_beg);
+ }
+
+ ts_query_cursor_exec (cursor, ts_query, ts_node);
+ TSQueryMatch match;
+
+ /* Go over each match, collect captures and predicates. Include the
+ captures in the return list if all predicates in that match
+ pass. */
+ Lisp_Object result = Qnil;
+ while (ts_query_cursor_next_match (cursor, &match))
+ {
+ /* Get captured nodes. */
+ Lisp_Object captures_lisp = Qnil;
+ const TSQueryCapture *captures = match.captures;
+ for (int idx=0; idx < match.capture_count; idx++)
+ {
+ uint32_t capture_name_len;
+ TSQueryCapture capture = captures[idx];
+ Lisp_Object captured_node =
+ make_ts_node(lisp_parser, capture.node);
+ const char *capture_name = ts_query_capture_name_for_id
+ (ts_query, capture.index, &capture_name_len);
+ Lisp_Object cap =
+ Fcons (intern_c_string_1 (capture_name, capture_name_len),
+ captured_node);
+ captures_lisp = Fcons (cap, captures_lisp);
+ }
+ /* Get predicates. */
+ Lisp_Object predicates =
+ ts_predicates_for_pattern (ts_query, match.pattern_index);
+
+ captures_lisp = Fnreverse (captures_lisp);
+ if (ts_eval_predicates (captures_lisp, predicates))
+ {
+ result = CALLN (Fnconc, result, captures_lisp);
+ }
+ }
+ ts_query_delete (ts_query);
+ ts_query_cursor_delete (cursor);
+ return result;
+}
+
+/*** Initialization */
+
+/* Initialize the tree-sitter routines. */
+void
+syms_of_tree_sitter (void)
+{
+ DEFSYM (Qtree_sitter_parser_p, "tree-sitter-parser-p");
+ DEFSYM (Qtree_sitter_node_p, "tree-sitter-node-p");
+ DEFSYM (Qnamed, "named");
+ DEFSYM (Qmissing, "missing");
+ DEFSYM (Qextra, "extra");
+ DEFSYM (Qhas_changes, "has-changes");
+ DEFSYM (Qhas_error, "has-error");
+
+ DEFSYM (Qtree_sitter_error, "tree-sitter-error");
+ DEFSYM (Qtree_sitter_query_error, "tree-sitter-query-error");
+ DEFSYM (Qtree_sitter_parse_error, "tree-sitter-parse-error");
+ DEFSYM (Qtree_sitter_range_invalid, "tree-sitter-range-invalid");
+ DEFSYM (Qtree_sitter_buffer_too_large,
+ "tree-sitter-buffer-too-large");
+ DEFSYM (Qtree_sitter_load_language_error,
+ "tree-sitter-load-language-error");
+ DEFSYM (Qtree_sitter_node_outdated,
+ "tree-sitter-node-outdated");
+
+ define_error (Qtree_sitter_error, "Generic tree-sitter error", Qerror);
+ define_error (Qtree_sitter_query_error, "Query pattern is malformed",
+ Qtree_sitter_error);
+ /* Should be impossible, no need to document this error. */
+ define_error (Qtree_sitter_parse_error, "Parse failed",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_range_invalid,
+ "RANGES are invalid, they have to be ordered and not overlapping",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_buffer_too_large, "Buffer too large (> 4GB)",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_load_language_error,
+ "Cannot load language definition",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_node_outdated,
+ "This node is outdated, please retrieve a new one",
+ Qtree_sitter_error);
+
+ DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
+ DEFVAR_LISP ("tree-sitter-parser-list", Vtree_sitter_parser_list,
+ doc: /* A list of tree-sitter parsers.
+
+If you remove a parser from this list, do not put it back in. Emacs
+keeps the parsers in this list updated with any change in the buffer.
+If removed and put back in, there is no guarantee that the parser is in
+sync with the buffer's content. */);
+ Vtree_sitter_parser_list = Qnil;
+ Fmake_variable_buffer_local (Qtree_sitter_parser_list);
+
+ DEFVAR_LISP ("tree-sitter-load-name-override-list",
+ Vtree_sitter_load_name_override_list,
+ doc:
+ /* An override alist for irregular tree-sitter libraries.
+
+By default, Emacs assumes the dynamic library for tree-sitter-<lang>
+is libtree-sitter-<lang>.<ext>, where <ext> is the OS specific
+extension for dynamic libraries. Emacs also assumes that the name of
+the C function the library provides is tree_sitter_<lang>. If that is
+not the case, add an entry
+
+ (LANGUAGE-SYMBOL LIBRARY-BASE-NAME FUNCTION-NAME)
+
+to this alist, where LIBRARY-BASE-NAME is the filename of the dynamic
+library without extension, and FUNCTION-NAME is the name of the
+function provided by the library. */);
+ Vtree_sitter_load_name_override_list = Qnil;
+
+ defsubr (&Stree_sitter_language_available_p);
+
+ defsubr (&Stree_sitter_parser_p);
+ defsubr (&Stree_sitter_node_p);
+
+ defsubr (&Stree_sitter_node_parser);
+
+ defsubr (&Stree_sitter_parser_create);
+ defsubr (&Stree_sitter_parser_buffer);
+ defsubr (&Stree_sitter_parser_language);
+
+ defsubr (&Stree_sitter_parser_root_node);
+ /* defsubr (&Stree_sitter_parse_string); */
+
+ defsubr (&Stree_sitter_parser_set_included_ranges);
+ defsubr (&Stree_sitter_parser_included_ranges);
+
+ defsubr (&Stree_sitter_node_type);
+ defsubr (&Stree_sitter_node_start);
+ defsubr (&Stree_sitter_node_end);
+ defsubr (&Stree_sitter_node_string);
+ defsubr (&Stree_sitter_node_parent);
+ defsubr (&Stree_sitter_node_child);
+ defsubr (&Stree_sitter_node_check);
+ defsubr (&Stree_sitter_node_field_name_for_child);
+ defsubr (&Stree_sitter_node_child_count);
+ defsubr (&Stree_sitter_node_child_by_field_name);
+ defsubr (&Stree_sitter_node_next_sibling);
+ defsubr (&Stree_sitter_node_prev_sibling);
+ defsubr (&Stree_sitter_node_first_child_for_pos);
+ defsubr (&Stree_sitter_node_descendant_for_range);
+ defsubr (&Stree_sitter_node_eq);
+
+ defsubr (&Stree_sitter_expand_pattern);
+ defsubr (&Stree_sitter_expand_query);
+ defsubr (&Stree_sitter_query_capture);
+}
diff --git a/src/tree-sitter.h b/src/tree-sitter.h
new file mode 100644
index 0000000000..b9feb8d432
--- /dev/null
+++ b/src/tree-sitter.h
@@ -0,0 +1,134 @@
+/* Header file for the tree-sitter integration.
+
+Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#ifndef EMACS_TREE_SITTER_H
+#define EMACS_TREE_SITTER_H
+
+#include "lisp.h"
+
+#include <tree_sitter/api.h>
+
+INLINE_HEADER_BEGIN
+
+/* A wrapper for a tree-sitter parser, but also contains a parse tree
+ and other goodies for convenience. */
+struct Lisp_TS_Parser
+{
+ union vectorlike_header header;
+ /* A symbol representing the language this parser uses. It should
+ be the symbol of the function provided by a language dynamic
+ module. */
+ Lisp_Object language_symbol;
+ Lisp_Object buffer;
+ TSParser *parser;
+ TSTree *tree;
+ TSInput input;
+ /* Re-parsing an unchanged buffer is not free for tree-sitter, so we
+ only make it re-parse when need_reparse == true. That usually
+ means some change is made in the buffer. But others could set
+ this field to true to force tree-sitter to re-parse. */
+ bool need_reparse;
+ /* These two positions record the buffer byte position (counting from
+ 1) of the "visible region" that tree-sitter sees. Unlike
+ markers, these two positions do not change as the user inserts
+ and deletes text around them. Before re-parse, we move these
+ positions to match BUF_BEGV_BYTE and BUF_ZV_BYTE. Note that we
+ don't need to synchronize these positions when retrieving them in
+ a function that involves a node: if the node is not outdated,
+ these positions are synchronized. */
+ ptrdiff_t visible_beg;
+ ptrdiff_t visible_end;
+ /* This counter is incremented every time a change is made to the
+ buffer in ts_record_change. The node retrieved from this parser
+ inherits this timestamp. This way we can make sure the node is
+ not outdated when we access its information. */
+ ptrdiff_t timestamp;
+};
+
+/* A wrapper around a tree-sitter node. */
+struct Lisp_TS_Node
+{
+ union vectorlike_header header;
+ /* This prevents gc from collecting the tree before the node is done
+ with it. TSNode contains a pointer to the tree it belongs to,
+ and the parser object, when collected by gc, will free that
+ tree. */
+ Lisp_Object parser;
+ TSNode node;
+ /* A node inherits its parser's timestamp at creation time. The
+ parser's timestamp increments as the buffer changes. This way we
+ can make sure the node is not outdated when we access its
+ information. */
+ ptrdiff_t timestamp;
+};
+
+INLINE bool
+TS_PARSERP (Lisp_Object x)
+{
+ return PSEUDOVECTORP (x, PVEC_TS_PARSER);
+}
+
+INLINE struct Lisp_TS_Parser *
+XTS_PARSER (Lisp_Object a)
+{
+ eassert (TS_PARSERP (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Parser);
+}
+
+INLINE bool
+TS_NODEP (Lisp_Object x)
+{
+ return PSEUDOVECTORP (x, PVEC_TS_NODE);
+}
+
+INLINE struct Lisp_TS_Node *
+XTS_NODE (Lisp_Object a)
+{
+ eassert (TS_NODEP (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Node);
+}
+
+INLINE void
+CHECK_TS_PARSER (Lisp_Object parser)
+{
+ CHECK_TYPE (TS_PARSERP (parser), Qtree_sitter_parser_p, parser);
+}
+
+INLINE void
+CHECK_TS_NODE (Lisp_Object node)
+{
+ CHECK_TYPE (TS_NODEP (node), Qtree_sitter_node_p, node);
+}
+
+void
+ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte);
+
+Lisp_Object
+make_ts_parser (Lisp_Object buffer, TSParser *parser,
+ TSTree *tree, Lisp_Object language_symbol);
+
+Lisp_Object
+make_ts_node (Lisp_Object parser, TSNode node);
+
+extern void syms_of_tree_sitter (void);
+
+INLINE_HEADER_END
+
+#endif /* EMACS_TREE_SITTER_H */
diff --git a/test/src/tree-sitter-tests.el b/test/src/tree-sitter-tests.el
new file mode 100644
index 0000000000..46e6e692eb
--- /dev/null
+++ b/test/src/tree-sitter-tests.el
@@ -0,0 +1,366 @@
+;;; tree-sitter-tests.el --- tests for src/tree-sitter.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Code:
+
+(require 'ert)
+(require 'tree-sitter)
+
+(ert-deftest tree-sitter-basic-parsing ()
+ "Test basic parsing routines."
+ (with-temp-buffer
+ (let ((parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json)))
+ (should
+ (eq parser (car tree-sitter-parser-list)))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(ERROR)"))
+
+ (insert "[1,2,3]")
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+
+ (goto-char (point-min))
+ (forward-char 3)
+ (insert "{\"name\": \"Bob\"},")
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))")))))
+
+(ert-deftest tree-sitter-node-api ()
+ "Tests for node API."
+ (with-temp-buffer
+ (let (parser root-node doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ ;; `tree-sitter-node-type'.
+ (should (equal "document" (tree-sitter-node-type root-node)))
+ ;; `tree-sitter-node-check'.
+ (should (eq t (tree-sitter-node-check root-node 'named)))
+ (should (eq nil (tree-sitter-node-check root-node 'missing)))
+ (should (eq nil (tree-sitter-node-check root-node 'extra)))
+ (should (eq nil (tree-sitter-node-check root-node 'has-error)))
+ ;; `tree-sitter-node-child'.
+ (setq doc-node (tree-sitter-node-child root-node 0))
+ (should (equal "array" (tree-sitter-node-type doc-node)))
+ (should (equal (tree-sitter-node-string doc-node)
+ "(array (number) (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number))"))
+ ;; `tree-sitter-node-child-count'.
+ (should (eql 9 (tree-sitter-node-child-count doc-node)))
+ (should (eql 4 (tree-sitter-node-child-count doc-node t)))
+ ;; `tree-sitter-node-field-name-for-child'.
+ (setq object-node (tree-sitter-node-child doc-node 2 t))
+ (setq pair-node (tree-sitter-node-child object-node 0 t))
+ (should (equal "object" (tree-sitter-node-type object-node)))
+ (should (equal "pair" (tree-sitter-node-type pair-node)))
+ (should (equal "key"
+ (tree-sitter-node-field-name-for-child
+ pair-node 0)))
+ ;; `tree-sitter-node-child-by-field-name'.
+ (should (equal "(string (string_content))"
+ (tree-sitter-node-string
+ (tree-sitter-node-child-by-field-name
+ pair-node "key"))))
+ ;; `tree-sitter-node-next-sibling'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-next-sibling object-node t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-next-sibling object-node))))
+ ;; `tree-sitter-node-prev-sibling'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-prev-sibling object-node t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-prev-sibling object-node))))
+ ;; `tree-sitter-node-first-child-for-pos'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-first-child-for-pos
+ doc-node 3 t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-first-child-for-pos
+ doc-node 3))))
+ ;; `tree-sitter-node-descendant-for-range'.
+ (should (equal "(\"{\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-descendant-for-range
+ root-node 6 7))))
+ (should (equal "(object (pair key: (string (string_content)) value: (string (string_content))))"
+ (tree-sitter-node-string
+ (tree-sitter-node-descendant-for-range
+ root-node 6 7 t))))
+ ;; `tree-sitter-node-eq'.
+ (should (tree-sitter-node-eq root-node root-node))
+ (should (not (tree-sitter-node-eq root-node doc-node))))))
+
+(ert-deftest tree-sitter-query-api ()
+ "Tests for query API."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+
+ (dolist (pattern
+ '("(string) @string
+(pair key: (_) @keyword)
+((_) @bob (#match \"^B.b$\" @bob))
+(number) @number
+((number) @n3 (#equal \"3\" @n3)) "
+ ((string) @string
+ (pair key: (_) @keyword)
+ ((_) @bob (:match "^B.b$" @bob))
+ (number) @number
+ ((number) @n3 (:equal "3" @n3)))))
+ (should
+ (equal
+ '((number . "1") (number . "2")
+ (keyword . "\"name\"")
+ (string . "\"name\"")
+ (string . "\"Bob\"")
+ (bob . "Bob")
+ (number . "3")
+ (n3 . "3"))
+ (mapcar (lambda (entry)
+ (cons (car entry)
+ (tree-sitter-node-text
+ (cdr entry))))
+ (tree-sitter-query-capture root-node pattern))))
+ (should
+ (equal
+ "(type field: (_) @capture .) ? * + \"return\""
+ (tree-sitter-expand-query
+ '((type field: (_) @capture :anchor)
+ :? :* :+ "return"))))))))
+
+(ert-deftest tree-sitter-narrow ()
+ "Tests if narrowing works."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "xxx[1,{\"name\": \"Bob\"},2,3]xxx")
+ (narrow-to-region (+ (point-min) 3) (- (point-max) 3))
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ ;; This test is from the basic test.
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))"))
+
+ (widen)
+ (goto-char (point-min))
+ (insert "ooo")
+ (should (equal "oooxxx[1,{\"name\": \"Bob\"},2,3]xxx"
+ (buffer-string)))
+ (delete-region 10 26)
+ (should (equal "oooxxx[1,2,3]xxx"
+ (buffer-string)))
+ (narrow-to-region (+ (point-min) 6) (- (point-max) 3))
+ ;; This test is also from the basic test.
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+ (widen)
+ (goto-char (point-max))
+ (insert "[1,2]")
+ (should (equal "oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (- (point-max) 5) (point-max))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number)))"))
+ (widen)
+ (goto-char (point-min))
+ (insert "[1]")
+ (should (equal "[1]oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (point-min) (+ (point-min) 3))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number)))")))))
+
+(ert-deftest tree-sitter-range ()
+ "Tests if range works."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "[[1],oooxxx[1,2,3],xxx[1,2]]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ (should-error
+ (tree-sitter-parser-set-included-ranges
+ parser '((1 . 6) (5 . 20)))
+ :type '(tree-sitter-range-invalid))
+
+ (tree-sitter-parser-set-included-ranges
+ parser '((1 . 6) (12 . 20) (23 . 29)))
+ (should (equal '((1 . 6) (12 . 20) (23 . 29))
+ (tree-sitter-parser-included-ranges parser)))
+ (should (equal "(document (array (array (number)) (array (number) (number) (number)) (array (number) (number))))"
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))))
+ ;; TODO: More tests.
+ )))
+
+(ert-deftest tree-sitter-multi-lang ()
+ "Tests if parsing multiple languages works."
+ (with-temp-buffer
+ (let (html css js html-range css-range js-range)
+ (progn
+ (insert "<html><script>1</script><style>body {}</style></html>")
+ (setq html (tree-sitter-get-parser-create 'tree-sitter-html))
+ (setq css (tree-sitter-get-parser-create 'tree-sitter-css))
+ (setq js (tree-sitter-get-parser-create 'tree-sitter-javascript)))
+ ;; JavaScript.
+ (setq js-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ '((script_element (raw_text) @capture))))
+ (should (equal '((15 . 16)) js-range))
+ (tree-sitter-parser-set-included-ranges js js-range)
+ (should (equal "(program (expression_statement (number)))"
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node js))))
+ ;; CSS.
+ (setq css-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ '((style_element (raw_text) @capture))))
+ (should (equal '((32 . 39)) css-range))
+ (tree-sitter-parser-set-included-ranges css css-range)
+ (should
+ (equal "(stylesheet (rule_set (selectors (tag_name)) (block)))"
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node css))))
+ ;; TODO: More tests.
+ )))
+
+(ert-deftest tree-sitter-parser-supplemental ()
+ "Supplemental parser functions."
+ ;; `tree-sitter-get-parser'.
+ (with-temp-buffer
+ (should (equal (tree-sitter-get-parser 'tree-sitter-json) nil)))
+ ;; `tree-sitter-get-parser-create'.
+ (with-temp-buffer
+ (should (not (equal (tree-sitter-get-parser-create 'tree-sitter-json)
+ nil))))
+ ;; `tree-sitter-parse-string'.
+ (should (equal (tree-sitter-node-string
+ (tree-sitter-parse-string
+ "[1,2,{\"name\": \"Bob\"},3]"
+ 'tree-sitter-json))
+ "(document (array (number) (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number)))"))
+ (with-temp-buffer
+ (let (parser root-node doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser))
+ (setq doc-node (tree-sitter-node-child root-node 0)))
+ ;; `tree-sitter-get-parser'.
+ (should (not (equal (tree-sitter-get-parser 'tree-sitter-json)
+ nil)))
+ ;; `tree-sitter-language-at'.
+ (should (equal (tree-sitter-language-at (point))
+ 'tree-sitter-json))
+ ;; `tree-sitter-set-ranges', `tree-sitter-get-ranges'.
+ (tree-sitter-set-ranges 'tree-sitter-json
+ '((1 . 2)))
+ (should (equal (tree-sitter-get-ranges 'tree-sitter-json)
+ '((1 . 2)))))))
+
+(ert-deftest tree-sitter-node-supplemental ()
+ "Supplemental node functions."
+ (let (parser root-node doc-node array-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser))
+ (setq doc-node (tree-sitter-node-child root-node 0)))
+ ;; `tree-sitter-node-buffer'.
+ (should (equal (tree-sitter-node-buffer root-node)
+ (current-buffer)))
+ ;; `tree-sitter-node-language'.
+ (should (eq (tree-sitter-node-language root-node)
+ 'tree-sitter-json))
+ ;; `tree-sitter-node-at'.
+ (should (equal (tree-sitter-node-string
+ (tree-sitter-node-at 1 2 'tree-sitter-json))
+ "(\"[\")"))
+ ;; `tree-sitter-buffer-root-node'.
+ (should (tree-sitter-node-eq
+ (tree-sitter-buffer-root-node 'tree-sitter-json)
+ root-node))
+ ;; `tree-sitter-filter-child'.
+ (should (equal (mapcar
+ (lambda (node)
+ (tree-sitter-node-type node))
+ (tree-sitter-filter-child
+ doc-node (lambda (node)
+ (tree-sitter-node-check node 'named))))
+ '("number" "number" "object" "number")))
+ ;; `tree-sitter-node-text'.
+ (should (equal (tree-sitter-node-text doc-node)
+ "[1,2,{\"name\": \"Bob\"},3]"))
+ ;; `tree-sitter-node-index'.
+ (should (eq (tree-sitter-node-index doc-node)
+ 0))
+ ;; TODO:
+ ;; `tree-sitter-parent-until'
+ ;; `tree-sitter-parent-while'
+ ;; `tree-sitter-node-children'
+ ;; `tree-sitter-node-field-name'
+ ))
+
+;; TODO
+;; - Functions in tree-sitter.el
+;; - tree-sitter-load-name-override-list
+
+(provide 'tree-sitter-tests)
+;;; tree-sitter-tests.el ends here
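
A note for readers skimming the patch: the comments on the timestamp fields in src/tree-sitter.h explain how an outdated node is detected. The standalone sketch below illustrates that idea only. It does not use any Emacs internals, the sketch_* names are hypothetical and do not appear in the patch, and only the timestamp fields of struct Lisp_TS_Parser and struct Lisp_TS_Node are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct Lisp_TS_Parser; only the
   timestamp is modeled.  */
struct sketch_parser
{
  ptrdiff_t timestamp;   /* Incremented on every recorded change.  */
};

/* Hypothetical stand-in for struct Lisp_TS_Node.  */
struct sketch_node
{
  const struct sketch_parser *parser;
  ptrdiff_t timestamp;   /* Copied from the parser at creation.  */
};

/* Models the bookkeeping the patch attributes to ts_record_change:
   every buffer change bumps the parser's counter.  */
void
sketch_record_change (struct sketch_parser *parser)
{
  assert (parser != NULL);
  parser->timestamp++;
}

/* A new node inherits the parser's current timestamp.  */
struct sketch_node
sketch_make_node (const struct sketch_parser *parser)
{
  assert (parser != NULL);
  struct sketch_node node = { parser, parser->timestamp };
  return node;
}

/* A node is outdated once its parser has recorded a later change.  */
bool
sketch_node_outdated_p (const struct sketch_node *node)
{
  return node->timestamp != node->parser->timestamp;
}
```

In this sketch a node made before a call to sketch_record_change compares as outdated afterwards, while a node made after the call does not. This is presumably the condition under which the patch signals tree-sitter-node-outdated, though the real check is in src/tree-sitter.c and is not shown here.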
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-01-04 18:31 ` Yuan Fu
@ 2022-03-13 6:22 ` Yuan Fu
2022-03-13 6:25 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-03-13 6:22 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Theodor Thornhill, ubolonton,
Emacs developers, Philipp, Stefan Monnier, Yoav Marco,
Stephen Leake, John Yates
> On Jan 4, 2022, at 10:31 AM, Yuan Fu <casouri@gmail.com> wrote:
>
>>>
>>> How should I go about debugging this?
>>
>> Run the offending command under a debugger, and try to find out which
>> code in loadup.el causes this. On macOS, this is a bit tough, since
>> GDB doesn't work, so you cannot easily examine Lisp data using the
>> commands in src/.gdbinit.
>>
>> You could also try bisecting to find the offending commit.
>
> Thanks. I figured it out.
>
> Now that the tree-sitter integration is mostly done, would anyone like to look at the patch?
>
> Yuan
>
> <tree-sitter.patch>
It has been quite a while. I added some fixes to the patch and added a full changelog. Would anyone like to have a look at it?
Thanks,
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-13 6:22 ` Yuan Fu
@ 2022-03-13 6:25 ` Yuan Fu
2022-03-13 7:13 ` Po Lu
2022-03-29 16:40 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-13 6:25 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Clément Pit-Claudel, Theodor Thornhill, ubolonton,
Emacs developers, Philipp, Stefan Monnier, Yoav Marco,
Stephen Leake, John Yates
[-- Attachment #1: Type: text/plain, Size: 201 bytes --]
>
> It has been quite a while. I added some fixes to the patch and added a full changelog. Would anyone like to have a look at it?
>
> Thanks,
> Yuan
Forgot to attach the patch, here it is:
[-- Attachment #2: tree-sitter.patch --]
[-- Type: application/octet-stream, Size: 196615 bytes --]
From a82dc29351dbb86fdd1c75b0ca570b7a133efc12 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Sat, 12 Mar 2022 22:10:06 -0800
Subject: [PATCH] Merge tree-sitter integration
* configure.ac (HAVE_TREE_SITTER, TREE_SITTER_OBJ): New variables.
(DYNAMIC_LIB_SUFFIX): New variable. The code is copied from
MODULES_SUFFIX, which is why the diff looks this way.
* doc/lispref/elisp.texi (Top): Add tree-sitter manual.
* doc/lispref/modes.texi (Font Lock Mode): Mention tree-sitter.
(Parser-based Font Lock): New section.
(Auto-Indentation): Mention tree-sitter.
(Parser-based Indentation): New section.
* doc/lispref/parsing.texi (Parsing Program Source): New chapter.
* lisp/emacs-lisp/cl-preloaded.el (cl--typeof-types): Add
tree-sitter-parser and tree-sitter-node types.
* lisp/tree-sitter.el: New file.
* src/Makefile.in (TREE_SITTER_LIBS, TREE_SITTER_FLAGS,
TREE_SITTER_OBJ): New variables.
* src/alloc.c (cleanup_vector): Add cleanup code for
tree-sitter-parser and tree-sitter-node.
* src/casefiddle.c (casify_region): Notify tree-sitter parser of
buffer change.
* src/data.c (Ftype_of): Add tree-sitter-parser and tree-sitter-node
types.
(Qtree_sitter_parser, Qtree_sitter_node): New symbols.
* src/emacs.c (main): Add symbols in tree-sitter.c.
* src/eval.c (define_error): Move the function to here.
* src/insdel.c (insert_1_both, insert_from_string_1, insert_from_gap,
insert_from_buffer_1, replace_range, del_range_2): Notify tree-sitter
parser of buffer change.
* src/json.c (define_error): Move this function out.
* src/lisp.h (DEFINE_GDB_SYMBOL_BEGIN): Add tree-sitter-parser and
tree-sitter-node.
* src/lread.c (Vdynamic_library_suffixes): New variable.
* src/print.c (print_vectorlike): Add code for printing
tree-sitter-parser and tree-sitter-node.
* src/tree-sitter.c: New file.
* src/tree-sitter.h: New file.
* test/src/tree-sitter-tests.el: New file.
---
configure.ac | 60 +-
doc/lispref/elisp.texi | 12 +
doc/lispref/modes.texi | 263 ++++-
doc/lispref/parsing.texi | 1416 +++++++++++++++++++++++++++
lisp/emacs-lisp/cl-preloaded.el | 2 +
lisp/tree-sitter.el | 844 ++++++++++++++++
src/Makefile.in | 10 +-
src/alloc.c | 13 +
src/casefiddle.c | 12 +
src/data.c | 6 +
src/emacs.c | 4 +
src/eval.c | 13 +
src/insdel.c | 47 +-
src/json.c | 16 -
src/lisp.h | 9 +
src/lread.c | 8 +
src/print.c | 28 +
src/tree-sitter.c | 1601 +++++++++++++++++++++++++++++++
src/tree-sitter.h | 139 +++
test/src/tree-sitter-tests.el | 366 +++++++
20 files changed, 4831 insertions(+), 38 deletions(-)
create mode 100644 doc/lispref/parsing.texi
create mode 100644 lisp/tree-sitter.el
create mode 100644 src/tree-sitter.c
create mode 100644 src/tree-sitter.h
create mode 100644 test/src/tree-sitter-tests.el
diff --git a/configure.ac b/configure.ac
index a315eeb6bd..db6b8a81fd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -457,6 +457,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([imagemagick],[compile with ImageMagick image support])
OPTION_DEFAULT_ON([native-image-api], [don't use native image APIs (GDI+ on Windows)])
OPTION_DEFAULT_IFAVAILABLE([json], [compile with native JSON support])
+OPTION_DEFAULT_IFAVAILABLE([tree-sitter], [compile with tree-sitter])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([harfbuzz],[don't use HarfBuzz for text shaping])
@@ -3087,6 +3088,23 @@ AC_DEFUN
AC_SUBST(JSON_CFLAGS)
AC_SUBST(JSON_OBJ)
+HAVE_TREE_SITTER=no
+TREE_SITTER_OBJ=
+
+if test "${with_tree_sitter}" != "no"; then
+ EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
+ [HAVE_TREE_SITTER=yes], [HAVE_TREE_SITTER=no])
+ if test "${HAVE_TREE_SITTER}" = yes; then
+ AC_DEFINE(HAVE_TREE_SITTER, 1, [Define if using tree-sitter.])
+ TREE_SITTER_LIBS=-ltree-sitter
+ TREE_SITTER_OBJ="tree-sitter.o"
+ fi
+fi
+
+AC_SUBST(TREE_SITTER_LIBS)
+AC_SUBST(TREE_SITTER_CFLAGS)
+AC_SUBST(TREE_SITTER_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -3926,20 +3944,31 @@ AC_DEFUN
fi
AC_SUBST(LIBZ)
+### Dynamic library support
+case $opsys in
+ cygwin|mingw32) DYNAMIC_LIB_SUFFIX=".dll" ;;
+ darwin) DYNAMIC_LIB_SUFFIX=".dylib" ;;
+ *) DYNAMIC_LIB_SUFFIX=".so" ;;
+esac
+case "${opsys}" in
+ darwin) DYNAMIC_LIB_SECONDARY_SUFFIX='.so' ;;
+ *) DYNAMIC_LIB_SECONDARY_SUFFIX='' ;;
+esac
+AC_DEFINE_UNQUOTED(DYNAMIC_LIB_SUFFIX, "$DYNAMIC_LIB_SUFFIX",
+ [System extension for dynamic libraries])
+AC_DEFINE_UNQUOTED(DYNAMIC_LIB_SECONDARY_SUFFIX, "$DYNAMIC_LIB_SECONDARY_SUFFIX",
+ [Alternative system extension for dynamic libraries.])
+
+AC_SUBST(DYNAMIC_LIB_SUFFIX)
+AC_SUBST(DYNAMIC_LIB_SECONDARY_SUFFIX)
+
### Dynamic modules support
LIBMODULES=
HAVE_MODULES=no
MODULES_OBJ=
NEED_DYNLIB=no
-case $opsys in
- cygwin|mingw32) MODULES_SUFFIX=".dll" ;;
- darwin) MODULES_SUFFIX=".dylib" ;;
- *) MODULES_SUFFIX=".so" ;;
-esac
-case "${opsys}" in
- darwin) MODULES_SECONDARY_SUFFIX='.so' ;;
- *) MODULES_SECONDARY_SUFFIX='' ;;
-esac
+MODULES_SUFFIX="${DYNAMIC_LIB_SUFFIX}"
+MODULES_SECONDARY_SUFFIX="${DYNAMIC_LIB_SECONDARY_SUFFIX}"
if test "${with_modules}" != "no"; then
case $opsys in
gnu|gnu-linux)
@@ -3970,10 +3999,10 @@ AC_DEFUN
NEED_DYNLIB=yes
AC_DEFINE(HAVE_MODULES, 1, [Define to 1 if dynamic modules are enabled])
AC_DEFINE_UNQUOTED(MODULES_SUFFIX, "$MODULES_SUFFIX",
- [System extension for dynamic libraries])
+ [System extension for dynamic modules])
if test -n "${MODULES_SECONDARY_SUFFIX}"; then
AC_DEFINE_UNQUOTED(MODULES_SECONDARY_SUFFIX, "$MODULES_SECONDARY_SUFFIX",
- [Alternative system extension for dynamic libraries.])
+ [Alternative system extension for dynamic modules.])
fi
fi
AC_SUBST(MODULES_OBJ)
@@ -4333,6 +4362,12 @@ AC_DEFUN
*) MISSING="$MISSING json"
WITH_IFAVAILABLE="$WITH_IFAVAILABLE --with-json=ifavailable";;
esac
+case $with_tree_sitter,$HAVE_TREE_SITTER in
+ no,* | ifavailable,* | *,yes) ;;
+ *) MISSING="$MISSING tree-sitter"
+ WITH_IFAVAILABLE="$WITH_IFAVAILABLE --with-tree-sitter=ifavailable";;
+esac
+
if test "X${MISSING}" != X; then
# If we have a missing library, and we don't have pkg-config installed,
# the missing pkg-config may be the reason. Give the user a hint.
@@ -6263,7 +6298,7 @@ AC_DEFUN
optsep=
emacs_config_features=
for opt in ACL BE_APP CAIRO DBUS FREETYPE GCONF GIF GLIB GMP GNUTLS GPM GSETTINGS \
- HARFBUZZ IMAGEMAGICK JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 \
+ HARFBUZZ IMAGEMAGICK JPEG JSON TREE_SITTER LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 \
M17N_FLT MODULES NATIVE_COMP NOTIFY NS OLDXMENU PDUMPER PGTK PNG RSVG SECCOMP \
SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS \
UNEXEC WEBP X11 XAW3D XDBE XFT XIM XINPUT2 XPM XWIDGETS X_TOOLKIT \
@@ -6334,6 +6369,7 @@ AC_DEFUN
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
Does Emacs use -ljansson? ${HAVE_JSON}
+ Does Emacs use -ltree-sitter? ${HAVE_TREE_SITTER}
Does Emacs use the GMP library? ${HAVE_GMP}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
diff --git a/doc/lispref/elisp.texi b/doc/lispref/elisp.texi
index 426bb6d017..7390352016 100644
--- a/doc/lispref/elisp.texi
+++ b/doc/lispref/elisp.texi
@@ -222,6 +222,7 @@ Top
* Non-ASCII Characters:: Non-ASCII text in buffers and strings.
* Searching and Matching:: Searching buffers for strings or regexps.
* Syntax Tables:: The syntax table controls word and list parsing.
+* Parsing Program Source:: Generate syntax tree for program sources.
* Abbrevs:: How Abbrev mode works, and its data structures.
* Threads:: Concurrency in Emacs Lisp.
@@ -1357,6 +1358,16 @@ Top
* Syntax Table Internals:: How syntax table information is stored.
* Categories:: Another way of classifying character syntax.
+Parsing Program Source
+
+* Language Definitions:: Loading tree-sitter language definitions.
+* Using Parser:: Introduction to parsers.
+* Retrieving Node:: Retrieving node from syntax tree.
+* Accessing Node:: Accessing node information.
+* Pattern Matching:: Pattern matching with query patterns.
+* Multiple Languages:: Parse text written in multiple languages.
+* Tree-sitter C API:: Compare the C API and the ELisp API.
+
Syntax Descriptors
* Syntax Class Table:: Table of syntax classes.
@@ -1701,6 +1712,7 @@ Top
@include searching.texi
@include syntax.texi
+@include parsing.texi
@include abbrevs.texi
@include threads.texi
@include processes.texi
diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi
index c29936d5ca..c38782752e 100644
--- a/doc/lispref/modes.texi
+++ b/doc/lispref/modes.texi
@@ -2826,11 +2826,13 @@ Font Lock Mode
in which contexts. This section explains how to customize Font Lock for
a particular major mode.
- Font Lock mode finds text to highlight in two ways: through
-syntactic parsing based on the syntax table, and through searching
-(usually for regular expressions). Syntactic fontification happens
-first; it finds comments and string constants and highlights them.
-Search-based fontification happens second.
+ Font Lock mode finds text to highlight in three ways: through
+syntactic parsing based on the syntax table, through searching
+(usually for regular expressions), and through parsing based on a
+full-blown parser. Syntactic fontification happens first; it finds
+comments and string constants and highlights them. Search-based
+fontification happens second. Parser-based fontification can be
+optionally enabled and it will precede the other two fontifications.
@menu
* Font Lock Basics:: Overview of customizing Font Lock.
@@ -2845,6 +2847,7 @@ Font Lock Mode
* Syntactic Font Lock:: Fontification based on syntax tables.
* Multiline Font Lock:: How to coerce Font Lock into properly
highlighting multiline constructs.
+* Parser-based Font Lock:: Use a parser for fontification.
@end menu
@node Font Lock Basics
@@ -3735,6 +3738,89 @@ Region to Refontify
reasonably fast.
@end defvar
+@node Parser-based Font Lock
+@subsection Parser-based Font Lock
+
+@c This node is written when the only parser Emacs has is tree-sitter,
+@c if in the future more parsers are supported, feel free to reorganize
+@c and rewrite this node to describe multiple parsers in parallel.
+
+Besides simple syntactic font lock and search-based font lock, Emacs
+also provides complete syntactic font lock with the help of a parser,
+currently provided by the tree-sitter library (@pxref{Parsing Program
+Source}). Because it is an optional feature, parser-based font lock
+is less integrated with Emacs. Most variables introduced in previous
+sections only apply to search-based font lock, except for
+@var{font-lock-maximum-decoration}.
+
+@defun tree-sitter-font-lock-enable
+This function enables parser-based font lock in the current buffer.
+@end defun
+
+Parser-based font lock and other font lock mechanisms are not mutually
+exclusive. By default, if enabled, parser-based font lock runs first,
+then the simple syntactic font lock (if enabled), then search-based
+font lock.
+
+Although parser-based font lock doesn't share the same customization
+variables with search-based font lock, parser-based font lock uses
+similar customization schemes. Just like @var{font-lock-keywords} and
+@var{font-lock-defaults}, parser-based font lock has
+@var{tree-sitter-font-lock-settings} and
+@var{tree-sitter-font-lock-defaults}.
+
+@defvar tree-sitter-font-lock-settings
+A list of @var{setting}s for tree-sitter font lock.
+
+Each @var{setting} should look like
+
+@example
+(@var{language} @var{query})
+@end example
+
+Each @var{setting} controls one parser (often for a different
+language). Here @var{language} is the language symbol (@pxref{Language
+Definitions}), and @var{query} is either a string query or a sexp query
+(@pxref{Pattern Matching}).
+
+Capture names in @var{query} should be face names like
+@code{font-lock-keyword-face}. The captured node will be fontified
+with that face. Capture names can also be function names, in which
+case the function is called with (@var{start} @var{end} @var{node}),
+where @var{start} and @var{end} are the start and end position of the
+node in buffer, and @var{node} is the tree-sitter node object. If a
+capture name is both a face and a function, face takes priority.
+
+Generally, major modes should set @var{tree-sitter-font-lock-defaults},
+and let Emacs automatically populate this variable.
+@end defvar
+
+@defvar tree-sitter-font-lock-defaults
+This variable stores defaults for tree-sitter font lock. It is a list
+of
+
+@example
+(@var{default} @var{:keyword} @var{value}...)
+@end example
+
+A @var{default} may be a symbol or a list of symbols (for different
+levels of fontification). The symbol(s) can be a variable or a
+function. If a symbol is both a variable and a function, it is used
+as a function. Different levels of fontification can be controlled by
+@var{font-lock-maximum-decoration}.
+
+The symbol(s) in @var{default} should contain or return a
+@var{setting} as described in @var{tree-sitter-font-lock-settings}.
+
+The remaining @var{keyword}s and @var{value}s are additional settings
+that could be used to alter the fontification behavior. Currently
+there aren't any.
+@end defvar
+
+Multi-language major modes should provide range functions in
+@var{tree-sitter-range-functions}, and Emacs will set the ranges
+accordingly before fontifying a region (@pxref{Multiple Languages}).
+
@node Auto-Indentation
@section Automatic Indentation of code
@@ -3791,10 +3877,12 @@ Auto-Indentation
so if your language seems somewhat similar to one of those languages,
you might try to use that engine. @c FIXME: documentation?
Another one is SMIE which takes an approach in the spirit
-of Lisp sexps and adapts it to non-Lisp languages.
+of Lisp sexps and adapts it to non-Lisp languages. Yet another one is
+to rely on a full-blown parser, for example, the tree-sitter library.
@menu
* SMIE:: A simple minded indentation engine.
+* Parser-based Indentation:: Parser-based indentation engine.
@end menu
@node SMIE
@@ -4454,6 +4542,169 @@ SMIE Customization
@code{eval: (smie-config-local '(@var{rules}))}.
@end defun
+@node Parser-based Indentation
+@subsection Parser-based Indentation
+
+@c This node is written when the only parser Emacs has is tree-sitter,
+@c if in the future more parsers are supported, feel free to reorganize
+@c and rewrite this node to describe multiple parsers in parallel.
+
+When built with the tree-sitter library (@pxref{Parsing Program
+Source}), Emacs can parse program source and produce a syntax tree.
+This syntax tree can then be used for indentation. For maximum
+flexibility, we could write a custom indent function that queries the
+syntax tree and indents accordingly for each language, but that would
+be a lot of work. It is more convenient to use the simple indentation
+engine described below: we only need to write some indentation rules
+and the engine takes care of the rest.
+
+To enable the indentation engine, set the value of
+@var{indent-line-function} to @code{tree-sitter-indent}.
+
+@defvar tree-sitter-indent-function
+This variable stores the actual function called by
+@code{tree-sitter-indent}. By default, its value is
+@code{tree-sitter-simple-indent}. In the future we might add other
+more complex indentation engines, if @code{tree-sitter-simple-indent}
+proves to be insufficient.
+@end defvar
+
+@heading Writing indentation rules
+
+@defvar tree-sitter-simple-indent-rules
+This local variable stores indentation rules for every language. It is
+a list of
+
+@example
+(@var{language} . @var{rules})
+@end example
+
+where @var{language} is a language symbol, @var{rules} is a list of
+
+@example
+(@var{matcher} @var{anchor} @var{offset})
+@end example
+
+The @var{matcher} determines whether this rule applies; @var{anchor}
+and @var{offset} together determine which column to indent to.
+
+A @var{matcher} is a function that takes three arguments (@var{node}
+@var{parent} @var{bol}). Argument @var{bol} is the position at which
+we are indenting: the position of the first non-whitespace character
+from the beginning of line; @var{node} is the largest
+(highest-in-tree) node that starts at that position; @var{parent} is
+the parent of @var{node}.
+
+If @var{matcher} returns non-nil, meaning the rule matches, Emacs then
+uses @var{anchor} to find an anchor point; @var{anchor} should be a
+function that takes the same arguments (@var{node} @var{parent}
+@var{bol}) and returns a point.
+
+Finally Emacs computes the column of that point returned by
+@var{anchor} and adds @var{offset} to it, and indents to that column.
+
+For @var{matcher} and @var{anchor}, Emacs provides some convenient
+presets to spare us from writing these functions ourselves. They are
+stored in @var{tree-sitter-simple-indent-presets}, see below.
+@end defvar
+
+@defvar tree-sitter-simple-indent-presets
+This is a list of presets for @var{matcher}s and @var{anchor}s in
+@var{tree-sitter-simple-indent-rules}. Each of them represents a
+function that takes @var{node}, @var{parent} and @var{bol} as
+arguments.
+
+@example
+(match @var{node-type} @var{parent-type}
+ @var{node-field} @var{node-index-min} @var{node-index-max})
+@end example
+
+This matcher checks if @var{node}'s type is @var{node-type},
+@var{parent}'s type is @var{parent-type}, @var{node}'s field name in
+@var{parent} is @var{node-field}, and @var{node}'s index among its
+siblings is between @var{node-index-min} and @var{node-index-max}. If
+the value of a constraint is nil, this matcher doesn't check for that
+constraint. For example, to match the first child where parent is
+@code{argument_list}, use
+
+@example
+(match nil "argument_list" nil 0 0)
+@end example
+
+@example
+no-node
+@end example
+
+This matcher matches the case where @var{node} is nil, i.e., there is
+no node that starts at @var{bol}. This is the case when @var{bol} is
+at an empty line or inside a multi-line string, etc.
+
+@example
+(parent-is @var{type})
+@end example
+
+This matcher matches if @var{parent}'s type is @var{type}.
+
+@example
+(node-is @var{type})
+@end example
+
+This matcher matches if @var{node}'s type is @var{type}.
+
+@example
+(query @var{query})
+@end example
+
+This matcher matches if querying @var{parent} with @var{query}
+captures @var{node}. The capture name does not matter.
+
+@example
+first-sibling
+@end example
+
+This anchor returns the start of the first child of @var{parent}.
+
+@example
+parent
+@end example
+
+This anchor returns the start of @var{parent}.
+
+@example
+prev-sibling
+@end example
+
+This anchor returns the start of the previous sibling of @var{node}.
+
+@example
+no-indent
+@end example
+
+This anchor returns the start of @var{node}, i.e., do not indent.
+
+@example
+prev-line
+@end example
+
+This anchor returns the start of the first named node on the previous
+line. This can be used for indenting an empty line.
+@end defvar
+
+@heading Indentation utilities
+
+Here are some utility functions that can help with writing indentation
+rules.
+
+@defun tree-sitter-check-indent mode
+This function checks the current buffer's indentation against major mode
+@var{mode}. It indents the current line in @var{mode} and compares
+the indentation with the current indentation. Then it pops up a diff
+buffer showing the difference. Correct indentation (target) is in
+green, current indentation is in red.
+@end defun
+
+It is also helpful to use @code{tree-sitter-inspect-mode} when writing
+indentation rules.
@node Desktop Save Mode
@section Desktop Save Mode
diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
new file mode 100644
index 0000000000..4ebb13ac5e
--- /dev/null
+++ b/doc/lispref/parsing.texi
@@ -0,0 +1,1416 @@
+@c -*- mode: texinfo; coding: utf-8 -*-
+@c This is part of the GNU Emacs Lisp Reference Manual.
+@c Copyright (C) 2021 Free Software Foundation, Inc.
+@c See the file elisp.texi for copying conditions.
+@node Parsing Program Source
+@chapter Parsing Program Source
+
+Emacs provides various ways to parse program source text and produce a
+@dfn{syntax tree}. In a syntax tree, text is no longer a
+one-dimensional stream but a structured tree of nodes, where each node
+represents a piece of text. Thus a syntax tree can enable
+interesting features like precise fontification, indentation,
+navigation, structured editing, etc.
+
+Emacs has a simple facility for parsing balanced expressions
+(@pxref{Parsing Expressions}). There is also the SMIE library for
+generic navigation and indentation (@pxref{SMIE}).
+
+Emacs also provides integration with the tree-sitter library
+(@uref{https://tree-sitter.github.io/tree-sitter}) if compiled with
+it. The tree-sitter library implements an incremental parser and
+supports a wide range of programming languages.
+
+@defun tree-sitter-available-p
+This function returns non-nil if tree-sitter features are available
+for this Emacs instance.
+@end defun
+
+For using tree-sitter features in font-lock and indentation, see
+@ref{Parser-based Font Lock} and @ref{Parser-based Indentation}.
+
+To access the syntax tree of the text in a buffer, we need to first
+load a language definition and create a parser with it. Next, we can
+query the parser for specific nodes in the syntax tree. Then, we can
+access various information about the node, and we can pattern-match a
+node with a powerful syntax. Finally, we explain how to work with
+source files that mix multiple languages. The following sections
+explain how to do each of the tasks in detail.
+
+@menu
+* Language Definitions:: Loading tree-sitter language definitions.
+* Using Parser:: Introduction to parsers.
+* Retrieving Node:: Retrieving node from syntax tree.
+* Accessing Node:: Accessing node information.
+* Pattern Matching:: Pattern matching with query patterns.
+* Multiple Languages:: Parse text written in multiple languages.
+* Tree-sitter C API:: Compare the C API and the ELisp API.
+@end menu
+
+@node Language Definitions
+@section Tree-sitter Language Definitions
+
+@heading Loading a language definition
+
+Tree-sitter relies on a language definition to parse text in that
+language. In Emacs, a language definition is represented by a symbol
+@code{tree-sitter-<language>}. For example, the C language definition
+is represented as @code{tree-sitter-c}, and @code{tree-sitter-c} can be
+passed to tree-sitter functions as the @var{language} argument.
+
+@vindex tree-sitter-load-language-error
+Tree-sitter language definitions are distributed as dynamic
+libraries. In order to use a language definition in Emacs, you need to
+make sure that the dynamic library is installed on the system, either
+in standard locations or in @code{LD_LIBRARY_PATH} (on some systems,
+it is @code{DYLD_LIBRARY_PATH}). If Emacs cannot find the library or
+has a problem loading it, Emacs signals
+@var{tree-sitter-load-language-error}. The signal data is a list of
+specific error messages.
+
+@defun tree-sitter-language-available-p language
+This function checks whether the dynamic library for @var{language} is
+present on the system, and returns non-nil if it is.
+@end defun
+
+@vindex tree-sitter-load-name-override-list
+By convention, the dynamic library for @code{tree-sitter-<language>}
+is @code{libtree-sitter-<language>.@var{ext}}, where @var{ext} is the
+system-specific extension for dynamic libraries. Also by convention,
+the function provided by that library is named
+@code{tree_sitter_<language>}. If a language definition doesn't
+follow this convention, you should add an entry
+
+@example
+(@var{language-symbol} @var{library-base-name} @var{function-name})
+@end example
+
+to @var{tree-sitter-load-name-override-list}, where
+@var{library-base-name} is the base filename for the dynamic library
+(conventionally @code{libtree-sitter-<language>}), and
+@var{function-name} is the function provided by the library
+(conventionally @code{tree_sitter_<language>}). For example,
+
+@example
+(tree-sitter-cool-lang "libtree-sitter-coool" "tree_sitter_coool")
+@end example
+
+for a language too cool to abide by the rules.
+
+@heading Concrete syntax tree
+
+A syntax tree is what a language definition defines (more or less) and
+what a parser generates. In a syntax tree, each node represents a
+piece of text, and nodes are connected to one another by parent-child
+relationships. For example, if the source text is
+
+@example
+1 + 2
+@end example
+
+@noindent
+its syntax tree could be
+
+@example
+@group
+ +--------------+
+ | root "1 + 2" |
+ +--------------+
+ |
+ +--------------------------------+
+ | expression "1 + 2" |
+ +--------------------------------+
+ | | |
++------------+ +--------------+ +------------+
+| number "1" | | operator "+" | | number "2" |
++------------+ +--------------+ +------------+
+@end group
+@end example
+
+We can also represent it as an s-expression:
+
+@example
+(root (expression (number) (operator) (number)))
+@end example
+
+@subheading Node types
+
+@cindex tree-sitter node type
+@anchor{tree-sitter node type}
+@cindex tree-sitter named node
+@anchor{tree-sitter named node}
+@cindex tree-sitter anonymous node
+Names like @code{root}, @code{expression}, @code{number},
+@code{operator} are nodes' @dfn{type}. However, not all nodes in a
+syntax tree have a type. Nodes that don't are @dfn{anonymous nodes},
+and nodes with a type are @dfn{named nodes}. Anonymous nodes are
+tokens with fixed spellings, including punctuation characters like
+bracket @samp{]}, and keywords like @code{return}.
+
+@subheading Field names
+
+@cindex tree-sitter node field name
+@anchor{tree-sitter node field name} To make the syntax tree easier to
+analyze, many language definitions assign @dfn{field names} to child
+nodes. For example, a @code{function_definition} node could have a
+@code{declarator} and a @code{body}:
+
+@example
+@group
+(function_definition
+ declarator: (declaration)
+ body: (compound_statement))
+@end group
+@end example
+
+@deffn Command tree-sitter-inspect-mode
+This minor mode displays the node that @emph{starts} at point in
+the mode-line. The mode-line will display
+
+@example
+@var{parent} @var{field-name}: (@var{child} (@var{grand-child} (...)))
+@end example
+
+@var{child}, @var{grand-child}, and @var{grand-grand-child}, etc, are
+nodes that have their beginning at point. And @var{parent} is the
+parent of @var{child}.
+
+If there is no node that starts at point, i.e., point is in the middle
+of a node, then the mode-line only displays the smallest node that
+spans point, and its immediate parent.
+
+This minor mode doesn't create parsers on its own. It simply uses the
+first parser in @var{tree-sitter-parser-list} (@pxref{Using Parser}).
+@end deffn
+
+@heading Reading the grammar definition
+
+Authors of language definitions define the @dfn{grammar} of a
+language, and this grammar determines how a parser constructs a
+concrete syntax tree out of the text. In order to use the syntax
+tree effectively, we need to read the @dfn{grammar file}.
+
+The grammar file is usually @code{grammar.js} in a language
+definition’s project repository. The link to a language definition’s
+home page can be found in tree-sitter’s homepage
+(@uref{https://tree-sitter.github.io/tree-sitter}).
+
+The grammar is written in JavaScript syntax. For example, the rule
+matching a @code{function_definition} node looks like
+
+@example
+@group
+function_definition: $ => seq(
+ $.declaration_specifiers,
+ field('declarator', $.declaration),
+ field('body', $.compound_statement)
+)
+@end group
+@end example
+
+The rule is represented by a function that takes a single argument
+@var{$}, representing the whole grammar. The function itself is
+constructed by other functions: the @code{seq} function puts together a
+sequence of children; the @code{field} function annotates a child with
+a field name. If we write the above definition in BNF syntax, it
+would look like
+
+@example
+@group
+function_definition :=
+ <declaration_specifiers> <declaration> <compound_statement>
+@end group
+@end example
+
+@noindent
+and the node returned by the parser would look like
+
+@example
+@group
+(function_definition
+ (declaration_specifiers)
+ declarator: (declaration)
+ body: (compound_statement))
+@end group
+@end example
+
+Below is a list of functions that one will see in a grammar
+definition. Each function takes other rules as arguments and returns
+a new rule.
+
+@itemize @bullet
+@item
+@code{seq(rule1, rule2, ...)} matches each rule one after another.
+
+@item
+@code{choice(rule1, rule2, ...)} matches one of the rules in its
+arguments.
+
+@item
+@code{repeat(rule)} matches @var{rule} @emph{zero or more} times.
+This is like the @samp{*} operator in regular expressions.
+
+@item
+@code{repeat1(rule)} matches @var{rule} @emph{one or more} times.
+This is like the @samp{+} operator in regular expressions.
+
+@item
+@code{optional(rule)} matches @var{rule} @emph{zero or one} time.
+This is like the @samp{?} operator in regular expressions.
+
+@item
+@code{field(name, rule)} assigns field name @var{name} to the child
+node matched by @var{rule}.
+
+@item
+@code{alias(rule, alias)} makes nodes matched by @var{rule} appear as
+@var{alias} in the syntax tree generated by the parser. For example,
+
+@example
+alias(preprocessor_call_exp, call_expression)
+@end example
+
+makes any node matched by @code{preprocessor_call_exp} appear as
+@code{call_expression}.
+@end itemize
+
+Below are grammar functions less interesting for a reader of a
+language definition.
+
+@itemize
+@item
+@code{token(rule)} marks @var{rule} to produce a single leaf node.
+That is, instead of generating a parent node with individual child
+nodes under it, everything is combined into a single leaf node.
+
+@item
+Normally, grammar rules ignore preceding whitespace;
+@code{token.immediate(rule)} changes @var{rule} to match only when
+there is no preceding whitespace.
+
+@item
+@code{prec(n, rule)} gives @var{rule} a level @var{n} precedence.
+
+@item
+@code{prec.left([n,] rule)} marks @var{rule} as left-associative,
+optionally with level @var{n}.
+
+@item
+@code{prec.right([n,] rule)} marks @var{rule} as right-associative,
+optionally with level @var{n}.
+
+@item
+@code{prec.dynamic(n, rule)} is like @code{prec}, but the precedence
+is applied at runtime instead.
+@end itemize
+
+The tree-sitter project talks about writing a grammar in more detail:
+@uref{https://tree-sitter.github.io/tree-sitter/creating-parsers}.
+Read especially ``The Grammar DSL'' section.
+
+@node Using Parser
+@section Using Tree-sitter Parser
+@cindex Tree-sitter parser
+
+This section describes how to create and configure a tree-sitter
+parser. In Emacs, each tree-sitter parser is associated with a
+buffer. As we edit the buffer, the associated parser is automatically
+kept up-to-date.
+
+@defvar tree-sitter-disabled-modes
+Before creating a parser, it is perhaps good to check whether we
+should use tree-sitter at all. Sometimes a user doesn't want to use
+tree-sitter features for a major mode. To turn off tree-sitter for a
+mode, they add that mode to this variable.
+@end defvar
+
+@defvar tree-sitter-maximum-size
+If users want to turn off tree-sitter for buffers larger than a
+particular size (because tree-sitter uses roughly 10 times the
+buffer size in memory to store the syntax tree), they set this
+variable to that size.
+@end defvar
+
+@defun tree-sitter-should-enable-p &optional mode
+This function returns non-nil if @var{mode} (which defaults to the
+current major mode) should activate tree-sitter features. The result
+depends on the value of @var{tree-sitter-disabled-modes} and
+@var{tree-sitter-maximum-size} described above. The result also
+depends on, of course, the result of @code{tree-sitter-available-p}.
+
+Writers of major modes or other packages are responsible for calling
+this function and determining whether to activate tree-sitter features.
+@end defun
+
+
+@cindex Creating tree-sitter parsers
+To create a parser, we provide a buffer to parse and the language to
+use (@pxref{Language Definitions}). Emacs provides several creation
+functions for different use cases.
+
+@defun tree-sitter-get-parser-create language
+This function is the most convenient one. It gives you a parser that
+recognizes @var{language} for the current buffer. The function
+checks if there already exists a parser suiting the need, and only
+creates a new one when it can't find one.
+
+@example
+@group
+;; Create a parser for C programming language.
+(tree-sitter-get-parser-create 'tree-sitter-c)
+ @c @result{} #<tree-sitter-parser for tree-sitter-c in *scratch*>
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-get-parser language
+This function is like @code{tree-sitter-get-parser-create}, but it
+always creates a new parser.
+@end defun
+
+@defun tree-sitter-parser-create buffer language
+This function is the most primitive, requiring both the buffer to
+associate with, and the language to use. If @var{buffer} is nil, the
+current buffer is used.
+@end defun
+
+Given a parser, we can query information about it:
+
+@defun tree-sitter-parser-buffer parser
+Returns the buffer associated with @var{parser}.
+@end defun
+
+@defun tree-sitter-parser-language parser
+Returns the language that @var{parser} uses.
+@end defun
+
+@defun tree-sitter-parser-p object
+Checks if @var{object} is a tree-sitter parser. Return non-nil if it
+is, return nil otherwise.
+@end defun
+
+There is no need to explicitly parse a buffer, because parsing is done
+automatically and lazily. A parser only parses when we query for a
+node in its syntax tree. Therefore, when a parser is first created,
+it doesn't parse the buffer; instead, it waits until we query for a
+node for the first time. Similarly, when some change is made in the
+buffer, a parser doesn't re-parse immediately and only records some
+necessary information to later re-parse when necessary.
+
+@vindex tree-sitter-buffer-too-large
+When a parser does parse, it checks the size of the buffer.
+Tree-sitter can only handle a buffer no larger than about 4GB. If the
+size exceeds that, Emacs signals @var{tree-sitter-buffer-too-large}
+with signal data being the buffer size.
+
+@vindex tree-sitter-parser-list
+Once a parser is created, Emacs automatically adds it to the
+buffer-local variable @var{tree-sitter-parser-list}. Every time a
+change is made to the buffer, Emacs updates parsers in this list so
+they can update their syntax tree incrementally. Therefore, one must
+not remove parsers from this list and put the parser back in: if any
+change is made when that parser is absent, the parser will be
+permanently out-of-sync with the buffer content, and shouldn't be used
+anymore.
+
+@cindex tree-sitter narrowing
+@anchor{tree-sitter narrowing} Normally, a parser ``sees'' the whole
+buffer, but when the buffer is narrowed (@pxref{Narrowing}), the
+parser will only see the visible region. As far as the parser can
+tell, the hidden region is deleted. And when the buffer is later
+widened, the parser thinks text is inserted in the beginning and in
+the end. Although parsers respect narrowing, narrowing shouldn't be
+the means to handle a multi-language buffer; instead, set the ranges in
+which a parser should operate. @xref{Multiple Languages}.
+
+Because a parser parses lazily, when we narrow the buffer, the parser
+doesn't act immediately; as long as we don't query for a node while
+the buffer is narrowed, narrowing does not affect the parser.
+
+@cindex tree-sitter parse string
+@defun tree-sitter-parse-string string language
+Besides creating a parser for a buffer, we can also just parse a
+string. Unlike a buffer, parsing a string is a one-time deal, and
+there is no way to update the result.
+
+This function parses @var{string} with @var{language}, and returns the
+root node of the generated syntax tree.
+@end defun
+
+@node Retrieving Node
+@section Retrieving Node
+
+@cindex tree-sitter find node
+@cindex tree-sitter get node
+There are two ways to retrieve a node: directly from the syntax tree,
+or by traveling from other nodes. But before we continue, let's go
+over some conventions of tree-sitter functions.
+
+We talk about a node being ``smaller'' or ``larger'', and ``lower'' or
+``higher''. A smaller and lower node is lower in the syntax tree and
+therefore spans a smaller piece of text; a larger and higher node is
+higher up in the syntax tree, containing many smaller nodes as its
+children, and therefore spans a larger piece of text.
+
+When a function cannot find a node, it returns nil. And for the
+convenience of function chaining, all the functions that take a node
+as an argument and return a node also accept nil as the node; in that
+case, the function just returns nil.
+
+@vindex tree-sitter-node-outdated
+Nodes are not automatically updated when the associated buffer is
+modified. In fact, there is no way to update a node once it is
+retrieved. It is best to use a node and throw it away and not save
+it. A node is @dfn{outdated} if the buffer has changed since the node
+was retrieved. Using an outdated node signals a
+@var{tree-sitter-node-outdated} error.
+
+@heading Retrieving node from syntax tree
+
+@defun tree-sitter-node-at beg &optional end parser-or-lang named
+This function returns the @emph{smallest} node that covers the span
+from @var{beg} to @var{end}. In other words, the start of the node
+@code{<=} @var{beg}, and the end of the node @code{>=} @var{end}. If
+@var{end} is omitted, it defaults to the value of @var{beg}.
+
+When @var{parser-or-lang} is nil, this function uses the first parser
+in @var{tree-sitter-parser-list} in the current buffer. If
+@var{parser-or-lang} is a parser object, it uses that parser; if
+@var{parser-or-lang} is a language, it finds the first parser using
+that language in @var{tree-sitter-parser-list} and uses that.
+
+If @var{named} is non-nil, this function looks for a named node
+instead (@pxref{tree-sitter named node, named node}).
+
+@example
+@group
+;; Find the node at point in a C parser's syntax tree.
+(tree-sitter-node-at (point) (point) 'tree-sitter-c)
+ @c @result{} #<tree-sitter-node from 1 to 4 in *scratch*>
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-parser-root-node parser
+This function returns the root node of the syntax tree generated by
+@var{parser}.
+@end defun
+
+@defun tree-sitter-buffer-root-node &optional language
+This function finds the first parser that uses @var{language} in
+@var{tree-sitter-parser-list} in the current buffer, and returns the
+root node of that parser. If it cannot find an appropriate parser, it
+returns nil.
+@end defun
+
+Once we have a node, we can retrieve other nodes from it, or query for
+information about this node.
+
+@heading Retrieving node from other nodes
+
+@subheading By kinship
+
+@defun tree-sitter-node-parent node
+This function returns the immediate parent of @var{node}.
+@end defun
+
+@defun tree-sitter-node-child node n &optional named
+This function returns the @var{n}'th child of @var{node}. If
+@var{named} is non-nil, then it only counts named nodes
+(@pxref{tree-sitter named node, named node}). For example, in a node
+that represents a string: @code{"text"}, there are three child
+nodes: the opening quote @code{"}, the string content @code{text}, and
+the closing quote @code{"}. Among these nodes, the first child is
+the opening quote @code{"}, the first named child is the string
+content @code{text}.
+@end defun
+
+@defun tree-sitter-node-children node &optional named
+This function returns all of @var{node}'s children in a list. If
+@var{named} is non-nil, then it only retrieves named nodes
+(@pxref{tree-sitter named node, named node}).
+@end defun
+
+@defun tree-sitter-next-sibling node &optional named
+This function finds the next sibling of @var{node}. If @var{named} is
+non-nil, it finds the next named sibling (@pxref{tree-sitter named
+node, named node}).
+@end defun
+
+@defun tree-sitter-prev-sibling node &optional named
+This function finds the previous sibling of @var{node}. If
+@var{named} is non-nil, it finds the previous named sibling
+(@pxref{tree-sitter named node, named node}).
+@end defun
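+
+As a sketch of how the sibling functions combine with the nil
+convention described earlier, the following hypothetical helper (its
+name is not part of the API) collects the named siblings that follow
+a node; the loop ends when @code{tree-sitter-next-sibling} returns
+nil.
+
+@example
+@group
+;; Hypothetical helper, shown only for illustration.
+(defun my-named-siblings-after (node)
+  "Return the named siblings following NODE, in order."
+  (let ((sibling (tree-sitter-next-sibling node t))
+        (result nil))
+    (while sibling
+      (push sibling result)
+      (setq sibling (tree-sitter-next-sibling sibling t)))
+    (nreverse result)))
+@end group
+@end example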
+
+@subheading By field name
+
+To make the syntax tree easier to analyze, many language definitions
+assign @dfn{field names} to child nodes (@pxref{tree-sitter node field
+name, field name}). For example, a @code{function_definition} node
+could have a @code{declarator} and a @code{body}.
+
+@defun tree-sitter-child-by-field-name node field-name
+This function finds the child of @var{node} that has @var{field-name}
+as its field name.
+
+@example
+@group
+;; Get the child that has "body" as its field name.
+(tree-sitter-child-by-field-name node "body")
+ @c @result{} #<tree-sitter-node from 3 to 11 in *scratch*>
+@end group
+@end example
+@end defun
+
+@subheading By position
+
+@defun tree-sitter-first-child-for-pos node pos &optional named
+This function finds the first child of @var{node} that extends beyond
+@var{pos}. ``Extend beyond'' means the end of the child node
+@code{>=} @var{pos}. This function only looks for immediate children of
+@var{node}, and doesn't look in its grandchildren. If @var{named} is
+non-nil, it only looks for named children (@pxref{tree-sitter named node,
+named node}).
+@end defun
+
+@defun tree-sitter-node-descendant-for-range node beg end &optional named
+This function finds the @emph{smallest} (grand)child of @var{node}
+that spans the range from @var{beg} to @var{end}. It is similar to
+@code{tree-sitter-node-at}. If @var{named} is non-nil, it only looks
+for named children (@pxref{tree-sitter named node, named node}).
+@end defun
+
+@heading More convenient functions
+
+@defun tree-sitter-filter-child node pred &optional named
+This function finds children of @var{node} that satisfy @var{pred}.
+
+Function @var{pred} takes the child node as the argument and should
+return non-nil to indicate keeping the child. If @var{named} is
+non-nil, this function only searches for named nodes.
+@end defun
+
+@defun tree-sitter-parent-until node pred
+This function repeatedly finds the parent of @var{node}, and returns
+the parent if it satisfies @var{pred} (which takes the parent as the
+argument). If no parent satisfies @var{pred}, this function returns
+nil.
+@end defun
+
+@defun tree-sitter-parent-while node pred
+This function repeatedly finds the parent of @var{node}, and keeps
+doing so as long as the parent satisfies @var{pred} (which takes the
+parent as the single argument). I.e., this function returns the
+farthest parent that still satisfies @var{pred}.
+@end defun
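+
+For instance, assuming the language definition has a
+@code{function_definition} node type (as the C examples in this
+chapter do), finding the function that encloses @var{node} might look
+like this sketch:
+
+@example
+@group
+;; Climb from NODE until a function_definition node is found;
+;; the result is nil if there is none.
+(tree-sitter-parent-until
+ node
+ (lambda (parent)
+   (equal (tree-sitter-node-type parent)
+          "function_definition")))
+@end group
+@end example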
+
+@node Accessing Node
+@section Accessing Node Information
+
+Before going further, make sure you have read the basic conventions
+about tree-sitter nodes in the previous node.
+
+@heading Basic information
+
+Every node is associated with a parser, and that parser is associated
+with a buffer. The following functions let you retrieve them.
+
+@defun tree-sitter-node-parser node
+This function returns @var{node}'s associated parser.
+@end defun
+
+@defun tree-sitter-node-buffer node
+This function returns @var{node}'s parser's associated buffer.
+@end defun
+
+@defun tree-sitter-node-language node
+This function returns @var{node}'s parser's associated language.
+@end defun
+
+Each node represents a piece of text in the buffer. The functions
+below find relevant information about that text.
+
+@defun tree-sitter-node-start node
+Return the start position of @var{node}.
+@end defun
+
+@defun tree-sitter-node-end node
+Return the end position of @var{node}.
+@end defun
+
+@defun tree-sitter-node-text node &optional object
+Returns the buffer text that @var{node} represents. (If @var{node} is
+retrieved from parsing a string, it will be the text from that
+string.)
+@end defun
+
+Here are some basic checks on tree-sitter nodes.
+
+@defun tree-sitter-node-p object
+Checks if @var{object} is a tree-sitter syntax node.
+@end defun
+
+@defun tree-sitter-node-eq node1 node2
+Checks if @var{node1} and @var{node2} are the same node in a syntax
+tree.
+@end defun
+
+@heading Property information
+
+In general, nodes in a concrete syntax tree fall into two categories:
+@dfn{named nodes} and @dfn{anonymous nodes}. Whether a node is named
+or anonymous is determined by the language definition
+(@pxref{tree-sitter named node, named node}).
+
+@cindex tree-sitter missing node
+Apart from being named/anonymous, a node can have other properties. A
+node can be ``missing'': missing nodes are inserted by the parser in
+order to recover from certain kinds of syntax errors, i.e., something
+should probably be there according to the grammar, but is not there.
+
+@cindex tree-sitter extra node
+A node can be ``extra'': extra nodes represent things like comments,
+which can appear anywhere in the text.
+
+@cindex tree-sitter node that has changes
+A node ``has changes'' if the buffer has changed since the node was
+retrieved. In this case, the node's start and end positions would be
+off and we had better throw it away and retrieve a new one.
+
+@cindex tree-sitter node that has error
+A node ``has error'' if the text it spans contains a syntax error. It
+can be that the node itself has an error, or that one of its
+(grand)children has an error.
+
+@defun tree-sitter-node-check node property
+This function checks if @var{node} has @var{property}. @var{property}
+can be @code{'named}, @code{'missing}, @code{'extra},
+@code{'has-changes}, or @code{'has-error}.
+@end defun
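+
+As an illustration, a caller that wants to skip text the parser could
+not make sense of might test a node before using it. This is only a
+sketch; @code{node} is assumed to be a node retrieved earlier:
+
+@example
+@group
+(unless (tree-sitter-node-check node 'has-error)
+  ;; No syntax error was found in the text NODE spans.
+  (tree-sitter-node-text node))
+@end group
+@end example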
+
+Named nodes have ``types'' (@pxref{tree-sitter node type, node type}).
+For example, a named node can be a @code{string_literal} node, where
+@code{string_literal} is its type.
+
+@defun tree-sitter-node-type node
+Return @var{node}'s type as a string.
+@end defun
+
+@heading Information as a child or parent
+
+@defun tree-sitter-node-index node &optional named
+This function returns the index of @var{node} as a child node of its
+parent. If @var{named} is non-nil, it only counts named nodes
+(@pxref{tree-sitter named node, named node}).
+@end defun
+
+@defun tree-sitter-node-field-name node
+A child of a parent node could have a field name (@pxref{tree-sitter
+node field name, field name}). This function returns the field name
+of @var{node} as a child of its parent.
+@end defun
+
+@defun tree-sitter-node-field-name-for-child node n
+This is a more primitive function that returns the field name of the
+@var{n}'th child of @var{node}.
+@end defun
+
+@defun tree-sitter-child-count node &optional named
+This function finds the number of children of @var{node}. If
+@var{named} is non-nil, it only counts named children (@pxref{tree-sitter
+named node, named node}).
+@end defun
+
+@node Pattern Matching
+@section Pattern Matching Tree-sitter Nodes
+
+Tree-sitter lets us pattern match with a small declarative language.
+Pattern matching consists of two steps: first tree-sitter matches a
+@dfn{pattern} against nodes in the syntax tree, then it @dfn{captures}
+specific nodes in that pattern and returns the captured nodes.
+
+We describe first how to write the most basic query pattern and how to
+capture nodes in a pattern, then the pattern-match function, finally
+more advanced pattern syntax.
+
+@heading Basic query syntax
+
+@cindex Tree-sitter query syntax
+@cindex Tree-sitter query pattern
+A @dfn{query} consists of multiple @dfn{patterns}. Each pattern is an
+s-expression that matches a certain node in the syntax tree. A
+pattern has the following shape:
+
+@example
+(@var{type} @var{child}...)
+@end example
+
+@noindent
+For example, a pattern that matches a @code{binary_expression} node that
+contains @code{number_literal} child nodes would look like
+
+@example
+(binary_expression (number_literal))
+@end example
+
+To @dfn{capture} a node in the query pattern above, append
+@code{@@capture-name} after the node pattern you want to capture. For
+example,
+
+@example
+(binary_expression (number_literal) @@number-in-exp)
+@end example
+
+@noindent
+captures @code{number_literal} nodes that are inside a
+@code{binary_expression} node with capture name @code{number-in-exp}.
+
+We can capture the @code{binary_expression} node too, with capture
+name @code{biexp}:
+
+@example
+(binary_expression
+ (number_literal) @@number-in-exp) @@biexp
+@end example
+
+@heading Query function
+
+Now we can introduce the query functions.
+
+@defun tree-sitter-query-capture node query &optional beg end
+This function matches patterns in @var{query} in @var{node}.
+Argument @var{query} can be either a string or an s-expression. For
+now, we focus on the string syntax; s-expression syntax is described
+at the end of the section.
+
+The function returns all captured nodes in a list of
+@code{(@var{capture_name} . @var{node})}. If @var{beg} and @var{end}
+are both non-nil, it only pattern matches nodes in that range.
+
+@vindex tree-sitter-query-error
+This function signals a @var{tree-sitter-query-error} error if
+@var{query} is malformed. The signal data contains a description of
+the specific error.
+@end defun
+
+@defun tree-sitter-query-in source query &optional beg end
+This function matches patterns in @var{query} in @var{source}, and
+returns all captured nodes in a list of @code{(@var{capture_name}
+. @var{node})}. If @var{beg} and @var{end} are both non-nil, it only
+pattern matches nodes in that range.
+
+Argument @var{source} designates a node; it can be a language symbol,
+a parser, or simply a node. If a language symbol, @var{source}
+represents the root node of the first parser for that language in the
+current buffer; if a parser, @var{source} represents the root node of
+that parser.
+
+This function also signals a @var{tree-sitter-query-error} error.
+@end defun
+
+For example, suppose @var{node}'s content is @code{1 + 2}, and
+@var{query} is
+
+@example
+@group
+(setq query
+ "(binary_expression
+ (number_literal) @@number-in-exp) @@biexp")
+@end group
+@end example
+
+@noindent
+Running that query would return
+
+@example
+@group
+(tree-sitter-query-capture node query)
+ @result{} ((biexp . @var{<node for "1 + 2">})
+ (number-in-exp . @var{<node for "1">})
+ (number-in-exp . @var{<node for "2">}))
+@end group
+@end example
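+
+Since the result is a list of cons cells, it can be walked with
+ordinary list functions. The following sketch, which assumes
+@code{node} and @code{query} are bound as above, prints each capture
+name together with the text of its node:
+
+@example
+@group
+(pcase-dolist (`(,name . ,captured)
+               (tree-sitter-query-capture node query))
+  (message "%s: %s" name (tree-sitter-node-text captured)))
+@end group
+@end example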
+
+As we mentioned earlier, a @var{query} could contain multiple
+patterns. For example, it could have two top-level patterns:
+
+@example
+@group
+(setq query
+ "(binary_expression) @@biexp
+ (number_literal) @@number @@biexp")
+@end group
+@end example
+
+@defun tree-sitter-query-string string query language
+This function parses @var{string} with @var{language}, pattern matches
+its root node with @var{query}, and returns the result.
+@end defun
+
+@heading More query syntax
+
+Besides node type and capture, tree-sitter's query syntax can express
+anonymous node, field name, wildcard, quantification, grouping,
+alternation, anchor, and predicate.
+
+@subheading Anonymous node
+
+An anonymous node is written verbatim, surrounded by quotes. A
+pattern matching (and capturing) keyword @code{return} would be
+
+@example
+"return" @@keyword
+@end example
+
+@subheading Wild card
+
+In a query pattern, @samp{(_)} matches any named node, and @samp{_}
+matches any named or anonymous node. For example, to capture any
+named child of a @code{binary_expression} node, the pattern would be
+
+@example
+(binary_expression (_) @@in_biexp)
+@end example
+
+@subheading Field name
+
+We can capture child nodes that have specific field names:
+
+@example
+@group
+(function_definition
+ declarator: (_) @@func-declarator
+ body: (_) @@func-body)
+@end group
+@end example
+
+We can also capture a node that doesn't have a certain field, say, a
+@code{function_definition} without a @code{body} field.
+
+@example
+(function_definition !body) @@func-no-body
+@end example
+
+@subheading Quantify node
+
+Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
+@samp{?}. Their meanings are the same as in regular expressions:
+@samp{*} matches the preceding pattern zero or more times, @samp{+}
+matches one or more times, and @samp{?} matches zero or one time.
+
+For example, this pattern matches @code{type_declaration} nodes
+that have @emph{zero or more} @code{long} keywords.
+
+@example
+(type_declaration "long"* @@long-in-type)
+@end example
+
+@noindent
+And this pattern matches a type declaration that has zero or one
+@code{long} keyword:
+
+@example
+(type_declaration "long"?) @@type-decl
+@end example
+
+@subheading Grouping
+
+Similar to groups in regular expression, we can bundle patterns into a
+group and apply quantification operators to it. For example, to
+express a comma separated list of identifiers, one could write
+
+@example
+(identifier) ("," (identifier))*
+@end example
+
+@subheading Alternation
+
+Again, similar to regular expressions, we can express ``match any one
+of this group of patterns'' in the query pattern. The syntax is a
+list of patterns enclosed in square brackets. For example, to capture
+some keywords in C, the query pattern would be
+
+@example
+@group
+[
+ "return"
+ "break"
+ "if"
+ "else"
+] @@keyword
+@end group
+@end example
+
+@subheading Anchor
+
+The anchor operator @samp{.} can be used to enforce juxtaposition,
+i.e., to enforce two things to be directly next to each other. The
+two ``things'' can be two nodes, or a child and the end of its parent.
+For example, to capture the first child, the last child, or two
+adjacent children:
+
+@example
+@group
+;; Anchor the child with the end of its parent.
+(compound_expression (_) @@last-child .)
+
+;; Anchor the child with the beginning of its parent.
+(compound_expression . (_) @@first-child)
+
+;; Anchor two adjacent children.
+(compound_expression
+ (_) @@prev-child
+ .
+ (_) @@next-child)
+@end group
+@end example
+
+Note that the enforcement of juxtaposition ignores any anonymous
+nodes.
+
+@subheading Predicate
+
+We can add predicate constraints to a pattern. For example, if we use
+the following query pattern
+
+@example
+@group
+(
+ (array . (_) @@first (_) @@last .)
+ (#equal @@first @@last)
+)
+@end group
+@end example
+
+Then tree-sitter only matches arrays where the first element equals
+the last element. To attach a predicate to a pattern, we need to
+group them together. A predicate always starts with a @samp{#}.
+Currently there are two predicates, @code{#equal} and @code{#match}.
+
+@deffn Predicate equal arg1 arg2
+Matches if @var{arg1} equals @var{arg2}. Arguments can be either a
+string or a capture name. Capture names represent the text that the
+captured node spans in the buffer.
+@end deffn
+
+@deffn Predicate match regexp capture-name
+Matches if the text that @var{capture-name}’s node spans in the buffer
+matches regular expression @var{regexp}. Matching is case-sensitive.
+@end deffn
+
+Note that a predicate can only refer to capture names that appear in the
+same pattern. Indeed, it makes little sense to refer to capture names
+in other patterns anyway.
+
+@heading S-expression patterns
+
+Besides strings, Emacs provides an s-expression based syntax for query
+patterns. It largely resembles the string-based syntax. For example,
+the following pattern
+
+@example
+@group
+(tree-sitter-query-capture
+ node "(addition_expression
+ left: (_) @@left
+ \"+\" @@plus-sign
+ right: (_) @@right) @@addition
+
+ [\"return\" \"break\"] @@keyword")
+@end group
+@end example
+
+@noindent
+is equivalent to
+
+@example
+@group
+(tree-sitter-query-capture
+ node '((addition_expression
+ left: (_) @@left
+ "+" @@plus-sign
+ right: (_) @@right) @@addition
+
+ ["return" "break"] @@keyword))
+@end group
+@end example
+
+Most pattern syntax can be written directly as strange but
+nevertheless valid s-expressions. Only a few of them need
+modification:
+
+@itemize
+@item
+Anchor @samp{.} is written as @code{:anchor}.
+@item
+@samp{?} is written as @samp{:?}.
+@item
+@samp{*} is written as @samp{:*}.
+@item
+@samp{+} is written as @samp{:+}.
+@item
+@code{#equal} is written as @code{:equal}. In general, predicates
+change their @samp{#} to @samp{:}.
+@end itemize
+
+For example,
+
+@example
+@group
+"(
+ (compound_expression . (_) @@first (_)* @@rest)
+ (#match \"love\" @@first)
+ )"
+@end group
+@end example
+
+is written in s-expression as
+
+@example
+@group
+'((
+ (compound_expression :anchor (_) @@first (_) :* @@rest)
+ (:match "love" @@first)
+ ))
+@end group
+@end example
+
+@defun tree-sitter-expand-query query
+This function expands the s-expression @var{query} into a string
+query. It is usually a good idea to expand the s-expression patterns
+into strings for font-lock queries since they are called repeatedly.
+@end defun
+
+The tree-sitter project's documentation about pattern-matching can be
+found at
+@uref{https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries}.
+
+@node Multiple Languages
+@section Parsing Text in Multiple Languages
+
+Sometimes, the source of a programming language could contain sources
+of other languages; HTML + CSS + JavaScript is one example. In that
+case, we need to assign individual parsers to text segments written in
+different languages. Traditionally this is achieved by using
+narrowing. While tree-sitter works with narrowing (@pxref{tree-sitter
+narrowing, narrowing}), the recommended way is to set ranges in which
+a parser will operate.
+
+@defun tree-sitter-parser-set-included-ranges parser ranges
+This function sets the ranges of @var{parser} to @var{ranges}. Then
+@var{parser} will only read the text covered by each range.
+@var{ranges} is a list of cons cells @code{(@var{beg}
+. @var{end})}.
+
+The ranges in @var{ranges} must come in order and not overlap. That
+is, the following check must return non-nil:
+
+@example
+@group
+(cl-loop for idx from 1 to (1- (length ranges))
+         for prev = (nth (1- idx) ranges)
+         for next = (nth idx ranges)
+         always (<= (car prev) (cdr prev)
+                    (car next) (cdr next)))
+@end group
+@end example
+
+@vindex tree-sitter-range-invalid
+If @var{ranges} violates this constraint, or something else goes
+wrong, this function signals a @var{tree-sitter-range-invalid} error.
+The signal data contains a specific error message and the ranges we
+are trying to set.
+
+This function can also be used for disabling ranges. If @var{ranges}
+is nil, the parser is set to parse the whole buffer.
+
+Example:
+
+@example
+@group
+(tree-sitter-parser-set-included-ranges
+ parser '((1 . 9) (16 . 24) (24 . 25)))
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-parser-included-ranges parser
+This function returns the ranges set for @var{parser}. The return
+value is the same as the @var{ranges} argument of
+@code{tree-sitter-parser-set-included-ranges}: a list of cons cells
+@code{(@var{beg} . @var{end})}. And if @var{parser} doesn't have any
+ranges, the return value is nil.
+
+@example
+@group
+(tree-sitter-parser-included-ranges parser)
+ @result{} ((1 . 9) (16 . 24) (24 . 25))
+@end group
+@end example
+@end defun
+
+@defun tree-sitter-set-ranges parser-or-lang ranges
+Like @code{tree-sitter-parser-set-included-ranges}, this function sets
+the ranges of @var{parser-or-lang} to @var{ranges}. Conveniently,
+@var{parser-or-lang} could be either a parser or a language. If it is
+a language, this function looks for the first parser in
+@var{tree-sitter-parser-list} for that language in the current buffer,
+and sets the ranges for it.
+@end defun
+
+@defun tree-sitter-get-ranges parser-or-lang
+This function returns the ranges of @var{parser-or-lang}, like
+@code{tree-sitter-parser-included-ranges}. And like
+@code{tree-sitter-set-ranges}, @var{parser-or-lang} can be a parser or
+a language symbol.
+@end defun
+
+@defun tree-sitter-query-range source pattern &optional beg end
+This function matches @var{source} with @var{pattern} and returns the
+ranges of captured nodes. The return value has the same shape as in
+other functions: a list of @code{(@var{beg} . @var{end})}.
+
+For convenience, @var{source} can be a language symbol, a parser, or a
+node. If a language symbol, this function matches in the root node of
+the first parser using that language; if a parser, this function
+matches in the root node of that parser; if a node, this function
+matches in that node.
+
+Parameter @var{pattern} is the query pattern used to capture nodes
+(@pxref{Pattern Matching}). The capture names don't matter. Parameters
+@var{beg} and @var{end}, if both non-nil, limit the range in which
+this function queries.
+
+Like other query functions, this function signals a
+@var{tree-sitter-query-error} error if @var{pattern} is malformed.
+@end defun
+
+@defun tree-sitter-language-at point
+This function tries to figure out which language is responsible for
+the text at @var{point}. It goes over each parser in
+@var{tree-sitter-parser-list} and sees whether that parser's range covers
+@var{point}.
+@end defun
+
+@defvar tree-sitter-range-functions
+A list of range functions. Font-locking and indenting code use
+functions in this list to set correct ranges for a language parser
+before using it.
+
+The signature of each function should be
+
+@example
+(@var{start} @var{end} &rest @var{_})
+@end example
+
+where @var{start} and @var{end} mark the region that is about to be
+used. A range function only needs to (but is not limited to) update
+ranges in that region.
+
+Each function in the list is called in order.
+@end defvar
+
+@defun tree-sitter-update-ranges &optional start end
+This function is used by font-lock and indent to update ranges before
+using any parser. Each range function in
+@var{tree-sitter-range-functions} is called in order. Arguments
+@var{start} and @var{end} are passed to each range function.
+@end defun
+
+@heading An example
+
+Normally, in a set of languages that can be mixed together, there is a
+major language and several embedded languages. The major language
+parses the whole document, and skips the embedded languages. Then the
+parser for the major language knows the ranges of the embedded
+languages. So we first parse the whole document with the major
+language’s parser, set ranges for the embedded languages, then parse
+the embedded languages.
+
+Suppose we want to parse a very simple document that mixes HTML, CSS
+and JavaScript:
+
+@example
+@group
+<html>
+ <script>1 + 2</script>
+ <style>body @{ color: "blue"; @}</style>
+</html>
+@end group
+@end example
+
+We first parse with HTML, then set ranges for CSS and JavaScript:
+
+@example
+@group
+;; Create parsers.
+(setq html (tree-sitter-get-parser-create 'tree-sitter-html))
+(setq css (tree-sitter-get-parser-create 'tree-sitter-css))
+(setq js (tree-sitter-get-parser-create 'tree-sitter-javascript))
+
+;; Set CSS ranges.
+(setq css-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ "(style_element (raw_text) @@capture)"))
+(tree-sitter-parser-set-included-ranges css css-range)
+
+;; Set JavaScript ranges.
+(setq js-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ "(script_element (raw_text) @@capture)"))
+(tree-sitter-parser-set-included-ranges js js-range)
+@end group
+@end example
+
+We use a query pattern @code{(style_element (raw_text) @@capture)} to
+find CSS nodes in the HTML parse tree. @xref{Pattern Matching}, for
+how to write query patterns.
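+
+To have font-lock and indentation keep these ranges current, the same
+steps could be wrapped in a range function. The sketch below is one
+possible arrangement, not a prescribed one: the function name is
+hypothetical, and for simplicity it ignores its @var{start} and
+@var{end} arguments and recomputes the ranges for the whole buffer.
+
+@example
+@group
+;; Hypothetical range function for the HTML example above.
+(defun my-html-update-ranges (_start _end &rest _)
+  "Set the ranges of the CSS and JavaScript parsers."
+  (tree-sitter-set-ranges
+   'tree-sitter-css
+   (tree-sitter-query-range
+    'tree-sitter-html
+    "(style_element (raw_text) @@capture)"))
+  (tree-sitter-set-ranges
+   'tree-sitter-javascript
+   (tree-sitter-query-range
+    'tree-sitter-html
+    "(script_element (raw_text) @@capture)")))
+
+(setq-local tree-sitter-range-functions
+            (list #'my-html-update-ranges))
+@end group
+@end example
+
+Keep in mind that when a query captures nothing, the resulting ranges
+are nil, and setting nil ranges makes a parser read the whole buffer.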
+
+@node Tree-sitter C API
+@section Tree-sitter C API Correspondence
+
+Emacs' tree-sitter integration doesn't expose every feature
+tree-sitter's C API provides. Missing features include:
+
+@itemize
+@item
+Creating a tree cursor and navigating the syntax tree with it.
+@item
+Setting timeout and cancellation flag for a parser.
+@item
+Setting the logger for a parser.
+@item
+Printing a DOT graph of the syntax tree to a file.
+@item
+Copying and modifying a syntax tree. (Emacs doesn't expose a tree
+object.)
+@item
+Using (row, column) coordinates as position.
+@item
+Updating a node with changes. (In Emacs, retrieve a new node instead
+of updating the existing one.)
+@item
+Querying statistics of a language definition.
+@end itemize
+
+In addition, Emacs makes some changes to the C API to make the API more
+convenient and idiomatic:
+
+@itemize
+@item
+Instead of using byte positions, the ELisp API uses character
+positions.
+@item
+Null nodes are converted to nil.
+@end itemize
+
+Below is the correspondence between all C API functions and their
+ELisp counterparts. Sometimes one ELisp function corresponds to
+multiple C functions, and many C functions don't have an ELisp
+counterpart.
+
+@example
+ts_parser_new                            tree-sitter-parser-create
+ts_parser_delete
+ts_parser_set_language
+ts_parser_language                       tree-sitter-parser-language
+ts_parser_set_included_ranges            tree-sitter-parser-set-included-ranges
+ts_parser_included_ranges                tree-sitter-parser-included-ranges
+ts_parser_parse
+ts_parser_parse_string                   tree-sitter-parse-string
+ts_parser_parse_string_encoding
+ts_parser_reset
+ts_parser_set_timeout_micros
+ts_parser_timeout_micros
+ts_parser_set_cancellation_flag
+ts_parser_cancellation_flag
+ts_parser_set_logger
+ts_parser_logger
+ts_parser_print_dot_graphs
+ts_tree_copy
+ts_tree_delete
+ts_tree_root_node
+ts_tree_language
+ts_tree_edit
+ts_tree_get_changed_ranges
+ts_tree_print_dot_graph
+ts_node_type                             tree-sitter-node-type
+ts_node_symbol
+ts_node_start_byte                       tree-sitter-node-start
+ts_node_start_point
+ts_node_end_byte                         tree-sitter-node-end
+ts_node_end_point
+ts_node_string                           tree-sitter-node-string
+ts_node_is_null
+ts_node_is_named                         tree-sitter-node-check
+ts_node_is_missing                       tree-sitter-node-check
+ts_node_is_extra                         tree-sitter-node-check
+ts_node_has_changes                      tree-sitter-node-check
+ts_node_has_error                        tree-sitter-node-check
+ts_node_parent                           tree-sitter-node-parent
+ts_node_child                            tree-sitter-node-child
+ts_node_field_name_for_child             tree-sitter-node-field-name-for-child
+ts_node_child_count                      tree-sitter-node-child-count
+ts_node_named_child                      tree-sitter-node-child
+ts_node_named_child_count                tree-sitter-node-child-count
+ts_node_child_by_field_name              tree-sitter-node-by-field-name
+ts_node_child_by_field_id
+ts_node_next_sibling                     tree-sitter-next-sibling
+ts_node_prev_sibling                     tree-sitter-prev-sibling
+ts_node_next_named_sibling               tree-sitter-next-sibling
+ts_node_prev_named_sibling               tree-sitter-prev-sibling
+ts_node_first_child_for_byte             tree-sitter-first-child-for-pos
+ts_node_first_named_child_for_byte       tree-sitter-first-child-for-pos
+ts_node_descendant_for_byte_range        tree-sitter-descendant-for-range
+ts_node_descendant_for_point_range
+ts_node_named_descendant_for_byte_range  tree-sitter-descendant-for-range
+ts_node_named_descendant_for_point_range
+ts_node_edit
+ts_node_eq                               tree-sitter-node-eq
+ts_tree_cursor_new
+ts_tree_cursor_delete
+ts_tree_cursor_reset
+ts_tree_cursor_current_node
+ts_tree_cursor_current_field_name
+ts_tree_cursor_current_field_id
+ts_tree_cursor_goto_parent
+ts_tree_cursor_goto_next_sibling
+ts_tree_cursor_goto_first_child
+ts_tree_cursor_goto_first_child_for_byte
+ts_tree_cursor_goto_first_child_for_point
+ts_tree_cursor_copy
+ts_query_new
+ts_query_delete
+ts_query_pattern_count
+ts_query_capture_count
+ts_query_string_count
+ts_query_start_byte_for_pattern
+ts_query_predicates_for_pattern
+ts_query_step_is_definite
+ts_query_capture_name_for_id
+ts_query_string_value_for_id
+ts_query_disable_capture
+ts_query_disable_pattern
+ts_query_cursor_new
+ts_query_cursor_delete
+ts_query_cursor_exec                     tree-sitter-query-capture
+ts_query_cursor_did_exceed_match_limit
+ts_query_cursor_match_limit
+ts_query_cursor_set_match_limit
+ts_query_cursor_set_byte_range
+ts_query_cursor_set_point_range
+ts_query_cursor_next_match
+ts_query_cursor_remove_match
+ts_query_cursor_next_capture
+ts_language_symbol_count
+ts_language_symbol_name
+ts_language_symbol_for_name
+ts_language_field_count
+ts_language_field_name_for_id
+ts_language_field_id_for_name
+ts_language_symbol_type
+ts_language_version
+@end example
diff --git a/lisp/emacs-lisp/cl-preloaded.el b/lisp/emacs-lisp/cl-preloaded.el
index 6aa45526d8..b4be54bbd6 100644
--- a/lisp/emacs-lisp/cl-preloaded.el
+++ b/lisp/emacs-lisp/cl-preloaded.el
@@ -68,6 +68,8 @@ cl--typeof-types
(font-spec atom) (font-entity atom) (font-object atom)
(vector array sequence atom)
(user-ptr atom)
+ (tree-sitter-parser atom)
+ (tree-sitter-node atom)
;; Plus, really hand made:
(null symbol list sequence atom))
"Alist of supertypes.
diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
new file mode 100644
index 0000000000..25886b393b
--- /dev/null
+++ b/lisp/tree-sitter.el
@@ -0,0 +1,844 @@
+;;; tree-sitter.el --- tree-sitter utilities -*- lexical-binding: t -*-
+
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;;
+;; Note to self: we don't create parsers automatically in any provided
+;; functions.
+
+;;; Code:
+
+(eval-when-compile (require 'cl-lib))
+(require 'cl-seq)
+(require 'font-lock)
+
+;;; Activating tree-sitter
+
+(defgroup tree-sitter
+ nil
+ "Tree-sitter is an incremental parser."
+ :group 'tools)
+
+(defcustom tree-sitter-disabled-modes nil
+ "A list of major-modes for which tree-sitter support is disabled."
+ :type '(repeat symbol))
+
+(defcustom tree-sitter-maximum-size (* 4 1024 1024)
+ "Maximum buffer size for enabling tree-sitter parsing."
+ :type 'integer)
+
+(defun tree-sitter-available-p ()
+ "Return non-nil if tree-sitter features are available."
+ (fboundp 'tree-sitter-parser-create))
+
+(defun tree-sitter-should-enable-p (&optional mode)
+ "Return non-nil if MODE should activate tree-sitter support.
+MODE defaults to the value of `major-mode'. The result depends
+on the value of `tree-sitter-disabled-modes',
+`tree-sitter-maximum-size', and of course, whether tree-sitter is
+available on the system at all."
+ (let* ((mode (or mode major-mode))
+ (disabled (cl-loop
+ for disabled-mode in tree-sitter-disabled-modes
+ if (provided-mode-derived-p mode disabled-mode)
+ return t
+ finally return nil)))
+ (and (tree-sitter-available-p)
+ (not disabled)
+ (< (buffer-size) tree-sitter-maximum-size))))
+
+;;; Parser API supplement
+
+(defun tree-sitter-get-parser (language)
+ "Find the first parser using LANGUAGE in `tree-sitter-parser-list'."
+ (catch 'found
+ (dolist (parser tree-sitter-parser-list)
+ (when (eq language (tree-sitter-parser-language parser))
+ (throw 'found parser)))))
+
+(defun tree-sitter-get-parser-create (language)
+ "Find the first parser using LANGUAGE in `tree-sitter-parser-list'.
+If none exists, create one and return it."
+ (or (tree-sitter-get-parser language)
+ (tree-sitter-parser-create
+ (current-buffer) language)))
+
+(defun tree-sitter-parse-string (string language)
+ "Parse STRING using a parser for LANGUAGE.
+Return the root node of the syntax tree."
+ (with-temp-buffer
+ (insert string)
+ (tree-sitter-parser-root-node
+ (tree-sitter-parser-create (current-buffer) language))))
+
+(defun tree-sitter-language-at (point)
+ "Return the language used at POINT."
+ (cl-loop for parser in tree-sitter-parser-list
+ if (tree-sitter-node-at point nil parser)
+ return (tree-sitter-parser-language parser)))
+
+(defun tree-sitter-set-ranges (parser-or-lang ranges)
+ "Set the ranges of PARSER-OR-LANG to RANGES."
+ (tree-sitter-parser-set-included-ranges
+ (cond ((symbolp parser-or-lang)
+ (or (tree-sitter-get-parser parser-or-lang)
+ (error "Cannot find a parser for %s" parser-or-lang)))
+ ((tree-sitter-parser-p parser-or-lang)
+ parser-or-lang)
+ (t (error "Expecting a parser or language, but got %s"
+ parser-or-lang)))
+ ranges))
+
+(defun tree-sitter-get-ranges (parser-or-lang)
+ "Get the ranges of PARSER-OR-LANG."
+ (tree-sitter-parser-included-ranges
+ (cond ((symbolp parser-or-lang)
+ (or (tree-sitter-get-parser parser-or-lang)
+ (error "Cannot find a parser for %s" parser-or-lang)))
+ ((tree-sitter-parser-p parser-or-lang)
+ parser-or-lang)
+ (t (error "Expecting a parser or language, but got %s"
+ parser-or-lang)))))
+
+;;; Node API supplement
+
+(defun tree-sitter-node-buffer (node)
+ "Return the buffer to which NODE belongs."
+ (tree-sitter-parser-buffer
+ (tree-sitter-node-parser node)))
+
+(defun tree-sitter-node-language (node)
+ "Return the language symbol that NODE's parser uses."
+ (tree-sitter-parser-language
+ (tree-sitter-node-parser node)))
+
+(defun tree-sitter-node-at (beg &optional end parser-or-lang named)
+ "Return the smallest node covering BEG to END.
+
+If omitted, END defaults to BEG. Return nil if none is found. If
+NAMED is non-nil, only look for named nodes. NAMED defaults to nil.
+
+If PARSER-OR-LANG is nil, use the first parser in
+`tree-sitter-parser-list'; if PARSER-OR-LANG is a parser, use
+that parser; if PARSER-OR-LANG is a language, find a parser using
+that language in the current buffer, and use that."
+ (let ((root (if (tree-sitter-parser-p parser-or-lang)
+ (tree-sitter-parser-root-node parser-or-lang)
+ (tree-sitter-buffer-root-node parser-or-lang))))
+ (tree-sitter-node-descendant-for-range root beg (or end beg) named)))
+
+(defun tree-sitter-buffer-root-node (&optional language)
+ "Return the root node of the current buffer.
+Use the first parser in `tree-sitter-parser-list'. If LANGUAGE is
+non-nil, use the first parser for LANGUAGE."
+ (if-let ((parser
+ (or (if language
+ (or (tree-sitter-get-parser language)
+ (error "Cannot find a parser for %s" language))
+ (or (car tree-sitter-parser-list)
+ (error "Buffer has no parser"))))))
+ (tree-sitter-parser-root-node parser)))
+
+(defun tree-sitter-filter-child (node pred &optional named)
+ "Return children of NODE that satisfy PRED.
+PRED is a function that takes one argument, the child node. If
+NAMED is non-nil, only search for named nodes."
+ (let ((child (tree-sitter-node-child node 0 named))
+ result)
+ (while child
+ (when (funcall pred child)
+ (push child result))
+ (setq child (tree-sitter-node-next-sibling child named)))
+ (reverse result)))
+
+(defun tree-sitter-node-text (node &optional no-property)
+ "Return the buffer (or string) content corresponding to NODE.
+If NO-PROPERTY is non-nil, remove text properties."
+ (with-current-buffer (tree-sitter-node-buffer node)
+ (if no-property
+ (buffer-substring-no-properties
+ (tree-sitter-node-start node)
+ (tree-sitter-node-end node))
+ (buffer-substring
+ (tree-sitter-node-start node)
+ (tree-sitter-node-end node)))))
+
+(defun tree-sitter-parent-until (node pred)
+ "Return the closest parent of NODE that satisfies PRED.
+Return nil if none found. PRED should be a function that takes
+one argument, the parent node."
+ (let ((node (tree-sitter-node-parent node)))
+ (while (and node (not (funcall pred node)))
+ (setq node (tree-sitter-node-parent node)))
+ node))
+
+(defun tree-sitter-parent-while (node pred)
+ "Return the furthest parent of NODE that satisfies PRED.
+Return nil if none found. PRED should be a function that takes
+one argument, the parent node."
+ (let ((last nil))
+ (while (and node (funcall pred node))
+ (setq last node
+ node (tree-sitter-node-parent node)))
+ last))
+
+(defun tree-sitter-node-children (node &optional named)
+ "Return a list of NODE's children.
+If NAMED is non-nil, collect named children only."
+ (mapcar (lambda (idx)
+ (tree-sitter-node-child node idx named))
+ (number-sequence
+ 0 (1- (tree-sitter-node-child-count node named)))))
+
+(defun tree-sitter-node-index (node &optional named)
+ "Return the index of NODE in its parent.
+If NAMED is non-nil, count named children only."
+ (let ((count 0))
+ (while (setq node (tree-sitter-node-prev-sibling node named))
+ (cl-incf count))
+ count))
+
+(defun tree-sitter-node-field-name (node)
+ "Return the field name of NODE as a child of its parent."
+ (when-let ((parent (tree-sitter-node-parent node))
+ (idx (tree-sitter-node-index node)))
+ (tree-sitter-node-field-name-for-child parent idx)))
+
+;;; Query API supplement
+
+(defun tree-sitter-query-in (source query &optional beg end)
+ "Query the current buffer with QUERY.
+
+SOURCE can be a language symbol, a parser, or a node. If a
+language symbol, use the root node of the first parser for that
+language; if a parser, use the root node of that parser; if a
+node, use that node.
+
+QUERY is either a string query or a sexp query. See Info node
+`(elisp)Pattern Matching' for how to write a query pattern in either
+string or s-expression form.
+
+BEG and END, if _both_ non-nil, specify the range in which the query
+is executed.
+
+Raise a tree-sitter-query-error if QUERY is malformed."
+ (tree-sitter-query-capture
+ (cond ((symbolp source) (tree-sitter-buffer-root-node source))
+ ((tree-sitter-parser-p source)
+ (tree-sitter-parser-root-node source))
+ ((tree-sitter-node-p source) source))
+ query
+ beg end))
+
+(defun tree-sitter-query-string (string query language)
+ "Query STRING with QUERY in LANGUAGE.
+See `tree-sitter-query-capture' for QUERY."
+ (with-temp-buffer
+ (insert string)
+ (let ((parser (tree-sitter-parser-create (current-buffer) language)))
+ (tree-sitter-query-capture
+ (tree-sitter-parser-root-node parser)
+ query))))
+
+(defun tree-sitter-query-range (source query &optional beg end)
+ "Query the current buffer and return ranges of captured nodes.
+
+QUERY, SOURCE, BEG, END are the same as in
+`tree-sitter-query-in'. This function returns a list
+of (START . END), where START and END specify the range of each
+captured node. Capture names don't matter."
+ (cl-loop for capture
+ in (tree-sitter-query-in source query beg end)
+ for node = (cdr capture)
+ collect (cons (tree-sitter-node-start node)
+ (tree-sitter-node-end node))))
+
+;;; Range API supplement
+
+(defvar-local tree-sitter-range-functions nil
+ "A list of range functions.
+Font-locking and indenting code use functions in this list to
+set correct ranges for a language parser before using it.
+
+The signature of each function should be
+
+ (start end &rest _)
+
+where START and END mark the region that is about to be used. A
+range function only needs to (but is not limited to) update ranges
+in that region.
+
+Each function in the list is called in order.")
+
+(defun tree-sitter-update-ranges (&optional start end)
+ "Update the ranges for each language in the current buffer.
+Call each range function in `tree-sitter-range-functions'
+in order. START and END are passed to each range function."
+ (dolist (range-fn tree-sitter-range-functions)
+ (funcall range-fn (or start (point-min)) (or end (point-max)))))
+
+;;; Font-lock
+
+(defvar-local tree-sitter-font-lock-settings nil
+ "A list of SETTINGs for tree-sitter-based fontification.
+
+Each SETTING should look like
+
+ (LANGUAGE QUERY)
+
+Each SETTING controls one parser (often for a different language).
+LANGUAGE is the language symbol. See Info node `(elisp)Language
+Definitions'.
+
+QUERY is either a string query or a sexp query.
+See Info node `(elisp)Pattern Matching' for writing queries.
+
+Capture names in QUERY should be face names like
+`font-lock-keyword-face'. The captured node will be fontified
+with that face. Capture names can also be function names, in
+which case the function is called with (START END NODE), where
+START and END are the start and end position of the node in
+buffer, and NODE is the tree-sitter node object. If a capture
+name is both a face and a function, face takes priority.
+
+Generally, major modes should set
+`tree-sitter-font-lock-defaults', and let Emacs automatically
+populate this variable.")
+
+(defvar-local tree-sitter-font-lock-defaults nil
+ "Defaults for tree-sitter Font Lock specified by the major mode.
+
+This variable should be a list of
+
+ (DEFAULT :KEYWORD VALUE...)
+
+A DEFAULT may be a symbol or a list of symbols (specifying
+different levels of fontification). The symbol(s) can be of a
+variable or a function. If a symbol is both a variable and a
+function, it is used as a function. Different levels of
+fontification can be controlled by
+`font-lock-maximum-decoration'.
+
+The symbol(s) in DEFAULT should contain or return a SETTING as
+explained in `tree-sitter-font-lock-settings', which looks like
+
+ (LANGUAGE QUERY)
+
+KEYWORD and VALUE are additional settings that could be used to
+alter fontification behavior. Currently there aren't any.
+
+Multi-language major modes should provide a range function for
+each language they support in `tree-sitter-range-functions', and
+Emacs will set the ranges accordingly before fontifying a region.
+See Info node `(elisp)Multiple Languages' for what it means to
+set ranges for a parser.")
+
+(defun tree-sitter-font-lock-fontify-region (start end &optional loudly)
+ "Fontify the region between START and END.
+If LOUDLY is non-nil, message some debugging information."
+ (tree-sitter-update-ranges start end)
+ (font-lock-unfontify-region start end)
+ (dolist (setting tree-sitter-font-lock-settings)
+ (when-let* ((language (nth 0 setting))
+ (match-pattern (nth 1 setting))
+ (parser (tree-sitter-get-parser-create language)))
+ (when-let ((node (tree-sitter-node-at start end parser)))
+ (let ((captures (tree-sitter-query-capture
+ node match-pattern
+ ;; Specifying the range is important. More
+ ;; often than not, NODE will be the root
+ ;; node, and if we don't specify the range,
+ ;; we are basically querying the whole file.
+ start end)))
+ (with-silent-modifications
+ (dolist (capture captures)
+ (let* ((face (car capture))
+ (node (cdr capture))
+ (start (tree-sitter-node-start node))
+ (end (tree-sitter-node-end node)))
+ (cond ((facep face)
+ (put-text-property start end 'face face))
+ ((functionp face)
+ (funcall face start end node))
+ (t (error "Capture name %s is neither a face nor a function" face)))
+ (when loudly
+ (message "Fontifying text from %d to %d, Face: %s Language: %s"
+ start end face language)))))))))
+ ;; Call regexp font-lock after tree-sitter, as it is usually used
+ ;; for custom fontification.
+ (let ((font-lock-unfontify-region-function #'ignore))
+ (funcall #'font-lock-default-fontify-region start end loudly)))
+
+(defun tree-sitter-font-lock-enable ()
+ "Enable tree-sitter font-locking for the current buffer."
+ (let ((default (car tree-sitter-font-lock-defaults))
+ (attributes (cdr tree-sitter-font-lock-defaults)))
+ (ignore attributes)
+ (setq-local tree-sitter-font-lock-settings
+ (font-lock-eval-keywords
+ (font-lock-choose-keywords
+ default
+ (font-lock-value-in-major-mode
+ font-lock-maximum-decoration)))))
+ (setq-local font-lock-fontify-region-function
+ #'tree-sitter-font-lock-fontify-region)
+ ;; If we don't set `font-lock-defaults' to some non-nil value,
+ ;; font-lock doesn't enable properly (the font-lock-mode-internal
+ ;; doesn't run). See `font-lock-add-keywords'.
+ (when (and font-lock-mode
+ (null font-lock-keywords)
+ (null font-lock-defaults))
+ (font-lock-mode -1)
+ (setq-local font-lock-defaults '(nil t))
+ (font-lock-mode 1)))
+
+;;; Indent
+
+(defvar tree-sitter--indent-verbose nil
+ "If non-nil, log progress when indenting.")
+
+;; This is not bound locally like we normally do with major-mode
+;; stuff, because for tree-sitter, a buffer could contain more than
+;; one language.
+(defvar tree-sitter-simple-indent-rules nil
+ "A list of indent rule settings.
+Each indent rule setting should be (LANGUAGE . RULES),
+where LANGUAGE is a language symbol, and RULES is a list of
+
+ (MATCHER ANCHOR OFFSET).
+
+MATCHER determines whether this rule applies, ANCHOR and OFFSET
+together determine which column to indent to.
+
+A MATCHER is a function that takes three arguments (NODE PARENT
+BOL). BOL is the point where we are indenting: the beginning of
+line content, the position of the first non-whitespace character.
+NODE is the largest (highest-in-tree) node starting at that
+point. PARENT is the parent of NODE.
+
+If MATCHER returns non-nil, meaning the rule matches, Emacs then
+uses ANCHOR to find an anchor. ANCHOR should be a function that
+takes the same arguments (NODE PARENT BOL) and returns a point.
+
+Finally Emacs computes the column of that point returned by ANCHOR
+and adds OFFSET to it, and indents to that column.
+
+For MATCHER and ANCHOR, Emacs provides some convenient presets.
+See `tree-sitter-simple-indent-presets'.")
+
+(defvar tree-sitter-simple-indent-presets
+ '((match . (lambda
+ (&optional node-type parent-type node-field
+ node-index-min node-index-max)
+ `(lambda (node parent bol &rest _)
+ (and (or (null ,node-type)
+ (equal (tree-sitter-node-type node)
+ ,node-type))
+ (or (null ,parent-type)
+ (equal (tree-sitter-node-type parent)
+ ,parent-type))
+ (or (null ,node-field)
+ (equal (tree-sitter-node-field-name node)
+ ,node-field))
+ (or (null ,node-index-min)
+ (>= (tree-sitter-node-index node t)
+ ,node-index-min))
+ (or (null ,node-index-max)
+ (<= (tree-sitter-node-index node t)
+ ,node-index-max))))))
+ (no-node . (lambda (node parent bol &rest _) (null node)))
+ (parent-is . (lambda (type)
+ `(lambda (node parent bol &rest _)
+ (equal ,type (tree-sitter-node-type parent)))))
+
+ (node-is . (lambda (type)
+ `(lambda (node parent bol &rest _)
+ (equal ,type (tree-sitter-node-type node)))))
+
+ (query . (lambda (pattern)
+ `(lambda (node parent bol &rest _)
+ (cl-loop for capture
+ in (tree-sitter-query-capture
+ parent ,pattern)
+ if (tree-sitter-node-eq node (cdr capture))
+ return t
+ finally return nil))))
+ (first-sibling . (lambda (node parent bol &rest _)
+ (tree-sitter-node-start
+ (tree-sitter-node-child parent 0 t))))
+
+ (parent . (lambda (node parent bol &rest _)
+ (tree-sitter-node-start
+ (tree-sitter-node-parent node))))
+ (prev-sibling . (lambda (node parent bol &rest _)
+ (tree-sitter-node-start
+ (tree-sitter-node-prev-sibling node))))
+ (no-indent . (lambda (node parent bol &rest _) bol))
+ (prev-line . (lambda (node parent bol &rest _)
+ (save-excursion
+ (goto-char bol)
+ (forward-line -1)
+ (skip-chars-forward " \t")
+ (tree-sitter-node-start
+ (tree-sitter-node-at (point) nil nil t))))))
+ "A list of presets.
+These presets can be used as MATCHER and ANCHOR in
+`tree-sitter-simple-indent-rules'.
+
+MATCHER:
+
+\(match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
+
+ NODE-TYPE checks for node's type, PARENT-TYPE checks for
+ parent's type, NODE-FIELD checks for the field name of node
+ in the parent, NODE-INDEX-MIN and NODE-INDEX-MAX check for
+ the node's index in the parent. Therefore, to match the
+ first child where parent is \"argument_list\", use
+
+ (match nil \"argument_list\" nil 0 0).
+
+no-node
+
+ Matches the case where node is nil, i.e., there is no node
+ that starts at point. This is the case when indenting an
+ empty line.
+
+\(parent-is TYPE)
+
+ Checks that the parent has type TYPE.
+
+\(node-is TYPE)
+
+ Checks that the node has type TYPE.
+
+\(query QUERY)
+
+ Queries the parent node with QUERY, and checks if the node
+ is captured (by any capture name).
+
+ANCHOR:
+
+first-sibling
+
+ Find the first child of the parent.
+
+parent
+
+ Find the parent.
+
+prev-sibling
+
+ Find node's previous sibling.
+
+no-indent
+
+ Do nothing.
+
+prev-line
+
+ Find the named node on the previous line. This can be used when
+ indenting an empty line: just indent like the previous node.")
+
+(defun tree-sitter--simple-apply (fn args)
+ "Apply FN to ARGS.
+
+If FN is a key in `tree-sitter-simple-indent-presets', use the
+corresponding value as the function."
+ ;; We don't want to match uncompiled lambdas, so make sure this cons
+ ;; is not a function. We could move the condition functionp
+ ;; forward, but better be explicit.
+ (cond ((and (consp fn) (not (functionp fn)))
+ (apply (tree-sitter--simple-apply (car fn) (cdr fn))
+ ;; We don't evaluate ARGS with `simple-apply', i.e.,
+ ;; no composing, better keep it simple.
+ args))
+ ((and (symbolp fn)
+ (alist-get fn tree-sitter-simple-indent-presets))
+ (apply (alist-get fn tree-sitter-simple-indent-presets)
+ args))
+ ((functionp fn) (apply fn args))
+ (t (error "Couldn't find the function corresponding to %s" fn))))
+
+;; This variable might seem unnecessary: why split
+;; `tree-sitter-indent' and `tree-sitter-simple-indent' into two
+;; functions? We add this variable in between because later we might
+;; add more powerful indentation engines, and that new engine can
+;; probably share `tree-sitter-indent'. It is also useful, suggested
+;; by Stefan M, to have a function that figures out how much to indent
+;; but doesn't actually perform the indentation, because we might
+;; want to know where a node will indent to if we put it at some other
+;; location, and use that information to calculate the actual
+;; indentation. And `tree-sitter-simple-indent' is that function. I
+;; forgot the example Stefan gave, but it makes a lot of sense.
+(defvar tree-sitter-indent-function #'tree-sitter-simple-indent
+ "Function used by `tree-sitter-indent' to do some of the work.
+
+This function is called with
+
+ (NODE PARENT BOL &rest _)
+
+and returns
+
+ (ANCHOR . OFFSET).
+
+BOL is the position of the beginning of the line; NODE is the
+\"largest\" node that starts at BOL; PARENT is its parent; ANCHOR
+is a point (not a node), and OFFSET is a number. Emacs finds the
+column of ANCHOR and adds OFFSET to it as the final indentation
+of the current line.")
+
+(defun tree-sitter-indent ()
+ "Indent according to the result of `tree-sitter-indent-function'."
+ (tree-sitter-update-ranges)
+ (let* ((orig-pos (point))
+ (bol (save-excursion
+ (forward-line 0)
+ (skip-chars-forward " \t")
+ (point)))
+ (smallest-node
+ (cl-loop for parser in tree-sitter-parser-list
+ for node = (tree-sitter-node-at
+ bol nil parser)
+ if node return node))
+ (node (tree-sitter-parent-while
+ smallest-node
+ (lambda (node)
+ (eq bol (tree-sitter-node-start node))))))
+ (pcase-let*
+ ((parser (if smallest-node
+ (tree-sitter-node-parser smallest-node)
+ nil))
+ ;; NODE would be nil if BOL is on a whitespace. In that case
+ ;; we set PARENT to the "node at point", which would
+ ;; encompass the whitespace.
+ (parent (cond ((and node parser)
+ (tree-sitter-node-parent node))
+ (parser
+ (tree-sitter-node-at bol nil parser))
+ (t nil)))
+ (`(,anchor . ,offset)
+ (funcall tree-sitter-indent-function node parent bol)))
+ (if (null anchor)
+ (when tree-sitter--indent-verbose
+ (message "Failed to find the anchor"))
+ (let ((col (+ (save-excursion
+ (goto-char anchor)
+ (current-column))
+ offset)))
+ (if (< bol orig-pos)
+ (save-excursion
+ (indent-line-to col))
+ (indent-line-to col)))))))
+
+(defun tree-sitter-simple-indent (node parent bol)
+ "Calculate indentation according to `tree-sitter-simple-indent-rules'.
+
+BOL is the position of the first non-whitespace character on the
+current line. NODE is the largest node that starts at BOL,
+PARENT is NODE's parent.
+
+Return (ANCHOR . OFFSET) where ANCHOR is a position, OFFSET is the
+indentation offset, meaning indent to align with ANCHOR and add
+OFFSET."
+ (if (null parent)
+ (when tree-sitter--indent-verbose
+ (message "PARENT is nil, not indenting"))
+ (let* ((language (tree-sitter-node-language parent))
+ (rules (alist-get language
+ tree-sitter-simple-indent-rules)))
+ (cl-loop for rule in rules
+ for pred = (nth 0 rule)
+ for anchor = (nth 1 rule)
+ for offset = (nth 2 rule)
+ if (tree-sitter--simple-apply
+ pred (list node parent bol))
+ do (when tree-sitter--indent-verbose
+ (message "Matched rule: %S" rule))
+ and
+ return (cons (tree-sitter--simple-apply
+ anchor (list node parent bol))
+ offset)))))
+
+(defun tree-sitter-check-indent (mode)
+ "Check current buffer's indentation against a major mode MODE.
+
+Pop up a diff buffer showing the difference. Correct
+indentation (target) is in green, current indentation is in red."
+ (interactive "CTarget major mode: ")
+ (let ((source-buf (current-buffer)))
+ (with-temp-buffer
+ (insert-buffer-substring source-buf)
+ (funcall mode)
+ (indent-region (point-min) (point-max))
+ (diff-buffers source-buf (current-buffer)))))
+
+;;; Debugging
+
+(defvar-local tree-sitter--inspect-name nil
+ "Tree-sitter-inspect-mode uses this to show node name in mode-line.")
+
+(defun tree-sitter-inspect-node-at-point (&optional arg)
+ "Show information of the node at point.
+If called interactively, show in echo area, otherwise set
+`tree-sitter--inspect-name' (which will appear in the mode-line
+if `tree-sitter-inspect-mode' is enabled). Uses the first parser
+in `tree-sitter-parser-list'."
+ (interactive "p")
+ ;; NODE-LIST contains all the nodes that start at point.
+ (let* ((node-list
+ (cl-loop for node = (tree-sitter-node-at (point))
+ then (tree-sitter-node-parent node)
+ while node
+ if (eq (tree-sitter-node-start node)
+ (point))
+ collect node))
+ (largest-node (car (last node-list)))
+ (parent (tree-sitter-node-parent largest-node))
+ ;; node-list-ascending contains all the nodes bottom-up, then
+ ;; the parent.
+ (node-list-ascending
+ (if (null largest-node)
+ ;; If there are no nodes that start at point, just show
+ ;; the node at point and its parent.
+ (list (tree-sitter-node-at (point))
+ (tree-sitter-node-parent
+ (tree-sitter-node-at (point))))
+ (append node-list (list parent))))
+ (name ""))
+ ;; We draw nodes like (parent field-name: (node)) recursively,
+ ;; so it could be (node1 field-name: (node2 field-name: (node3))).
+ (dolist (node node-list-ascending)
+ (setq
+ name
+ (concat
+ (if (tree-sitter-node-field-name node)
+ (format " %s: " (tree-sitter-node-field-name node))
+ " ")
+ (if (tree-sitter-node-check node 'named) "(" "\"")
+ (or (tree-sitter-node-type node)
+ "N/A")
+ name
+ (if (tree-sitter-node-check node 'named) ")" "\""))))
+ (setq tree-sitter--inspect-name name)
+ (force-mode-line-update)
+ (when arg
+ (if node-list
+ (message "%s" tree-sitter--inspect-name)
+ (message "No node at point")))))
+
+(define-minor-mode tree-sitter-inspect-mode
+ "Shows the node that _starts_ at point in the mode-line.
+
+The mode-line displays
+
+ PARENT FIELD-NAME: (CHILD (GRAND-CHILD (...)))
+
+CHILD, GRAND-CHILD, and GRAND-GRAND-CHILD, etc, are nodes that
+have their beginning at point. And PARENT is the parent of
+CHILD.
+
+If no node starts at point, i.e., point is in the middle of a
+node, then we just display the smallest node that spans point and
+its immediate parent.
+
+This minor mode doesn't create parsers on its own. It simply
+uses the first parser in `tree-sitter-parser-list'."
+ :lighter nil
+ (if tree-sitter-inspect-mode
+ (progn
+ (add-hook 'post-command-hook
+ #'tree-sitter-inspect-node-at-point 0 t)
+ (add-to-list 'mode-line-misc-info
+ '(:eval tree-sitter--inspect-name)))
+ (remove-hook 'post-command-hook
+ #'tree-sitter-inspect-node-at-point t)
+ (setq mode-line-misc-info
+ (remove '(:eval tree-sitter--inspect-name)
+ mode-line-misc-info))))
+
+(defun tree-sitter-check-query (query language)
+ "Check if QUERY is valid for LANGUAGE.
+If QUERY is invalid, display the query in a popup buffer, jump
+to the offending pattern, and highlight the pattern."
+ (let ((buf (get-buffer-create "*tree-sitter check query*")))
+ (with-temp-buffer
+ (tree-sitter-get-parser-create language)
+ (condition-case err
+ (progn (tree-sitter-query-in language query)
+ (message "QUERY is valid"))
+ (tree-sitter-query-error
+ (with-current-buffer buf
+ (let* ((data (cdr err))
+ (message (nth 0 data))
+ (start (nth 1 data)))
+ (erase-buffer)
+ (insert query)
+ (goto-char start)
+ (search-forward " " nil t)
+ (put-text-property start (point) 'face 'error)
+ (message "%s" (buffer-substring start (point)))
+ (goto-char (point-min))
+ (insert (format "%s: %d\n" message start))
+ (forward-char start)))
+ (pop-to-buffer buf))))))
+
+;;; Etc
+
+(declare-function find-library-name "find-func.el")
+(defun tree-sitter--check-manual-covarage ()
+ "Print tree-sitter functions missing from the manual in message buffer."
+ (interactive)
+ (require 'find-func)
+ (let ((functions-in-source
+ (with-temp-buffer
+ (insert-file-contents (find-library-name "tree-sitter"))
+ (cl-remove-if
+ (lambda (name) (string-match "tree-sitter--" name))
+ (cl-sort
+ (save-excursion
+ (goto-char (point-min))
+ (cl-loop while (re-search-forward
+ "^(defun \\([^ ]+\\)" nil t)
+ collect (match-string-no-properties 1)))
+ #'string<))))
+ (functions-in-manual
+ (with-temp-buffer
+ (insert-file-contents (expand-file-name
+ "doc/lispref/parsing.texi"
+ source-directory))
+ (insert-file-contents (expand-file-name
+ "doc/lispref/modes.texi"
+ source-directory))
+ (cl-sort
+ (save-excursion
+ (goto-char (point-min))
+ (cl-loop while (re-search-forward
+ "^@defun \\([^ ]+\\)" nil t)
+ collect (match-string-no-properties 1)))
+ #'string<))))
+ (message "Missing: %s"
+ (string-join
+ (cl-remove-if
+ (lambda (name) (member name functions-in-manual))
+ functions-in-source)
+ "\n"))))
+
+(provide 'tree-sitter)
+
+;;; tree-sitter.el ends here
diff --git a/src/Makefile.in b/src/Makefile.in
index 2b7c4bb316..6ae55b19e1 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -337,6 +337,10 @@ JSON_LIBS =
JSON_CFLAGS = @JSON_CFLAGS@
JSON_OBJ = @JSON_OBJ@
+TREE_SITTER_LIBS = @TREE_SITTER_LIBS@
+TREE_SITTER_CFLAGS = @TREE_SITTER_CFLAGS@
+TREE_SITTER_OBJ = @TREE_SITTER_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -400,7 +404,7 @@ EMACS_CFLAGS=
$(XINPUT_CFLAGS) $(WEBP_CFLAGS) $(WEBKIT_CFLAGS) $(LCMS2_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(HARFBUZZ_CFLAGS) $(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) $(XSYNC_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) $(XSYNC_CFLAGS) $(TREE_SITTER_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS) $(HAIKU_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -439,7 +443,7 @@ base_obj =
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
$(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ) \
- $(HAIKU_OBJ) $(PGTK_OBJ)
+ $(TREE_SITTER_OBJ) $(HAIKU_OBJ) $(PGTK_OBJ)
doc_obj = $(base_obj) $(NS_OBJC_OBJ)
obj = $(doc_obj) $(HAIKU_CXX_OBJ)
@@ -559,7 +563,7 @@ LIBES =
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LCMS2_LIBS) \
$(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
$(JSON_LIBS) $(LIBGMP) $(LIBGCCJIT_LIBS) $(XINPUT_LIBS) $(HAIKU_LIBS) \
- $(SQLITE3_LIBS)
+ $(TREE_SITTER_LIBS) $(SQLITE3_LIBS)
## FORCE it so that admin/unidata can decide whether this file is
## up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/alloc.c b/src/alloc.c
index 9ed94dc8a1..d4209bb3b9 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -50,6 +50,10 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2022 Free Software
#include TERM_HEADER
#endif /* HAVE_WINDOW_SYSTEM */
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
#include <flexmember.h>
#include <verify.h>
#include <execinfo.h> /* For backtrace. */
@@ -3177,6 +3181,15 @@ cleanup_vector (struct Lisp_Vector *vector)
if (uptr->finalizer)
uptr->finalizer (uptr->p);
}
+#ifdef HAVE_TREE_SITTER
+ else if (PSEUDOVECTOR_TYPEP (&vector->header, PVEC_TS_PARSER))
+ {
+ struct Lisp_TS_Parser *lisp_parser
+ = PSEUDOVEC_STRUCT (vector, Lisp_TS_Parser);
+ ts_tree_delete (lisp_parser->tree);
+ ts_parser_delete (lisp_parser->parser);
+ }
+#endif
#ifdef HAVE_MODULES
else if (PSEUDOVECTOR_TYPEP (&vector->header, PVEC_MODULE_FUNCTION))
{
diff --git a/src/casefiddle.c b/src/casefiddle.c
index 2ea5f09b4c..ac9e73be0c 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -30,6 +30,10 @@ Copyright (C) 1985, 1994, 1997-1999, 2001-2022 Free Software Foundation,
#include "composite.h"
#include "keymap.h"
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
enum case_action {CASE_UP, CASE_DOWN, CASE_CAPITALIZE, CASE_CAPITALIZE_UP};
/* State for casing individual characters. */
@@ -530,6 +534,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
modify_text (start, end);
prepare_casing_context (&ctx, flag, true);
+#ifdef HAVE_TREE_SITTER
+ ptrdiff_t start_byte = CHAR_TO_BYTE (start);
+ ptrdiff_t old_end_byte = CHAR_TO_BYTE (end);
+#endif
+
ptrdiff_t orig_end = end;
record_delete (start, make_buffer_string (start, end, true), false);
if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
@@ -548,6 +557,9 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
{
signal_after_change (start, end - start - added, end - start);
update_compositions (start, end, CHECK_ALL);
+#ifdef HAVE_TREE_SITTER
+ ts_record_change (start_byte, old_end_byte, CHAR_TO_BYTE (end));
+#endif
}
return orig_end + added;
diff --git a/src/data.c b/src/data.c
index 1526cc0c73..64095b3bb9 100644
--- a/src/data.c
+++ b/src/data.c
@@ -260,6 +260,10 @@ DEFUN ("type-of", Ftype_of, Stype_of, 1, 1, 0,
return Qxwidget;
case PVEC_XWIDGET_VIEW:
return Qxwidget_view;
+ case PVEC_TS_PARSER:
+ return Qtree_sitter_parser;
+ case PVEC_TS_NODE:
+ return Qtree_sitter_node;
case PVEC_SQLITE:
return Qsqlite;
/* "Impossible" cases. */
@@ -4203,6 +4207,8 @@ #define PUT_ERROR(sym, tail, msg) \
DEFSYM (Qterminal, "terminal");
DEFSYM (Qxwidget, "xwidget");
DEFSYM (Qxwidget_view, "xwidget-view");
+ DEFSYM (Qtree_sitter_parser, "tree-sitter-parser");
+ DEFSYM (Qtree_sitter_node, "tree-sitter-node");
DEFSYM (Qdefun, "defun");
diff --git a/src/emacs.c b/src/emacs.c
index d1060bca0b..703b8785c8 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -85,6 +85,7 @@ #define MAIN_PROGRAM
#include "intervals.h"
#include "character.h"
#include "buffer.h"
+#include "tree-sitter.h"
#include "window.h"
#include "xwidget.h"
#include "atimer.h"
@@ -2147,6 +2148,9 @@ main (int argc, char **argv)
syms_of_floatfns ();
syms_of_buffer ();
+ #ifdef HAVE_TREE_SITTER
+ syms_of_tree_sitter ();
+ #endif
syms_of_bytecode ();
syms_of_callint ();
syms_of_casefiddle ();
diff --git a/src/eval.c b/src/eval.c
index 294d79e67a..ecf57efb92 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1915,6 +1915,19 @@ signal_error (const char *s, Lisp_Object arg)
xsignal (Qerror, Fcons (build_string (s), arg));
}
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
/* Use this for arithmetic overflow, e.g., when an integer result is
too large even for a bignum. */
void
diff --git a/src/insdel.c b/src/insdel.c
index 6f180ac580..72b6a8ad1b 100644
--- a/src/insdel.c
+++ b/src/insdel.c
@@ -31,6 +31,10 @@
#include "region-cache.h"
#include "pdumper.h"
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
static void insert_from_string_1 (Lisp_Object, ptrdiff_t, ptrdiff_t, ptrdiff_t,
ptrdiff_t, bool, bool);
static void insert_from_buffer_1 (struct buffer *, ptrdiff_t, ptrdiff_t, bool);
@@ -940,6 +944,12 @@ insert_1_both (const char *string,
set_text_properties (make_fixnum (PT), make_fixnum (PT + nchars),
Qnil, Qnil, Qnil);
+#ifdef HAVE_TREE_SITTER
+ eassert (nbytes >= 0);
+ eassert (PT_BYTE >= 0);
+ ts_record_change (PT_BYTE, PT_BYTE, PT_BYTE + nbytes);
+#endif
+
adjust_point (nchars, nbytes);
check_markers ();
@@ -1071,6 +1081,12 @@ insert_from_string_1 (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
graft_intervals_into_buffer (intervals, PT, nchars,
current_buffer, inherit);
+#ifdef HAVE_TREE_SITTER
+ eassert (nbytes >= 0);
+ eassert (PT_BYTE >= 0);
+ ts_record_change (PT_BYTE, PT_BYTE, PT_BYTE + nbytes);
+#endif
+
adjust_point (nchars, outgoing_nbytes);
check_markers ();
@@ -1137,6 +1153,12 @@ insert_from_gap (ptrdiff_t nchars, ptrdiff_t nbytes, bool text_at_gap_tail)
current_buffer, 0);
}
+#ifdef HAVE_TREE_SITTER
+ eassert (nbytes >= 0);
+ eassert (ins_bytepos >= 0);
+ ts_record_change (ins_bytepos, ins_bytepos, ins_bytepos + nbytes);
+#endif
+
if (ins_charpos < PT)
adjust_point (nchars, nbytes);
@@ -1287,6 +1309,12 @@ insert_from_buffer_1 (struct buffer *buf,
/* Insert those intervals. */
graft_intervals_into_buffer (intervals, PT, nchars, current_buffer, inherit);
+#ifdef HAVE_TREE_SITTER
+ eassert (outgoing_nbytes >= 0);
+ eassert (PT_BYTE >= 0);
+ ts_record_change (PT_BYTE, PT_BYTE, PT_BYTE + outgoing_nbytes);
+#endif
+
adjust_point (nchars, outgoing_nbytes);
}
\f
@@ -1535,6 +1563,13 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
graft_intervals_into_buffer (intervals, from, inschars,
current_buffer, inherit);
+#ifdef HAVE_TREE_SITTER
+ eassert (to_byte >= from_byte);
+ eassert (outgoing_insbytes >= 0);
+ eassert (from_byte >= 0);
+ ts_record_change (from_byte, to_byte, from_byte + outgoing_insbytes);
+#endif
+
/* Relocate point as if it were a marker. */
if (from < PT)
adjust_point ((from + inschars - (PT < to ? PT : to)),
@@ -1569,7 +1604,11 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
If MARKERS, relocate markers.
Unlike most functions at this level, never call
- prepare_to_modify_buffer and never call signal_after_change. */
+ prepare_to_modify_buffer and never call signal_after_change,
+ because this function is called in a loop, one character at a time.
+ The caller of 'replace_range_2' calls these hooks once for the
+ entire region. Apart from signal_after_change, any caller of this
+ function should also call ts_record_change. */
void
replace_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
@@ -1892,6 +1931,12 @@ del_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
evaporate_overlays (from);
+#ifdef HAVE_TREE_SITTER
+ eassert (from_byte <= to_byte);
+ eassert (from_byte >= 0);
+ ts_record_change (from_byte, to_byte, from_byte);
+#endif
+
return deletion;
}
diff --git a/src/json.c b/src/json.c
index db1be07f19..957f91b46b 100644
--- a/src/json.c
+++ b/src/json.c
@@ -1090,22 +1090,6 @@ DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
return unbind_to (count, lisp);
}
-/* Simplified version of 'define-error' that works with pure
- objects. */
-
-static void
-define_error (Lisp_Object name, const char *message, Lisp_Object parent)
-{
- eassert (SYMBOLP (name));
- eassert (SYMBOLP (parent));
- Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
- eassert (CONSP (parent_conditions));
- eassert (!NILP (Fmemq (parent, parent_conditions)));
- eassert (NILP (Fmemq (name, parent_conditions)));
- Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
- Fput (name, Qerror_message, build_pure_c_string (message));
-}
-
void
syms_of_json (void)
{
diff --git a/src/lisp.h b/src/lisp.h
index 778bd1bfa5..aecbfed7fa 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -575,6 +575,8 @@ #define ENUM_BF(TYPE) enum TYPE
your object -- this way, the same object could be used to represent
several disparate C structures.
+ In addition, you need to add switch branches in data.c for Ftype_of.
+
You also need to add the new type to the constant
`cl--typeof-types' in lisp/emacs-lisp/cl-preloaded.el. */
@@ -1053,6 +1055,8 @@ DEFINE_GDB_SYMBOL_END (PSEUDOVECTOR_FLAG)
PVEC_CONDVAR,
PVEC_MODULE_FUNCTION,
PVEC_NATIVE_COMP_UNIT,
+ PVEC_TS_PARSER,
+ PVEC_TS_NODE,
PVEC_SQLITE,
/* These should be last, for internal_equal and sxhash_obj. */
@@ -5407,6 +5411,11 @@ maybe_gc (void)
maybe_garbage_collect ();
}
+/* Simplified version of 'define-error' that works with pure
+ objects. */
+void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent);
+
INLINE_HEADER_END
#endif /* EMACS_LISP_H */
diff --git a/src/lread.c b/src/lread.c
index 0486a98883..8989e2d12d 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -5196,6 +5196,14 @@ syms_of_lread (void)
Fcons (build_pure_c_string (MODULES_SECONDARY_SUFFIX), Vload_suffixes);
#endif
+ DEFVAR_LISP ("dynamic-library-suffixes", Vdynamic_library_suffixes,
+ doc: /* A list of suffixes for loadable dynamic libraries. */);
+ Vdynamic_library_suffixes =
+ Fcons (build_pure_c_string (DYNAMIC_LIB_SECONDARY_SUFFIX), Qnil);
+ Vdynamic_library_suffixes =
+ Fcons (build_pure_c_string (DYNAMIC_LIB_SUFFIX),
+ Vdynamic_library_suffixes);
+
#endif
DEFVAR_LISP ("module-file-suffix", Vmodule_file_suffix,
doc: /* Suffix of loadable module file, or nil if modules are not supported. */);
diff --git a/src/print.c b/src/print.c
index 8cce8a1ad8..b55b0d2e39 100644
--- a/src/print.c
+++ b/src/print.c
@@ -48,6 +48,10 @@ Copyright (C) 1985-1986, 1988, 1993-1995, 1997-2022 Free Software
# include <sys/socket.h> /* for F_DUPFD_CLOEXEC */
#endif
+#ifdef HAVE_TREE_SITTER
+#include "tree-sitter.h"
+#endif
+
struct terminal;
/* Avoid actual stack overflow in print. */
@@ -1936,6 +1940,30 @@ print_vectorlike (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag,
}
break;
#endif
+
+#ifdef HAVE_TREE_SITTER
+ case PVEC_TS_PARSER:
+ print_c_string ("#<tree-sitter-parser for ", printcharfun);
+ Lisp_Object language = XTS_PARSER (obj)->language_symbol;
+ print_string (Fsymbol_name (language), printcharfun);
+ print_c_string (" in ", printcharfun);
+ print_object (XTS_PARSER (obj)->buffer, printcharfun, escapeflag);
+ printchar ('>', printcharfun);
+ break;
+ case PVEC_TS_NODE:
+ print_c_string ("#<tree-sitter-node from ", printcharfun);
+ print_object (Ftree_sitter_node_start (obj),
+ printcharfun, escapeflag);
+ print_c_string (" to ", printcharfun);
+ print_object (Ftree_sitter_node_end (obj),
+ printcharfun, escapeflag);
+ print_c_string (" in ", printcharfun);
+ print_object (XTS_PARSER (XTS_NODE (obj)->parser)->buffer,
+ printcharfun, escapeflag);
+ printchar ('>', printcharfun);
+ break;
+#endif
+
case PVEC_SQLITE:
{
print_c_string ("#<sqlite ", printcharfun);
diff --git a/src/tree-sitter.c b/src/tree-sitter.c
new file mode 100644
index 0000000000..d2633d05db
--- /dev/null
+++ b/src/tree-sitter.c
@@ -0,0 +1,1601 @@
+/* Tree-sitter integration for GNU Emacs.
+
+Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include "lisp.h"
+#include "buffer.h"
+#include "tree-sitter.h"
+
+/* Commentary
+
+ The Emacs wrapper of tree-sitter does not expose everything the C
+ API provides, most notably:
+
+ - It doesn't expose a syntax tree; we put the syntax tree in the
+ parser object, and updating the tree is handled at the C level.
+
+ - We don't expose the tree cursor either. I think Lisp is slow enough
+ to nullify any performance advantage of using a cursor, though I
+ don't have evidence.
+
+ - Because updating the tree is handled at the C level as each
+ change is made in the buffer, there is no way for Lisp to update
+ a node. But since we can just retrieve a new node, it shouldn't
+ be a limitation.
+
+ - I didn't expose setting timeout and cancellation flag for a
+ parser, mainly because I don't think they are really necessary
+ in Emacs' use cases.
+
+ - Many tree-sitter functions ask for a TSPoint, basically a (row,
+ column) location. Emacs uses a gap buffer and keeps no
+ information about row and column position. According to the
+ author of tree-sitter, tree-sitter only asks for (row, column)
+ position to carry it around and return it to the user later;
+ and the real position used is the byte position. He also said
+ that he _thinks_ that it will work to use byte position only.
+ That's why whenever a TSPoint is asked for, we pass a dummy one.
+ Judging by the nature of parsing algorithms, I think it is
+ safe to use only byte position, and I don't think this will
+ change in the future.
+
+ REF: https://github.com/tree-sitter/tree-sitter/issues/445
+
+ tree-sitter.h has some commentary on the two main data structures
+ for the parser and node. ts_ensure_position_synced has some
+ commentary on how we make tree-sitter play well with narrowing
+ (tree-sitter parser only sees the visible region, so we need to
+ translate positions back and forth). Most action happens in
+ ts_ensure_parsed, ts_read_buffer and ts_record_change.
+
+ A complete correspondence list between tree-sitter functions and
+ exposed Lisp functions can be found in the manual (elisp)API
+ Correspondence.
+
+ Placement of CHECK_xxx functions: call CHECK_xxx before using any
+ unchecked Lisp values; these include argument of Lisp functions,
+ return value of Fsymbol_value, car of a cons.
+
+ Initializing tree-sitter: there are two entry points to tree-sitter
+ functions: 'tree-sitter-parser-create' and
+ 'tree-sitter-language-available-p'. Therefore we only need to call
+ the initialization function in those two functions.
+
+ Tree-sitter offset (0-based) and buffer position (1-based):
+ tree-sitter offset + visible_beg = buffer byte position
+ buffer byte position - visible_beg = tree-sitter offset
+ */
+
+/*** Initialization */
+
+bool ts_initialized = false;
+
+static void *
+ts_calloc_wrapper (size_t n, size_t size)
+{
+ return xzalloc (n * size);
+}
+
+void
+ts_initialize (void)
+{
+ if (!ts_initialized)
+ {
+ ts_set_allocator (xmalloc, ts_calloc_wrapper, xrealloc, xfree);
+ ts_initialized = true;
+ }
+}
+
+/*** Loading language library */
+
+/* Translates a symbol tree-sitter-<lang> to a C name
+ tree_sitter_<lang>. */
+void
+ts_symbol_to_c_name (char *symbol_name)
+{
+ for (int idx = 0; symbol_name[idx] != '\0'; idx++)
+ {
+ if (symbol_name[idx] == '-')
+ symbol_name[idx] = '_';
+ }
+}
+
+bool
+ts_find_override_name
+(Lisp_Object language_symbol, Lisp_Object *name, Lisp_Object *c_symbol)
+{
+ for (Lisp_Object list = Vtree_sitter_load_name_override_list;
+ !NILP (list); list = XCDR (list))
+ {
+ Lisp_Object lang = XCAR (XCAR (list));
+ CHECK_SYMBOL (lang);
+ if (EQ (lang, language_symbol))
+ {
+ *name = Fnth (make_fixnum (1), XCAR (list));
+ CHECK_STRING (*name);
+ *c_symbol = Fnth (make_fixnum (2), XCAR (list));
+ CHECK_STRING (*c_symbol);
+ return true;
+ }
+ }
+ return false;
+}
+
+/* Load the dynamic library of LANGUAGE_SYMBOL and return the pointer
+ to the language definition. Signals
+ Qtree_sitter_load_language_error if something goes wrong.
+ Qtree_sitter_load_language_error carries the error message from
+ trying to load the library with each extension.
+
+ If SIGNAL is true, signal an error if loading LANGUAGE fails; if
+ false, return NULL on failure. */
+TSLanguage *
+ts_load_language (Lisp_Object language_symbol, bool signal)
+{
+ Lisp_Object symbol_name = Fsymbol_name (language_symbol);
+
+ /* Figure out the library name and C name. */
+ Lisp_Object lib_base_name =
+ (concat2 (build_pure_c_string ("lib"), symbol_name));
+ char *c_name = strdup (SSDATA (symbol_name));
+ ts_symbol_to_c_name (c_name);
+
+ /* Override the library name and C name, if appropriate. */
+ Lisp_Object override_name;
+ Lisp_Object override_c_name;
+ bool found_override = ts_find_override_name
+ (language_symbol, &override_name, &override_c_name);
+ if (found_override)
+ {
+ lib_base_name = override_name;
+ c_name = SSDATA (override_c_name);
+ }
+
+ dynlib_handle_ptr handle;
+ char const *error;
+ Lisp_Object error_list = Qnil;
+ /* Try loading the dynamic library with each extension in
+ 'dynamic-library-suffixes'. Stop on success; on failure, record
+ the error message and try the next one. */
+ for (Lisp_Object suffixes = Vdynamic_library_suffixes;
+ !NILP (suffixes); suffixes = XCDR (suffixes))
+ {
+ char *library_name =
+ SSDATA (concat2 (lib_base_name, XCAR (suffixes)));
+ dynlib_error ();
+ handle = dynlib_open (library_name);
+ error = dynlib_error ();
+ if (error == NULL)
+ break;
+ else
+ error_list = Fcons (build_string (error), error_list);
+ }
+ if (error != NULL)
+ {
+ if (signal)
+ xsignal2 (Qtree_sitter_load_language_error,
+ symbol_name, Fnreverse (error_list));
+ else
+ return NULL;
+ }
+
+ /* Load TSLanguage. */
+ dynlib_error ();
+ TSLanguage *(*langfn) ();
+ langfn = dynlib_sym (handle, c_name);
+ error = dynlib_error ();
+ if (error != NULL)
+ {
+ if (signal)
+ xsignal1 (Qtree_sitter_load_language_error,
+ build_string (error));
+ else
+ return NULL;
+ }
+ TSLanguage *lang = (*langfn) ();
+
+ /* Check if language version matches tree-sitter version. */
+ TSParser *parser = ts_parser_new ();
+ bool success = ts_parser_set_language (parser, lang);
+ ts_parser_delete (parser);
+ if (!success)
+ {
+ if (signal)
+ xsignal2 (Qtree_sitter_load_language_error,
+ build_pure_c_string ("Language version doesn't match tree-sitter version, language version:"),
+ make_fixnum (ts_language_version (lang)));
+ else
+ return NULL;
+ }
+ return lang;
+}
+
+DEFUN ("tree-sitter-language-available-p",
+ Ftree_sitter_language_available_p,
+ Stree_sitter_language_available_p,
+ 1, 1, 0,
+ doc: /* Return non-nil if LANGUAGE exists and is loadable. */)
+ (Lisp_Object language)
+{
+ CHECK_SYMBOL (language);
+ ts_initialize ();
+ if (ts_load_language (language, false) == NULL)
+ return Qnil;
+ else
+ return Qt;
+}
+
+/*** Parsing functions */
+
+/* An auxiliary function that saves a few lines of code. Assumes TREE
+ is not NULL. */
+static inline void
+ts_tree_edit_1 (TSTree *tree, ptrdiff_t start_byte,
+ ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
+{
+ TSPoint dummy_point = {0, 0};
+ TSInputEdit edit = {(uint32_t) start_byte,
+ (uint32_t) old_end_byte,
+ (uint32_t) new_end_byte,
+ dummy_point, dummy_point, dummy_point};
+ ts_tree_edit (tree, &edit);
+}
+
+/* Update each parser's tree after the user made an edit. This
+function does not parse the buffer and only updates the tree. (So it
+should be very fast.) */
+void
+ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte)
+{
+ for (Lisp_Object parser_list =
+ Fsymbol_value (Qtree_sitter_parser_list);
+ !NILP (parser_list);
+ parser_list = XCDR (parser_list))
+ {
+ CHECK_CONS (parser_list);
+ Lisp_Object lisp_parser = XCAR (parser_list);
+ CHECK_TS_PARSER (lisp_parser);
+ TSTree *tree = XTS_PARSER (lisp_parser)->tree;
+ if (tree != NULL)
+ {
+ eassert (start_byte <= old_end_byte);
+ eassert (start_byte <= new_end_byte);
+ /* Think of the recorded change as a delete followed by an
+ insert, and think of them as moving unchanged text back
+ and forth. After all, the whole point of updating the
+ tree is to update the position of unchanged text. */
+ ptrdiff_t bytes_del = old_end_byte - start_byte;
+ ptrdiff_t bytes_ins = new_end_byte - start_byte;
+
+ ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
+
+ ptrdiff_t affected_start =
+ max (visible_beg, start_byte) - visible_beg;
+ ptrdiff_t affected_old_end =
+ min (visible_end, affected_start + bytes_del);
+ ptrdiff_t affected_new_end =
+ affected_start + bytes_ins;
+
+ ts_tree_edit_1 (tree, affected_start, affected_old_end,
+ affected_new_end);
+ XTS_PARSER (lisp_parser)->visible_end = affected_new_end;
+ XTS_PARSER (lisp_parser)->need_reparse = true;
+ XTS_PARSER (lisp_parser)->timestamp++;
+ }
+ }
+}
+
+void
+ts_ensure_position_synced (Lisp_Object parser)
+{
+ TSParser *ts_parser = XTS_PARSER (parser)->parser;
+ TSTree *tree = XTS_PARSER (parser)->tree;
+
+ if (tree == NULL)
+ return;
+
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+ ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
+ ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
+ /* Before we parse or set ranges, catch up with the narrowing
+ situation. We change visible_beg and visible_end to match
+ BUF_BEGV_BYTE and BUF_ZV_BYTE, and inform tree-sitter of the
+ change. We want to move the visible range of tree-sitter to
+ match the narrowed range. For example,
+ from ________|xxxx|__
+ to |xxxx|__________ */
+
+ /* 1. Make sure visible_beg <= BUF_BEGV_BYTE. */
+ if (visible_beg > BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the beginning. */
+ ts_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer));
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ /* 2. Make sure visible_end = BUF_ZV_BYTE. */
+ if (visible_end < BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: insert at the end. */
+ ts_tree_edit_1 (tree, visible_end - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ else if (visible_end > BUF_ZV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the end. */
+ ts_tree_edit_1 (tree, BUF_ZV_BYTE (buffer) - visible_beg,
+ visible_end - visible_beg,
+ BUF_ZV_BYTE (buffer) - visible_beg);
+ visible_end = BUF_ZV_BYTE (buffer);
+ }
+ /* 3. Make sure visible_beg = BUF_BEGV_BYTE. */
+ if (visible_beg < BUF_BEGV_BYTE (buffer))
+ {
+ /* Tree-sitter sees: delete at the beginning. */
+ ts_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0);
+ visible_beg = BUF_BEGV_BYTE (buffer);
+ }
+ eassert (0 <= visible_beg);
+ eassert (visible_beg <= visible_end);
+
+ XTS_PARSER (parser)->visible_beg = visible_beg;
+ XTS_PARSER (parser)->visible_end = visible_end;
+}
+
+void
+ts_check_buffer_size (struct buffer *buffer)
+{
+ ptrdiff_t buffer_size =
+ (BUF_Z (buffer) - BUF_BEG (buffer));
+ if (buffer_size > UINT32_MAX)
+ xsignal2 (Qtree_sitter_buffer_too_large,
+ build_pure_c_string ("Buffer size too large, size:"),
+ make_fixnum (buffer_size));
+}
+
+/* Parse the buffer. We don't parse until we have to. When we have
+to, we call this function to parse and update the tree. */
+void
+ts_ensure_parsed (Lisp_Object parser)
+{
+ if (!XTS_PARSER (parser)->need_reparse)
+ return;
+ TSParser *ts_parser = XTS_PARSER (parser)->parser;
+ TSTree *tree = XTS_PARSER (parser)->tree;
+ TSInput input = XTS_PARSER (parser)->input;
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+ ts_check_buffer_size (buffer);
+
+ /* Before we parse, catch up with the narrowing situation. */
+ ts_ensure_position_synced (parser);
+
+ TSTree *new_tree = ts_parser_parse (ts_parser, tree, input);
+ /* This should be very rare (impossible, really): it only happens
+ when 1) language is not set (impossible in Emacs because the user
+ has to supply a language to create a parser), 2) parse canceled
+ due to timeout (impossible because we don't set a timeout), 3)
+ parse canceled due to cancellation flag (impossible because we
+ don't set the flag). (See comments for ts_parser_parse in
+ tree_sitter/api.h.) */
+ if (new_tree == NULL)
+ {
+ Lisp_Object buf;
+ XSETBUFFER (buf, buffer);
+ xsignal1 (Qtree_sitter_parse_error, buf);
+ }
+
+ ts_tree_delete (tree);
+ XTS_PARSER (parser)->tree = new_tree;
+ XTS_PARSER (parser)->need_reparse = false;
+}
+
+/* This is the read function provided to tree-sitter to read from a
+ buffer. It reads one character at a time and automatically skips
+ the gap. */
+const char*
+ts_read_buffer (void *parser, uint32_t byte_index,
+ TSPoint position, uint32_t *bytes_read)
+{
+ struct buffer *buffer =
+ XBUFFER (((struct Lisp_TS_Parser *) parser)->buffer);
+ ptrdiff_t visible_beg = ((struct Lisp_TS_Parser *) parser)->visible_beg;
+ ptrdiff_t visible_end = ((struct Lisp_TS_Parser *) parser)->visible_end;
+ ptrdiff_t byte_pos = byte_index + visible_beg;
+ /* We will make sure visible_beg = BUF_BEGV_BYTE before re-parse (in
+ ts_ensure_parsed), so byte_pos will never be smaller than
+ BUF_BEG_BYTE. */
+ eassert (visible_beg == BUF_BEGV_BYTE (buffer));
+ eassert (visible_end == BUF_ZV_BYTE (buffer));
+
+ /* Read one character. Tree-sitter wants us to set bytes_read to 0
+ if it reads to the end of buffer. It doesn't say what it wants
+ for the return value in that case, so we just give it an empty
+ string. */
+ char *beg;
+ int len;
+ /* This function could run from a user command, so it is better to
+ do nothing instead of raising an error. (Large compound
+ conditions are hard to read, so the two branches are written
+ separately.) */
+ if (!BUFFER_LIVE_P (buffer))
+ {
+ beg = NULL;
+ len = 0;
+ }
+ /* Reached visible end-of-buffer, tell tree-sitter to read no more. */
+ else if (byte_pos >= visible_end)
+ {
+ beg = NULL;
+ len = 0;
+ }
+ /* Normal case, read a character. */
+ else
+ {
+ beg = (char *) BUF_BYTE_ADDRESS (buffer, byte_pos);
+ len = BYTES_BY_CHAR_HEAD ((int) *beg);
+ }
+ *bytes_read = (uint32_t) len;
+ return beg;
+}
+
+/*** Functions for parser and node objects */
+
+/* Wrap the parser in a Lisp_Object to be used in the Lisp machine. */
+Lisp_Object
+make_ts_parser (Lisp_Object buffer, TSParser *parser,
+ TSTree *tree, Lisp_Object language_symbol)
+{
+ struct Lisp_TS_Parser *lisp_parser
+ = ALLOCATE_PSEUDOVECTOR
+ (struct Lisp_TS_Parser, buffer, PVEC_TS_PARSER);
+
+ lisp_parser->language_symbol = language_symbol;
+ lisp_parser->buffer = buffer;
+ lisp_parser->parser = parser;
+ lisp_parser->tree = tree;
+ TSInput input = {lisp_parser, ts_read_buffer, TSInputEncodingUTF8};
+ lisp_parser->input = input;
+ lisp_parser->need_reparse = true;
+ lisp_parser->visible_beg = BUF_BEGV_BYTE (XBUFFER (buffer));
+ lisp_parser->visible_end = BUF_ZV_BYTE (XBUFFER (buffer));
+ return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
+}
+
+/* Wrap the node in a Lisp_Object to be used in the Lisp machine. */
+Lisp_Object
+make_ts_node (Lisp_Object parser, TSNode node)
+{
+ struct Lisp_TS_Node *lisp_node
+ = ALLOCATE_PSEUDOVECTOR (struct Lisp_TS_Node, parser, PVEC_TS_NODE);
+ lisp_node->parser = parser;
+ lisp_node->node = node;
+ lisp_node->timestamp = XTS_PARSER (parser)->timestamp;
+ return make_lisp_ptr (lisp_node, Lisp_Vectorlike);
+}
+
+DEFUN ("tree-sitter-parser-p",
+ Ftree_sitter_parser_p, Stree_sitter_parser_p, 1, 1, 0,
+ doc: /* Return t if OBJECT is a tree-sitter parser. */)
+ (Lisp_Object object)
+{
+ if (TS_PARSERP (object))
+ return Qt;
+ else
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-node-p",
+ Ftree_sitter_node_p, Stree_sitter_node_p, 1, 1, 0,
+ doc: /* Return t if OBJECT is a tree-sitter node. */)
+ (Lisp_Object object)
+{
+ if (TS_NODEP (object))
+ return Qt;
+ else
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-node-parser",
+ Ftree_sitter_node_parser, Stree_sitter_node_parser,
+ 1, 1, 0,
+ doc: /* Return the parser to which NODE belongs. */)
+ (Lisp_Object node)
+{
+ CHECK_TS_NODE (node);
+ return XTS_NODE (node)->parser;
+}
+
+DEFUN ("tree-sitter-parser-create",
+ Ftree_sitter_parser_create, Stree_sitter_parser_create,
+ 2, 2, 0,
+ doc: /* Create and return a parser in BUFFER for LANGUAGE.
+
+The parser is automatically added to BUFFER's
+`tree-sitter-parser-list'. LANGUAGE should be the symbol of a
+function provided by a tree-sitter language dynamic module, e.g.,
+'tree-sitter-json. If BUFFER is nil, use the current buffer. */)
+ (Lisp_Object buffer, Lisp_Object language)
+{
+ if (NILP (buffer))
+ buffer = Fcurrent_buffer ();
+
+ CHECK_BUFFER (buffer);
+ CHECK_SYMBOL (language);
+ ts_check_buffer_size (XBUFFER (buffer));
+
+ ts_initialize ();
+
+ TSParser *parser = ts_parser_new ();
+ TSLanguage *lang = ts_load_language (language, true);
+ /* We check language version when loading a language, so this should
+ always succeed. */
+ ts_parser_set_language (parser, lang);
+
+ Lisp_Object lisp_parser
+ = make_ts_parser (buffer, parser, NULL, language);
+
+ struct buffer *old_buffer = current_buffer;
+ set_buffer_internal (XBUFFER (buffer));
+
+ Fset (Qtree_sitter_parser_list,
+ Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));
+
+ set_buffer_internal (old_buffer);
+ return lisp_parser;
+}
+
+DEFUN ("tree-sitter-parser-buffer",
+ Ftree_sitter_parser_buffer, Stree_sitter_parser_buffer,
+ 1, 1, 0,
+ doc: /* Return the buffer of PARSER. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ Lisp_Object buf;
+ XSETBUFFER (buf, XBUFFER (XTS_PARSER (parser)->buffer));
+ return buf;
+}
+
+DEFUN ("tree-sitter-parser-language",
+ Ftree_sitter_parser_language, Stree_sitter_parser_language,
+ 1, 1, 0,
+ doc: /* Return parser's language symbol.
+This symbol is the one used to create the parser. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ return XTS_PARSER (parser)->language_symbol;
+}
+
+/*** Parser API */
+
+DEFUN ("tree-sitter-parser-root-node",
+ Ftree_sitter_parser_root_node, Stree_sitter_parser_root_node,
+ 1, 1, 0,
+ doc: /* Return the root node of PARSER. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ ts_ensure_parsed (parser);
+ TSNode root_node = ts_tree_root_node (XTS_PARSER (parser)->tree);
+ return make_ts_node (parser, root_node);
+}
+
+/* Checks that the RANGES argument of
+ tree-sitter-parser-set-included-ranges is valid. */
+void
+ts_check_range_argument (Lisp_Object ranges)
+{
+ EMACS_INT last_point = 1;
+ for (Lisp_Object tail = ranges;
+ !NILP (tail); tail = XCDR (tail))
+ {
+ CHECK_CONS (tail);
+ Lisp_Object range = XCAR (tail);
+ CHECK_CONS (range);
+ CHECK_FIXNUM (XCAR (range));
+ CHECK_FIXNUM (XCDR (range));
+ EMACS_INT beg = XFIXNUM (XCAR (range));
+ EMACS_INT end = XFIXNUM (XCDR (range));
+ /* TODO: Maybe we should check for point-min/max, too? */
+ if (!(last_point <= beg && beg <= end))
+ xsignal2 (Qtree_sitter_range_invalid,
+ build_pure_c_string
+ ("RANGE is either overlapping or out-of-order"),
+ ranges);
+ last_point = end;
+ }
+}
+
+DEFUN ("tree-sitter-parser-set-included-ranges",
+ Ftree_sitter_parser_set_included_ranges,
+ Stree_sitter_parser_set_included_ranges,
+ 2, 2, 0,
+ doc: /* Limit PARSER to RANGES.
+
+RANGES is a list of (BEG . END); each (BEG . END) defines a range in
+which the parser should operate. Ranges must not overlap, and they
+must come in order. Signal `tree-sitter-set-range-error'
+if the argument is invalid, or something else went wrong. If RANGES
+is nil, set PARSER to parse the whole buffer. */)
+ (Lisp_Object parser, Lisp_Object ranges)
+{
+ CHECK_TS_PARSER (parser);
+ CHECK_LIST (ranges);
+ ts_check_range_argument (ranges);
+
+ /* Before we parse, catch up with narrowing/widening. */
+ ts_ensure_position_synced (parser);
+
+ bool success;
+ if (NILP (ranges))
+ {
+ /* If RANGES is nil, make the parser parse the whole document.
+ To do that, we give tree-sitter a length of 0; the range is a
+ dummy. */
+ TSRange ts_range = {0, 0, 0, 0};
+ success = ts_parser_set_included_ranges
+ (XTS_PARSER (parser)->parser, &ts_range , 0);
+ }
+ else
+ {
+ /* Set ranges for PARSER. */
+ ptrdiff_t len = list_length (ranges);
+ TSRange *ts_ranges = xmalloc (sizeof (TSRange) * len);
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+
+ for (int idx=0; !NILP (ranges); idx++, ranges = XCDR (ranges))
+ {
+ Lisp_Object range = XCAR (ranges);
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+
+ EMACS_INT beg_byte = buf_charpos_to_bytepos
+ (buffer, XFIXNUM (XCAR (range)));
+ EMACS_INT end_byte = buf_charpos_to_bytepos
+ (buffer, XFIXNUM (XCDR (range)));
+ /* We don't care about start and end points, put in dummy
+ values. */
+ TSRange rg = {{0,0}, {0,0},
+ (uint32_t) (beg_byte - BUF_BEGV_BYTE (buffer)),
+ (uint32_t) (end_byte - BUF_BEGV_BYTE (buffer))};
+ ts_ranges[idx] = rg;
+ }
+ success = ts_parser_set_included_ranges
+ (XTS_PARSER (parser)->parser, ts_ranges, (uint32_t) len);
+ /* Although XFIXNUM could signal, it should be impossible
+ because we have checked the input by ts_check_range_argument.
+ So there is no need for unwind-protect. */
+ xfree (ts_ranges);
+ }
+
+ if (!success)
+ xsignal2 (Qtree_sitter_range_invalid,
+ build_pure_c_string
+ ("Something went wrong when setting ranges"),
+ ranges);
+
+ XTS_PARSER (parser)->need_reparse = true;
+ return Qnil;
+}
+
+DEFUN ("tree-sitter-parser-included-ranges",
+ Ftree_sitter_parser_included_ranges,
+ Stree_sitter_parser_included_ranges,
+ 1, 1, 0,
+ doc: /* Return the ranges set for PARSER.
+See `tree-sitter-parser-set-included-ranges'. If no range is set,
+return nil. */)
+ (Lisp_Object parser)
+{
+ CHECK_TS_PARSER (parser);
+ uint32_t len;
+ const TSRange *ranges = ts_parser_included_ranges
+ (XTS_PARSER (parser)->parser, &len);
+ if (len == 0)
+ return Qnil;
+ struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
+
+ Lisp_Object list = Qnil;
+ for (uint32_t idx = 0; idx < len; idx++)
+ {
+ TSRange range = ranges[idx];
+ ptrdiff_t beg_byte = range.start_byte + BUF_BEGV_BYTE (buffer);
+ ptrdiff_t end_byte = range.end_byte + BUF_BEGV_BYTE (buffer);
+
+ Lisp_Object lisp_range =
+ Fcons (make_fixnum (buf_bytepos_to_charpos (buffer, beg_byte)) ,
+ make_fixnum (buf_bytepos_to_charpos (buffer, end_byte)));
+ list = Fcons (lisp_range, list);
+ }
+ return Fnreverse (list);
+}
+
+/*** Node API */
+
+/* Check that OBJ is a non-negative fixnum and signal an error
+ otherwise. */
+static void
+ts_check_positive_integer (Lisp_Object obj)
+{
+ CHECK_FIXNUM (obj);
+ if (XFIXNUM (obj) < 0)
+ xsignal1 (Qargs_out_of_range, obj);
+}
+
+static void
+ts_check_node (Lisp_Object obj)
+{
+ CHECK_TS_NODE (obj);
+ Lisp_Object lisp_parser = XTS_NODE (obj)->parser;
+ if (XTS_NODE (obj)->timestamp !=
+ XTS_PARSER (lisp_parser)->timestamp)
+ xsignal1 (Qtree_sitter_node_outdated, obj);
+}
+
+DEFUN ("tree-sitter-node-type",
+ Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
+ doc: /* Return the NODE's type as a string.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ const char *type = ts_node_type (ts_node);
+ return build_string (type);
+}
+
+DEFUN ("tree-sitter-node-start",
+ Ftree_sitter_node_start, Stree_sitter_node_start, 1, 1, 0,
+ doc: /* Return the NODE's start position.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ uint32_t start_byte_offset = ts_node_start_byte (ts_node);
+ struct buffer *buffer =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t start_pos = buf_bytepos_to_charpos
+ (buffer, start_byte_offset + visible_beg);
+ return make_fixnum (start_pos);
+}
+
+DEFUN ("tree-sitter-node-end",
+ Ftree_sitter_node_end, Stree_sitter_node_end, 1, 1, 0,
+ doc: /* Return the NODE's end position.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ uint32_t end_byte_offset = ts_node_end_byte (ts_node);
+ struct buffer *buffer =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t end_pos = buf_bytepos_to_charpos
+ (buffer, end_byte_offset + visible_beg);
+ return make_fixnum (end_pos);
+}
+
+DEFUN ("tree-sitter-node-string",
+ Ftree_sitter_node_string, Stree_sitter_node_string, 1, 1, 0,
+ doc: /* Return the string representation of NODE.
+If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
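+ /* ts_node_string returns a malloc'ed string that we must free. */
+ char *string = ts_node_string (ts_node);
+ Lisp_Object lisp_string = build_string (string);
+ free (string);
+ return lisp_string;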
+}
+
+DEFUN ("tree-sitter-node-parent",
+ Ftree_sitter_node_parent, Stree_sitter_node_parent, 1, 1, 0,
+ doc: /* Return the immediate parent of NODE.
+Return nil if there isn't any. If NODE is nil, return nil. */)
+ (Lisp_Object node)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode parent = ts_node_parent (ts_node);
+
+ if (ts_node_is_null (parent))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, parent);
+}
+
+DEFUN ("tree-sitter-node-child",
+ Ftree_sitter_node_child, Stree_sitter_node_child, 2, 3, 0,
+ doc: /* Return the Nth child of NODE.
+
+Return nil if there isn't any. If NAMED is non-nil, look for named
+child only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object n, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ ts_check_positive_integer (n);
+ EMACS_INT idx = XFIXNUM (n);
+ if (idx > UINT32_MAX) xsignal1 (Qargs_out_of_range, n);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_child (ts_node, (uint32_t) idx);
+ else
+ child = ts_node_named_child (ts_node, (uint32_t) idx);
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-check",
+ Ftree_sitter_node_check, Stree_sitter_node_check, 2, 2, 0,
+ doc: /* Return non-nil if NODE has PROPERTY, nil otherwise.
+
+PROPERTY could be 'named, 'missing, 'extra, 'has-changes, 'has-error.
+Named nodes correspond to named rules in the language definition,
+whereas "anonymous" nodes correspond to string literals in the
+language definition.
+
+Missing nodes are inserted by the parser in order to recover from
+certain kinds of syntax errors, i.e., they should be there but are not.
+
+Extra nodes represent things like comments, which are not required by
+the language definition, but can appear anywhere.
+
+A node "has changes" if the buffer has changed since the node was
+created. (Don't forget the "s" at the end of 'has-changes.)
+
+A node "has error" if it is itself a syntax error or contains any
+syntax errors. */)
+ (Lisp_Object node, Lisp_Object property)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ CHECK_SYMBOL (property);
+ TSNode ts_node = XTS_NODE (node)->node;
+ bool result;
+ if (EQ (property, Qnamed))
+ result = ts_node_is_named (ts_node);
+ else if (EQ (property, Qmissing))
+ result = ts_node_is_missing (ts_node);
+ else if (EQ (property, Qextra))
+ result = ts_node_is_extra (ts_node);
+ else if (EQ (property, Qhas_error))
+ result = ts_node_has_error (ts_node);
+ else if (EQ (property, Qhas_changes))
+ result = ts_node_has_changes (ts_node);
+ else
+ signal_error ("Expecting 'named, 'missing, 'extra, 'has-changes or 'has-error, got",
+ property);
+ return result ? Qt : Qnil;
+}
+
+DEFUN ("tree-sitter-node-field-name-for-child",
+ Ftree_sitter_node_field_name_for_child,
+ Stree_sitter_node_field_name_for_child, 2, 2, 0,
+ doc: /* Return the field name of the Nth child of NODE.
+
+Return nil if there isn't any child or no field is found.
+If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object n)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ ts_check_positive_integer (n);
+ EMACS_INT idx = XFIXNUM (n);
+ if (idx > UINT32_MAX) xsignal1 (Qargs_out_of_range, n);
+ TSNode ts_node = XTS_NODE (node)->node;
+ const char *name
+ = ts_node_field_name_for_child (ts_node, (uint32_t) idx);
+
+ if (name == NULL)
+ return Qnil;
+
+ return make_string (name, strlen (name));
+}
+
+DEFUN ("tree-sitter-node-child-count",
+ Ftree_sitter_node_child_count,
+ Stree_sitter_node_child_count, 1, 2, 0,
+ doc: /* Return the number of children of NODE.
+
+If NAMED is non-nil, count named children only. NAMED defaults to
+nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ uint32_t count;
+ if (NILP (named))
+ count = ts_node_child_count (ts_node);
+ else
+ count = ts_node_named_child_count (ts_node);
+ return make_fixnum (count);
+}
+
+DEFUN ("tree-sitter-node-child-by-field-name",
+ Ftree_sitter_node_child_by_field_name,
+ Stree_sitter_node_child_by_field_name, 2, 2, 0,
+ doc: /* Return the child of NODE with FIELD-NAME.
+Return nil if there isn't any. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object field_name)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ CHECK_STRING (field_name);
+ char *name_str = SSDATA (field_name);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child
+ = ts_node_child_by_field_name (ts_node, name_str, strlen (name_str));
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-next-sibling",
+ Ftree_sitter_node_next_sibling,
+ Stree_sitter_node_next_sibling, 1, 2, 0,
+ doc: /* Return the next sibling of NODE.
+
+Return nil if there isn't any. If NAMED is non-nil, look for named
+sibling only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode sibling;
+ if (NILP (named))
+ sibling = ts_node_next_sibling (ts_node);
+ else
+ sibling = ts_node_next_named_sibling (ts_node);
+
+ if (ts_node_is_null (sibling))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, sibling);
+}
+
+DEFUN ("tree-sitter-node-prev-sibling",
+ Ftree_sitter_node_prev_sibling,
+ Stree_sitter_node_prev_sibling, 1, 2, 0,
+ doc: /* Return the previous sibling of NODE.
+
+Return nil if there isn't any. If NAMED is non-nil, look for named
+sibling only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode sibling;
+
+ if (NILP (named))
+ sibling = ts_node_prev_sibling (ts_node);
+ else
+ sibling = ts_node_prev_named_sibling (ts_node);
+
+ if (ts_node_is_null (sibling))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, sibling);
+}
+
+DEFUN ("tree-sitter-node-first-child-for-pos",
+ Ftree_sitter_node_first_child_for_pos,
+ Stree_sitter_node_first_child_for_pos, 2, 3, 0,
+ doc: /* Return the first child of NODE on POS.
+
+Specifically, return the first child that extends beyond POS. POS is
+a position in the buffer. Return nil if there isn't any. If NAMED is
+non-nil, look for named child only. NAMED defaults to nil. Note that
+this function returns an immediate child, not the smallest
+(grand)child. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object pos, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ ts_check_positive_integer (pos);
+
+ struct buffer *buf =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ if (XFIXNUM (pos) < BUF_BEGV (buf) || XFIXNUM (pos) > BUF_ZV (buf))
+ xsignal1 (Qargs_out_of_range, pos);
+
+ ptrdiff_t byte_pos = buf_charpos_to_bytepos (buf, XFIXNUM (pos));
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_first_child_for_byte
+ (ts_node, byte_pos - visible_beg);
+ else
+ child = ts_node_first_named_child_for_byte
+ (ts_node, byte_pos - visible_beg);
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-descendant-for-range",
+ Ftree_sitter_node_descendant_for_range,
+ Stree_sitter_node_descendant_for_range, 3, 4, 0,
+ doc: /* Return the smallest node that covers BEG to END.
+
+The returned node is a descendant of NODE. BEG and END are buffer
+positions. Return nil if there isn't any. If NAMED is non-nil, look
+for named nodes only. NAMED defaults to nil. If NODE is nil, return nil. */)
+ (Lisp_Object node, Lisp_Object beg, Lisp_Object end, Lisp_Object named)
+{
+ if (NILP (node)) return Qnil;
+ ts_check_node (node);
+ CHECK_FIXNUM (beg);
+ CHECK_FIXNUM (end);
+
+ struct buffer *buf =
+ XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+
+ /* Check BEGV <= BEG <= END <= ZV before converting to bytes. */
+ if (!(BUF_BEGV (buf) <= XFIXNUM (beg)
+ && XFIXNUM (beg) <= XFIXNUM (end)
+ && XFIXNUM (end) <= BUF_ZV (buf)))
+ xsignal2 (Qargs_out_of_range, beg, end);
+ ptrdiff_t byte_beg = buf_charpos_to_bytepos (buf, XFIXNUM (beg));
+ ptrdiff_t byte_end = buf_charpos_to_bytepos (buf, XFIXNUM (end));
+
+ TSNode ts_node = XTS_NODE (node)->node;
+ TSNode child;
+ if (NILP (named))
+ child = ts_node_descendant_for_byte_range
+ (ts_node, byte_beg - visible_beg, byte_end - visible_beg);
+ else
+ child = ts_node_named_descendant_for_byte_range
+ (ts_node, byte_beg - visible_beg, byte_end - visible_beg);
+
+ if (ts_node_is_null (child))
+ return Qnil;
+
+ return make_ts_node (XTS_NODE (node)->parser, child);
+}
+
+DEFUN ("tree-sitter-node-eq",
+ Ftree_sitter_node_eq,
+ Stree_sitter_node_eq, 2, 2, 0,
+ doc: /* Return non-nil if NODE1 and NODE2 are the same node.
+If any one of NODE1 and NODE2 is nil, return nil. */)
+ (Lisp_Object node1, Lisp_Object node2)
+{
+ if (NILP (node1) || NILP (node2))
+ return Qnil;
+ CHECK_TS_NODE (node1);
+ CHECK_TS_NODE (node2);
+
+ TSNode ts_node_1 = XTS_NODE (node1)->node;
+ TSNode ts_node_2 = XTS_NODE (node2)->node;
+
+ bool same_node = ts_node_eq (ts_node_1, ts_node_2);
+ return same_node ? Qt : Qnil;
+}
+
+/*** Query functions */
+
+/* If we decide to pre-load tree-sitter.el, maybe we can implement
+ this function in Lisp. */
+DEFUN ("tree-sitter-expand-pattern",
+ Ftree_sitter_expand_pattern,
+ Stree_sitter_expand_pattern, 1, 1, 0,
+ doc: /* Expand PATTERN to its string form.
+
+PATTERN can be
+
+ :anchor
+ :?
+ :*
+ :+
+ :equal
+ :match
+ (TYPE PATTERN...)
+ [PATTERN...]
+ FIELD-NAME:
+ @CAPTURE-NAME
+ (_)
+ _
+ \"TYPE\"
+
+Consult Info node `(elisp)Pattern Matching' for a detailed
+explanation. */)
+ (Lisp_Object pattern)
+{
+ if (EQ (pattern, intern_c_string (":anchor")))
+ return build_pure_c_string (".");
+ if (EQ (pattern, intern_c_string (":?")))
+ return build_pure_c_string ("?");
+ if (EQ (pattern, intern_c_string (":*")))
+ return build_pure_c_string ("*");
+ if (EQ (pattern, intern_c_string (":+")))
+ return build_pure_c_string ("+");
+ if (EQ (pattern, intern_c_string (":equal")))
+ return build_pure_c_string ("#equal");
+ if (EQ (pattern, intern_c_string (":match")))
+ return build_pure_c_string ("#match");
+ Lisp_Object opening_delimiter =
+ build_pure_c_string (VECTORP (pattern) ? "[" : "(");
+ Lisp_Object closing_delimiter =
+ build_pure_c_string (VECTORP (pattern) ? "]" : ")");
+ if (VECTORP (pattern) || CONSP (pattern))
+ return concat3 (opening_delimiter,
+ Fmapconcat (intern_c_string
+ ("tree-sitter-expand-pattern"),
+ pattern,
+ build_pure_c_string (" ")),
+ closing_delimiter);
+ return CALLN (Fformat, build_pure_c_string ("%S"), pattern);
+}
+
+DEFUN ("tree-sitter-expand-query",
+ Ftree_sitter_expand_query,
+ Stree_sitter_expand_query, 1, 1, 0,
+ doc: /* Expand sexp QUERY to its string form.
+
+A PATTERN in QUERY can be
+
+ :anchor
+ :?
+ :*
+ :+
+ :equal
+ :match
+ (TYPE PATTERN...)
+ [PATTERN...]
+ FIELD-NAME:
+ @CAPTURE-NAME
+ (_)
+ _
+ \"TYPE\"
+
+Consult Info node `(elisp)Pattern Matching' for a detailed
+explanation. */)
+ (Lisp_Object query)
+{
+ return Fmapconcat (intern_c_string ("tree-sitter-expand-pattern"),
+ query, build_pure_c_string (" "));
+}
+
+const char *
+ts_query_error_to_string (TSQueryError error)
+{
+ switch (error)
+ {
+ case TSQueryErrorNone:
+ return "None";
+ case TSQueryErrorSyntax:
+ return "Syntax error at";
+ case TSQueryErrorNodeType:
+ return "Node type error at";
+ case TSQueryErrorField:
+ return "Field error at";
+ case TSQueryErrorCapture:
+ return "Capture error at";
+ case TSQueryErrorStructure:
+ return "Structure error at";
+ default:
+ return "Unknown error";
+ }
+}
+
+/* Collect predicates for this match and return them in a list. Each
+ predicate is a list of strings and symbols. */
+Lisp_Object
+ts_predicates_for_pattern
+(TSQuery *query, uint32_t pattern_index)
+{
+ uint32_t len;
+ const TSQueryPredicateStep *predicate_list =
+ ts_query_predicates_for_pattern (query, pattern_index, &len);
+ Lisp_Object result = Qnil;
+ Lisp_Object predicate = Qnil;
+ for (int idx=0; idx < len; idx++)
+ {
+ TSQueryPredicateStep step = predicate_list[idx];
+ switch (step.type)
+ {
+ case TSQueryPredicateStepTypeCapture:
+ {
+ uint32_t str_len;
+ const char *str = ts_query_capture_name_for_id
+ (query, step.value_id, &str_len);
+ predicate = Fcons (intern_c_string_1 (str, str_len),
+ predicate);
+ break;
+ }
+ case TSQueryPredicateStepTypeString:
+ {
+ uint32_t str_len;
+ const char *str = ts_query_string_value_for_id
+ (query, step.value_id, &str_len);
+ predicate = Fcons (make_string (str, str_len), predicate);
+ break;
+ }
+ case TSQueryPredicateStepTypeDone:
+ result = Fcons (Fnreverse (predicate), result);
+ predicate = Qnil;
+ break;
+ }
+ }
+ return Fnreverse (result);
+}
+
+/* Translate a capture NAME (symbol) to the text of the captured node.
+ Signals tree-sitter-query-error if such node is not captured. */
+Lisp_Object
+ts_predicate_capture_name_to_text (Lisp_Object name, Lisp_Object captures)
+{
+ Lisp_Object node = Qnil;
+ for (Lisp_Object tail = captures; !NILP (tail); tail = XCDR (tail))
+ {
+ if (EQ (XCAR (XCAR (tail)), name))
+ {
+ node = XCDR (XCAR (tail));
+ break;
+ }
+ }
+
+ if (NILP (node))
+ xsignal3 (Qtree_sitter_query_error,
+ build_pure_c_string ("Cannot find captured node"),
+ name, build_pure_c_string ("A predicate can only refer to captured nodes in the same pattern"));
+
+ struct buffer *old_buffer = current_buffer;
+ set_buffer_internal
+ (XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer));
+ Lisp_Object text = Fbuffer_substring
+ (Ftree_sitter_node_start (node), Ftree_sitter_node_end (node));
+ set_buffer_internal (old_buffer);
+ return text;
+}
+
+/* Handles predicate (#equal A B). Return true if A equals B; return
+ false otherwise. A and B can be either string, or a capture name.
+ The capture name evaluates to the text its captured node spans in
+ the buffer. */
+bool
+ts_predicate_equal (Lisp_Object args, Lisp_Object captures)
+{
+ if (XFIXNUM (Flength (args)) != 2)
+ xsignal2 (Qtree_sitter_query_error,
+ build_pure_c_string ("Predicate `equal' requires two arguments but was given"), Flength (args));
+
+ Lisp_Object arg1 = XCAR (args);
+ Lisp_Object arg2 = XCAR (XCDR (args));
+ Lisp_Object text1 = STRINGP (arg1) ? arg1 :
+ ts_predicate_capture_name_to_text (arg1, captures);
+ Lisp_Object text2 = STRINGP (arg2) ? arg2 :
+ ts_predicate_capture_name_to_text (arg2, captures);
+
+ if (NILP (Fstring_equal (text1, text2)))
+ return false;
+ else
+ return true;
+}
+
+/* Handles predicate (#match "regexp" @node). Return true if "regexp"
+ matches the text spanned by @node; return false otherwise. Matching
+ is case-sensitive. */
+bool
+ts_predicate_match (Lisp_Object args, Lisp_Object captures)
+{
+ if (XFIXNUM (Flength (args)) != 2)
+ xsignal2 (Qtree_sitter_query_error,
+ build_pure_c_string ("Predicate `match' requires two arguments but was given"), Flength (args));
+
+ Lisp_Object regexp = XCAR (args);
+ Lisp_Object capture_name = XCAR (XCDR (args));
+
+ /* It's probably common to get the argument order backwards. Catch
+ this mistake before looking up the capture and show a helpful
+ explanation, because Emacs loves you. (We put the regexp first
+ because that's what string-match does.) */
+ if (!STRINGP (regexp))
+ xsignal1 (Qtree_sitter_query_error, build_pure_c_string ("The first argument to `match' should be a regexp string, not a capture name"));
+ if (!SYMBOLP (capture_name))
+ xsignal1 (Qtree_sitter_query_error, build_pure_c_string ("The second argument to `match' should be a capture name, not a string"));
+
+ Lisp_Object text = ts_predicate_capture_name_to_text (capture_name, captures);
+
+ if (fast_string_match (regexp, text) >= 0)
+ return true;
+ else
+ return false;
+}
+
+/* About predicates: I decided to hard-code predicates in C instead of
+ implementing an extensible system where predicates are translated
+ to Lisp functions, and new predicates can be added by extending a
+ list of functions, because I really couldn't imagine any useful
+ predicates besides equal and match. If we later find out that
+ such a system is indeed useful and necessary, it can be easily
+ added. */
+
+/* If all predicates in PREDICATES pass, return true; otherwise
+ return false. */
+bool
+ts_eval_predicates (Lisp_Object captures, Lisp_Object predicates)
+{
+ bool pass = true;
+ /* Evaluate each predicate, stopping at the first one that fails. */
+ for (Lisp_Object tail = predicates;
+ pass && !NILP (tail); tail = XCDR (tail))
+ {
+ Lisp_Object predicate = XCAR (tail);
+ Lisp_Object fn = XCAR (predicate);
+ Lisp_Object args = XCDR (predicate);
+ if (!NILP (Fstring_equal (fn, build_pure_c_string ("equal"))))
+ pass = ts_predicate_equal (args, captures);
+ else if (!NILP (Fstring_equal
+ (fn, build_pure_c_string ("match"))))
+ pass = ts_predicate_match (args, captures);
+ else
+ xsignal3 (Qtree_sitter_query_error,
+ build_pure_c_string ("Invalid predicate"),
+ fn, build_pure_c_string ("Currently Emacs only supports equal and match predicates"));
+ }
+ /* PASS is true only if every predicate passed. */
+ return pass;
+}
+
+DEFUN ("tree-sitter-query-capture",
+ Ftree_sitter_query_capture,
+ Stree_sitter_query_capture, 2, 4, 0,
+ doc: /* Query NODE with patterns in QUERY.
+
+Return a list of (CAPTURE_NAME . NODE). CAPTURE_NAME is the name
+assigned to the node in QUERY. NODE is the captured node.
+
+QUERY is either a string query or a sexp query. See Info node
+`(elisp)Pattern Matching' for how to write a query in either string or
+s-expression form.
+
+BEG and END, if both non-nil, specify the range in which the query
+is executed.
+
+Signal a `tree-sitter-query-error' if QUERY is malformed, or something
+else goes wrong. */)
+ (Lisp_Object node, Lisp_Object query,
+ Lisp_Object beg, Lisp_Object end)
+{
+ ts_check_node (node);
+ if (!NILP (beg))
+ CHECK_FIXNUM (beg);
+ if (!NILP (end))
+ CHECK_FIXNUM (end);
+
+ if (CONSP (query))
+ query = Ftree_sitter_expand_query (query);
+ else
+ CHECK_STRING (query);
+
+ /* Extract C values from Lisp objects. */
+ TSNode ts_node = XTS_NODE (node)->node;
+ Lisp_Object lisp_parser = XTS_NODE (node)->parser;
+ ptrdiff_t visible_beg =
+ XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ const TSLanguage *lang = ts_parser_language
+ (XTS_PARSER (lisp_parser)->parser);
+ char *source = SSDATA (query);
+
+ /* Initialize query objects, and execute query. */
+ uint32_t error_offset;
+ TSQueryError error_type;
+ /* TODO: We could cache the query object, so that repeatedly
+ querying with the same query can reuse the query object. It also
+ saves us from expanding the sexp query into a string. I don't
+ know how much time that could save though. */
+ TSQuery *ts_query = ts_query_new (lang, source, strlen (source),
+ &error_offset, &error_type);
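+
+ if (ts_query == NULL)
+ xsignal2 (Qtree_sitter_query_error,
+ build_string (ts_query_error_to_string (error_type)),
+ make_fixnum (error_offset + 1));
+ /* Create the cursor after the check so a signal does not leak it. */
+ TSQueryCursor *cursor = ts_query_cursor_new ();
+ if (!NILP (beg) && !NILP (end))
+ {
+ /* BEG and END are character positions; convert them to bytes. */
+ struct buffer *buf = XBUFFER (XTS_PARSER (lisp_parser)->buffer);
+ ptrdiff_t beg_byte = buf_charpos_to_bytepos (buf, XFIXNUM (beg));
+ ptrdiff_t end_byte = buf_charpos_to_bytepos (buf, XFIXNUM (end));
+ ts_query_cursor_set_byte_range
+ (cursor, (uint32_t) (beg_byte - visible_beg),
+ (uint32_t) (end_byte - visible_beg));
+ }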
+
+ ts_query_cursor_exec (cursor, ts_query, ts_node);
+ TSQueryMatch match;
+
+ /* Go over each match, collect captures and predicates. Include the
+ captures in the return list if all predicates in that match
+ passes. */
+ Lisp_Object result = Qnil;
+ while (ts_query_cursor_next_match (cursor, &match))
+ {
+ /* Get captured nodes. */
+ Lisp_Object captures_lisp = Qnil;
+ const TSQueryCapture *captures = match.captures;
+ for (int idx=0; idx < match.capture_count; idx++)
+ {
+ uint32_t capture_name_len;
+ TSQueryCapture capture = captures[idx];
+ Lisp_Object captured_node =
+ make_ts_node (lisp_parser, capture.node);
+ const char *capture_name = ts_query_capture_name_for_id
+ (ts_query, capture.index, &capture_name_len);
+ Lisp_Object cap =
+ Fcons (intern_c_string_1 (capture_name, capture_name_len),
+ captured_node);
+ captures_lisp = Fcons (cap, captures_lisp);
+ }
+ /* Get predicates. */
+ Lisp_Object predicates =
+ ts_predicates_for_pattern (ts_query, match.pattern_index);
+
+ captures_lisp = Fnreverse (captures_lisp);
+ if (ts_eval_predicates (captures_lisp, predicates))
+ {
+ result = CALLN (Fnconc, result, captures_lisp);
+ }
+ }
+ ts_query_delete (ts_query);
+ ts_query_cursor_delete (cursor);
+ return result;
+}
+
+/*** Initialization */
+
+/* Initialize the tree-sitter routines. */
+void
+syms_of_tree_sitter (void)
+{
+ DEFSYM (Qtree_sitter_parser_p, "tree-sitter-parser-p");
+ DEFSYM (Qtree_sitter_node_p, "tree-sitter-node-p");
+ DEFSYM (Qnamed, "named");
+ DEFSYM (Qmissing, "missing");
+ DEFSYM (Qextra, "extra");
+ DEFSYM (Qhas_changes, "has-changes");
+ DEFSYM (Qhas_error, "has-error");
+
+ DEFSYM (Qtree_sitter_error, "tree-sitter-error");
+ DEFSYM (Qtree_sitter_query_error, "tree-sitter-query-error");
+ DEFSYM (Qtree_sitter_parse_error, "tree-sitter-parse-error");
+ DEFSYM (Qtree_sitter_range_invalid, "tree-sitter-range-invalid");
+ DEFSYM (Qtree_sitter_buffer_too_large,
+ "tree-sitter-buffer-too-large");
+ DEFSYM (Qtree_sitter_load_language_error,
+ "tree-sitter-load-language-error");
+ DEFSYM (Qtree_sitter_node_outdated,
+ "tree-sitter-node-outdated");
+
+ define_error (Qtree_sitter_error, "Generic tree-sitter error", Qerror);
+ define_error (Qtree_sitter_query_error, "Query pattern is malformed",
+ Qtree_sitter_error);
+ /* Should be impossible, no need to document this error. */
+ define_error (Qtree_sitter_parse_error, "Parse failed",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_range_invalid,
+ "RANGES are invalid, they have to be ordered and not overlapping",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_buffer_too_large, "Buffer too large (> 4GB)",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_load_language_error,
+ "Cannot load language definition",
+ Qtree_sitter_error);
+ define_error (Qtree_sitter_node_outdated,
+ "This node is outdated, please retrieve a new one",
+ Qtree_sitter_error);
+
+ DEFSYM (Qtree_sitter_parser_list, "tree-sitter-parser-list");
+ DEFVAR_LISP ("tree-sitter-parser-list", Vtree_sitter_parser_list,
+ doc: /* A list of tree-sitter parsers.
+
+If you remove a parser from this list, do not put it back in. Emacs
+keeps the parsers in this list updated with any change in the buffer.
+If removed and put back in, there is no guarantee that the parser is in
+sync with the buffer's content. */);
+ Vtree_sitter_parser_list = Qnil;
+ Fmake_variable_buffer_local (Qtree_sitter_parser_list);
+
+ DEFVAR_LISP ("tree-sitter-load-name-override-list",
+ Vtree_sitter_load_name_override_list,
+ doc:
+ /* An override alist for irregular tree-sitter libraries.
+
+By default, Emacs assumes the dynamic library for tree-sitter-<lang>
+is libtree-sitter-<lang>.<ext>, where <ext> is the OS specific
+extension for dynamic libraries. Emacs also assumes that the name of
+the C function the library provides is tree_sitter_<lang>. If that is
+not the case, add an entry
+
+ (LANGUAGE-SYMBOL LIBRARY-BASE-NAME FUNCTION-NAME)
+
+to this alist, where LIBRARY-BASE-NAME is the filename of the dynamic
+library without extension, FUNCTION-NAME is the function provided by
+the library. */);
+ Vtree_sitter_load_name_override_list = Qnil;
+
+ defsubr (&Stree_sitter_language_available_p);
+
+ defsubr (&Stree_sitter_parser_p);
+ defsubr (&Stree_sitter_node_p);
+
+ defsubr (&Stree_sitter_node_parser);
+
+ defsubr (&Stree_sitter_parser_create);
+ defsubr (&Stree_sitter_parser_buffer);
+ defsubr (&Stree_sitter_parser_language);
+
+ defsubr (&Stree_sitter_parser_root_node);
+ /* defsubr (&Stree_sitter_parse_string); */
+
+ defsubr (&Stree_sitter_parser_set_included_ranges);
+ defsubr (&Stree_sitter_parser_included_ranges);
+
+ defsubr (&Stree_sitter_node_type);
+ defsubr (&Stree_sitter_node_start);
+ defsubr (&Stree_sitter_node_end);
+ defsubr (&Stree_sitter_node_string);
+ defsubr (&Stree_sitter_node_parent);
+ defsubr (&Stree_sitter_node_child);
+ defsubr (&Stree_sitter_node_check);
+ defsubr (&Stree_sitter_node_field_name_for_child);
+ defsubr (&Stree_sitter_node_child_count);
+ defsubr (&Stree_sitter_node_child_by_field_name);
+ defsubr (&Stree_sitter_node_next_sibling);
+ defsubr (&Stree_sitter_node_prev_sibling);
+ defsubr (&Stree_sitter_node_first_child_for_pos);
+ defsubr (&Stree_sitter_node_descendant_for_range);
+ defsubr (&Stree_sitter_node_eq);
+
+ defsubr (&Stree_sitter_expand_pattern);
+ defsubr (&Stree_sitter_expand_query);
+ defsubr (&Stree_sitter_query_capture);
+}
diff --git a/src/tree-sitter.h b/src/tree-sitter.h
new file mode 100644
index 0000000000..05d8e14fe6
--- /dev/null
+++ b/src/tree-sitter.h
@@ -0,0 +1,139 @@
+/* Header file for the tree-sitter integration.
+
+Copyright (C) 2021 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#ifndef EMACS_TREE_SITTER_H
+#define EMACS_TREE_SITTER_H
+
+#include <config.h>
+#include "lisp.h"
+
+#include <tree_sitter/api.h>
+
+INLINE_HEADER_BEGIN
+
+/* A wrapper for a tree-sitter parser, but also contains a parse tree
+ and other goodies for convenience. */
+struct Lisp_TS_Parser
+{
+ union vectorlike_header header;
+ /* A symbol representing the language this parser uses. See the
+ manual for more explanation. */
+ Lisp_Object language_symbol;
+ /* The buffer associated with this parser. */
+ Lisp_Object buffer;
+ /* The pointer to the tree-sitter parser. Never NULL. */
+ TSParser *parser;
+ /* Pointer to the syntax tree. Initially is NULL, so check for NULL
+ before use. */
+ TSTree *tree;
+ /* Teaches tree-sitter how to read an Emacs buffer. */
+ TSInput input;
+ /* Re-parsing an unchanged buffer is not free for tree-sitter, so we
+ only make it re-parse when need_reparse == true. That usually
+ means some change is made in the buffer. But others could set
+ this field to true to force tree-sitter to re-parse. */
+ bool need_reparse;
+ /* These two positions record the buffer byte position (1-based) of
+ the "visible region" that tree-sitter sees. Unlike markers,
+ these two positions do not change as the user inserts and deletes
+ text around them. Before re-parse, we move these positions to
+ match BUF_BEGV_BYTE and BUF_ZV_BYTE. Note that we don't need to
+ synchronize these positions when retrieving them in a function
+ that involves a node: if the node is not outdated, these
+ positions are synchronized. */
+ ptrdiff_t visible_beg;
+ ptrdiff_t visible_end;
+ /* This counter is incremented every time a change is made to the
+ buffer in ts_record_change. The node retrieved from this parser
+ inherits this timestamp. This way we can make sure the node is
+ not outdated when we access its information. */
+ ptrdiff_t timestamp;
+};
+
+/* A wrapper around a tree-sitter node. */
+struct Lisp_TS_Node
+{
+ union vectorlike_header header;
+ /* This prevents gc from collecting the tree before the node is done
+ with it. TSNode contains a pointer to the tree it belongs to,
+ and the parser object, when collected by gc, will free that
+ tree. */
+ Lisp_Object parser;
+ TSNode node;
+ /* A node inherits its parser's timestamp at creation time. The
+ parser's timestamp increments as the buffer changes. This way we
+ can make sure the node is not outdated when we access its
+ information. */
+ ptrdiff_t timestamp;
+};
+
+INLINE bool
+TS_PARSERP (Lisp_Object x)
+{
+ return PSEUDOVECTORP (x, PVEC_TS_PARSER);
+}
+
+INLINE struct Lisp_TS_Parser *
+XTS_PARSER (Lisp_Object a)
+{
+ eassert (TS_PARSERP (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Parser);
+}
+
+INLINE bool
+TS_NODEP (Lisp_Object x)
+{
+ return PSEUDOVECTORP (x, PVEC_TS_NODE);
+}
+
+INLINE struct Lisp_TS_Node *
+XTS_NODE (Lisp_Object a)
+{
+ eassert (TS_NODEP (a));
+ return XUNTAG (a, Lisp_Vectorlike, struct Lisp_TS_Node);
+}
+
+INLINE void
+CHECK_TS_PARSER (Lisp_Object parser)
+{
+ CHECK_TYPE (TS_PARSERP (parser), Qtree_sitter_parser_p, parser);
+}
+
+INLINE void
+CHECK_TS_NODE (Lisp_Object node)
+{
+ CHECK_TYPE (TS_NODEP (node), Qtree_sitter_node_p, node);
+}
+
+void
+ts_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
+ ptrdiff_t new_end_byte);
+
+Lisp_Object
+make_ts_parser (Lisp_Object buffer, TSParser *parser,
+ TSTree *tree, Lisp_Object language_symbol);
+
+Lisp_Object
+make_ts_node (Lisp_Object parser, TSNode node);
+
+extern void syms_of_tree_sitter (void);
+
+INLINE_HEADER_END
+
+#endif /* EMACS_TREE_SITTER_H */
diff --git a/test/src/tree-sitter-tests.el b/test/src/tree-sitter-tests.el
new file mode 100644
index 0000000000..46e6e692eb
--- /dev/null
+++ b/test/src/tree-sitter-tests.el
@@ -0,0 +1,366 @@
+;;; tree-sitter-tests.el --- tests for src/tree-sitter.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Code:
+
+(require 'ert)
+(require 'tree-sitter)
+
+(ert-deftest tree-sitter-basic-parsing ()
+ "Test basic parsing routines."
+ (with-temp-buffer
+ (let ((parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json)))
+ (should
+ (eq parser (car tree-sitter-parser-list)))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(ERROR)"))
+
+ (insert "[1,2,3]")
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+
+ (goto-char (point-min))
+ (forward-char 3)
+ (insert "{\"name\": \"Bob\"},")
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))")))))
+
+(ert-deftest tree-sitter-node-api ()
+ "Tests for node API."
+ (with-temp-buffer
+ (let (parser root-node doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ ;; `tree-sitter-node-type'.
+ (should (equal "document" (tree-sitter-node-type root-node)))
+ ;; `tree-sitter-node-check'.
+ (should (eq t (tree-sitter-node-check root-node 'named)))
+ (should (eq nil (tree-sitter-node-check root-node 'missing)))
+ (should (eq nil (tree-sitter-node-check root-node 'extra)))
+ (should (eq nil (tree-sitter-node-check root-node 'has-error)))
+ ;; `tree-sitter-node-child'.
+ (setq doc-node (tree-sitter-node-child root-node 0))
+ (should (equal "array" (tree-sitter-node-type doc-node)))
+ (should (equal (tree-sitter-node-string doc-node)
+ "(array (number) (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number))"))
+ ;; `tree-sitter-node-child-count'.
+ (should (eql 9 (tree-sitter-node-child-count doc-node)))
+ (should (eql 4 (tree-sitter-node-child-count doc-node t)))
+ ;; `tree-sitter-node-field-name-for-child'.
+ (setq object-node (tree-sitter-node-child doc-node 2 t))
+ (setq pair-node (tree-sitter-node-child object-node 0 t))
+ (should (equal "object" (tree-sitter-node-type object-node)))
+ (should (equal "pair" (tree-sitter-node-type pair-node)))
+ (should (equal "key"
+ (tree-sitter-node-field-name-for-child
+ pair-node 0)))
+ ;; `tree-sitter-node-child-by-field-name'.
+ (should (equal "(string (string_content))"
+ (tree-sitter-node-string
+ (tree-sitter-node-child-by-field-name
+ pair-node "key"))))
+ ;; `tree-sitter-node-next-sibling'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-next-sibling object-node t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-next-sibling object-node))))
+ ;; `tree-sitter-node-prev-sibling'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-prev-sibling object-node t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-prev-sibling object-node))))
+ ;; `tree-sitter-node-first-child-for-pos'.
+ (should (equal "(number)"
+ (tree-sitter-node-string
+ (tree-sitter-node-first-child-for-pos
+ doc-node 3 t))))
+ (should (equal "(\",\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-first-child-for-pos
+ doc-node 3))))
+ ;; `tree-sitter-node-descendant-for-range'.
+ (should (equal "(\"{\")"
+ (tree-sitter-node-string
+ (tree-sitter-node-descendant-for-range
+ root-node 6 7))))
+ (should (equal "(object (pair key: (string (string_content)) value: (string (string_content))))"
+ (tree-sitter-node-string
+ (tree-sitter-node-descendant-for-range
+ root-node 6 7 t))))
+ ;; `tree-sitter-node-eq'.
+ (should (tree-sitter-node-eq root-node root-node))
+ (should (not (tree-sitter-node-eq root-node doc-node))))))
+
+(ert-deftest tree-sitter-query-api ()
+ "Tests for query API."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+
+ (dolist (pattern
+ '("(string) @string
+(pair key: (_) @keyword)
+((_) @bob (#match \"^B.b$\" @bob))
+(number) @number
+((number) @n3 (#equal \"3\" @n3)) "
+ ((string) @string
+ (pair key: (_) @keyword)
+ ((_) @bob (:match "^B.b$" @bob))
+ (number) @number
+ ((number) @n3 (:equal "3" @n3)))))
+ (should
+ (equal
+ '((number . "1") (number . "2")
+ (keyword . "\"name\"")
+ (string . "\"name\"")
+ (string . "\"Bob\"")
+ (bob . "Bob")
+ (number . "3")
+ (n3 . "3"))
+ (mapcar (lambda (entry)
+ (cons (car entry)
+ (tree-sitter-node-text
+ (cdr entry))))
+ (tree-sitter-query-capture root-node pattern))))
+ (should
+ (equal
+ "(type field: (_) @capture .) ? * + \"return\""
+ (tree-sitter-expand-query
+ '((type field: (_) @capture :anchor)
+ :? :* :+ "return"))))))))
+
+(ert-deftest tree-sitter-narrow ()
+ "Tests if narrowing works."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "xxx[1,{\"name\": \"Bob\"},2,3]xxx")
+ (narrow-to-region (+ (point-min) 3) (- (point-max) 3))
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ ;; This test is from the basic test.
+ (should
+ (equal
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number) (number)))"))
+
+ (widen)
+ (goto-char (point-min))
+ (insert "ooo")
+ (should (equal "oooxxx[1,{\"name\": \"Bob\"},2,3]xxx"
+ (buffer-string)))
+ (delete-region 10 26)
+ (should (equal "oooxxx[1,2,3]xxx"
+ (buffer-string)))
+ (narrow-to-region (+ (point-min) 6) (- (point-max) 3))
+ ;; This test is also from the basic test.
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number) (number)))"))
+ (widen)
+ (goto-char (point-max))
+ (insert "[1,2]")
+ (should (equal "oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (- (point-max) 5) (point-max))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number) (number)))"))
+ (widen)
+ (goto-char (point-min))
+ (insert "[1]")
+ (should (equal "[1]oooxxx[1,2,3]xxx[1,2]"
+ (buffer-string)))
+ (narrow-to-region (point-min) (+ (point-min) 3))
+ (should
+ (equal (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))
+ "(document (array (number)))")))))
+
+(ert-deftest tree-sitter-range ()
+ "Tests if range works."
+ (with-temp-buffer
+ (let (parser root-node pattern doc-node object-node pair-node)
+ (progn
+ (insert "[[1],oooxxx[1,2,3],xxx[1,2]]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser)))
+ (should-error
+ (tree-sitter-parser-set-included-ranges
+ parser '((1 . 6) (5 . 20)))
+ :type '(tree-sitter-range-invalid))
+
+ (tree-sitter-parser-set-included-ranges
+ parser '((1 . 6) (12 . 20) (23 . 29)))
+ (should (equal '((1 . 6) (12 . 20) (23 . 29))
+ (tree-sitter-parser-included-ranges parser)))
+ (should (equal "(document (array (array (number)) (array (number) (number) (number)) (array (number) (number))))"
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node parser))))
+ ;; TODO: More tests.
+ )))
+
+(ert-deftest tree-sitter-multi-lang ()
+ "Tests if parsing multiple languages works."
+ (with-temp-buffer
+ (let (html css js html-range css-range js-range)
+ (progn
+ (insert "<html><script>1</script><style>body {}</style></html>")
+ (setq html (tree-sitter-get-parser-create 'tree-sitter-html))
+ (setq css (tree-sitter-get-parser-create 'tree-sitter-css))
+ (setq js (tree-sitter-get-parser-create 'tree-sitter-javascript)))
+ ;; JavaScript.
+ (setq js-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ '((script_element (raw_text) @capture))))
+ (should (equal '((15 . 16)) js-range))
+ (tree-sitter-parser-set-included-ranges js js-range)
+ (should (equal "(program (expression_statement (number)))"
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node js))))
+ ;; CSS.
+ (setq css-range
+ (tree-sitter-query-range
+ 'tree-sitter-html
+ '((style_element (raw_text) @capture))))
+ (should (equal '((32 . 39)) css-range))
+ (tree-sitter-parser-set-included-ranges css css-range)
+ (should
+ (equal "(stylesheet (rule_set (selectors (tag_name)) (block)))"
+ (tree-sitter-node-string
+ (tree-sitter-parser-root-node css))))
+ ;; TODO: More tests.
+ )))
+
+(ert-deftest tree-sitter-parser-supplemental ()
+ "Supplemental parser functions."
+ ;; `tree-sitter-get-parser'.
+ (with-temp-buffer
+ (should (equal (tree-sitter-get-parser 'tree-sitter-json) nil)))
+ ;; `tree-sitter-get-parser-create'.
+ (with-temp-buffer
+ (should (not (equal (tree-sitter-get-parser-create 'tree-sitter-json)
+ nil))))
+ ;; `tree-sitter-parse-string'.
+ (should (equal (tree-sitter-node-string
+ (tree-sitter-parse-string
+ "[1,2,{\"name\": \"Bob\"},3]"
+ 'tree-sitter-json))
+ "(document (array (number) (number) (object (pair key: (string (string_content)) value: (string (string_content)))) (number)))"))
+ (with-temp-buffer
+ (let (parser root-node doc-node object-node pair-node)
+ (progn
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser))
+ (setq doc-node (tree-sitter-node-child root-node 0)))
+ ;; `tree-sitter-get-parser'.
+ (should (not (equal (tree-sitter-get-parser 'tree-sitter-json)
+ nil)))
+ ;; `tree-sitter-language-at'.
+ (should (equal (tree-sitter-language-at (point))
+ 'tree-sitter-json))
+ ;; `tree-sitter-set-ranges', `tree-sitter-get-ranges'.
+ (tree-sitter-set-ranges 'tree-sitter-json
+ '((1 . 2)))
+ (should (equal (tree-sitter-get-ranges 'tree-sitter-json)
+ '((1 . 2)))))))
+
+(ert-deftest tree-sitter-node-supplemental ()
+ "Supplemental node functions."
+ (with-temp-buffer
+ (let (parser root-node doc-node array-node)
+ (insert "[1,2,{\"name\": \"Bob\"},3]")
+ (setq parser (tree-sitter-parser-create
+ (current-buffer) 'tree-sitter-json))
+ (setq root-node (tree-sitter-parser-root-node
+ parser))
+ (setq doc-node (tree-sitter-node-child root-node 0))
+ ;; `tree-sitter-node-buffer'.
+ (should (equal (tree-sitter-node-buffer root-node)
+ (current-buffer)))
+ ;; `tree-sitter-node-language'.
+ (should (eq (tree-sitter-node-language root-node)
+ 'tree-sitter-json))
+ ;; `tree-sitter-node-at'.
+ (should (equal (tree-sitter-node-string
+ (tree-sitter-node-at 1 2 'tree-sitter-json))
+ "(\"[\")"))
+ ;; `tree-sitter-buffer-root-node'.
+ (should (tree-sitter-node-eq
+ (tree-sitter-buffer-root-node 'tree-sitter-json)
+ root-node))
+ ;; `tree-sitter-filter-child'.
+ (should (equal (mapcar
+ (lambda (node)
+ (tree-sitter-node-type node))
+ (tree-sitter-filter-child
+ doc-node (lambda (node)
+ (tree-sitter-node-check node 'named))))
+ '("number" "number" "object" "number")))
+ ;; `tree-sitter-node-text'.
+ (should (equal (tree-sitter-node-text doc-node)
+ "[1,2,{\"name\": \"Bob\"},3]"))
+ ;; `tree-sitter-node-index'.
+ (should (eq (tree-sitter-node-index doc-node)
+ 0))
+ ;; TODO:
+ ;; `tree-sitter-parent-until'
+ ;; `tree-sitter-parent-while'
+ ;; `tree-sitter-node-children'
+ ;; `tree-sitter-node-field-name'
+ )))
+
+;; TODO
+;; - Functions in tree-sitter.el
+;; - tree-sitter-load-name-override-list
+
+(provide 'tree-sitter-tests)
+;;; tree-sitter-tests.el ends here
--
2.33.1
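
The timestamp fields in struct Lisp_TS_Parser and struct Lisp_TS_Node in the header above exist so that a node can be rejected once its buffer has changed. A minimal, self-contained sketch of that bookkeeping follows. It assumes simplified stand-in types: toy_parser, toy_node and the toy_* functions are hypothetical names for illustration and are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for struct Lisp_TS_Parser and struct
   Lisp_TS_Node: only the timestamp bookkeeping is modeled.  */
struct toy_parser
{
  ptrdiff_t timestamp;  /* Bumped on every recorded buffer change.  */
  bool need_reparse;    /* True if the buffer changed since the last parse.  */
};

struct toy_node
{
  const struct toy_parser *parser;
  ptrdiff_t timestamp;  /* Copied from the parser at creation time.  */
};

/* Hypothetical counterpart of ts_record_change: note that the buffer
   changed, which makes every existing node outdated.  */
static void
toy_record_change (struct toy_parser *parser)
{
  parser->timestamp++;
  parser->need_reparse = true;
}

/* Hypothetical counterpart of make_ts_node: the new node inherits the
   parser's current timestamp.  */
static struct toy_node
toy_make_node (const struct toy_parser *parser)
{
  struct toy_node node = { parser, parser->timestamp };
  return node;
}

/* A node is usable only while no change has been recorded since it
   was created.  */
static bool
toy_node_up_to_date_p (const struct toy_node *node)
{
  return node->timestamp == node->parser->timestamp;
}
```

In this sketch a node created before a call to toy_record_change fails the toy_node_up_to_date_p check afterwards, which is the kind of test an accessor would make before touching the underlying TSNode.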
^ permalink raw reply related [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-13 6:25 ` Yuan Fu
@ 2022-03-13 7:13 ` Po Lu
2022-03-14 0:23 ` Yuan Fu
2022-03-29 16:40 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Po Lu @ 2022-03-13 7:13 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Clément Pit-Claudel, Theodor Thornhill,
ubolonton, Emacs developers, Philipp, Stefan Monnier, Yoav Marco,
Stephen Leake, John Yates
Yuan Fu <casouri@gmail.com> writes:
> +if test "${with_tree_sitter}" != "no"; then
> + EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
Is that version number a typo?
I assume the tree-sitter support depends on the recently introduced
interface for specifying custom malloc functions.
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-13 7:13 ` Po Lu
@ 2022-03-14 0:23 ` Yuan Fu
2022-03-14 1:10 ` Po Lu
2022-03-14 3:31 ` Eli Zaretskii
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-14 0:23 UTC (permalink / raw)
To: Po Lu
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Philipp,
Stefan Monnier, Eli Zaretskii, Yoav Marco, Stephen Leake,
John Yates
> On Mar 12, 2022, at 11:13 PM, Po Lu <luangruo@yahoo.com> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> +if test "${with_tree_sitter}" != "no"; then
>> + EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
>
> Is that version number a typo?
>
> I assume the tree-sitter support depends on the recently introduced
> interface for specifying custom malloc functions.
>
Yes, good catch. The current version is 0.20.6. I think we should use that?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-14 0:23 ` Yuan Fu
@ 2022-03-14 1:10 ` Po Lu
2022-03-14 3:31 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Po Lu @ 2022-03-14 1:10 UTC (permalink / raw)
To: Yuan Fu
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Philipp,
Stefan Monnier, Eli Zaretskii, Yoav Marco, Stephen Leake,
John Yates
Yuan Fu <casouri@gmail.com> writes:
> Yes, good catch. The current version is 0.20.6. I think we should use
> that?
Looks good to me, though I'm no tree-sitter expert.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-14 0:23 ` Yuan Fu
2022-03-14 1:10 ` Po Lu
@ 2022-03-14 3:31 ` Eli Zaretskii
2022-03-14 3:43 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2022-03-14 3:31 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, theo, cpitclaudel, emacs-devel, luangruo, p.stephani2,
monnier, yoavm448, stephen_leake, john
> From: Yuan Fu <casouri@gmail.com>
> Date: Sun, 13 Mar 2022 17:23:09 -0700
> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
> Theodor Thornhill <theo@thornhill.no>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>, Philipp <p.stephani2@gmail.com>,
> Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii <eliz@gnu.org>,
> Yoav Marco <yoavm448@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> John Yates <john@yates-sheets.org>
>
> >> +if test "${with_tree_sitter}" != "no"; then
> >> + EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
> >
> > Is that version number a typo?
> >
> > I assume the tree-sitter support depends on the recently introduced
> > interface for specifying custom malloc functions.
> >
>
> Yes, good catch. The current version is 0.20.6. I think we should use that?
Please use the first version where the malloc customization was
introduced.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-14 3:31 ` Eli Zaretskii
@ 2022-03-14 3:43 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-14 3:43 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Tuấn-Anh Nguyễn, Theodor Thornhill,
Clément Pit-Claudel, Emacs developers, Po Lu, Philipp,
Stefan Monnier, Yoav Marco, Stephen Leake, john
> On Mar 13, 2022, at 8:31 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Sun, 13 Mar 2022 17:23:09 -0700
>> Cc: Tuấn-Anh Nguyễn <ubolonton@gmail.com>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>, Philipp <p.stephani2@gmail.com>,
>> Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii <eliz@gnu.org>,
>> Yoav Marco <yoavm448@gmail.com>,
>> Stephen Leake <stephen_leake@stephe-leake.org>,
>> John Yates <john@yates-sheets.org>
>>
>>>> +if test "${with_tree_sitter}" != "no"; then
>>>> + EMACS_CHECK_MODULES([TREE_SITTER], [tree-sitter >= 0.0],
>>>
>>> Is that version number a typo?
>>>
>>> I assume the tree-sitter support depends on the recently introduced
>>> interface for specifying custom malloc functions.
>>>
>>
>> Yes, good catch. The current version is 0.20.6. I think we should use that?
>
> Please use the first version where the malloc customization was
> introduced.
Cool. That would be 0.20.2.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-13 6:25 ` Yuan Fu
2022-03-13 7:13 ` Po Lu
@ 2022-03-29 16:40 ` Eli Zaretskii
2022-03-30 0:35 ` Po Lu
` (3 more replies)
1 sibling, 4 replies; 370+ messages in thread
From: Eli Zaretskii @ 2022-03-29 16:40 UTC (permalink / raw)
To: Yuan Fu, Lars Ingebrigtsen; +Cc: ubolonton, yoavm448, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Sat, 12 Mar 2022 22:25:11 -0800
> Cc: Yoav Marco <yoavm448@gmail.com>,
> Clément Pit-Claudel <cpitclaudel@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> John Yates <john@yates-sheets.org>,
> Stefan Monnier <monnier@iro.umontreal.ca>,
> Philipp <p.stephani2@gmail.com>,
> Stephen Leake <stephen_leake@stephe-leake.org>,
> Theodor Thornhill <theo@thornhill.no>,
> ubolonton@gmail.com
>
> > It has been quite a while. I added some fixes to the patch and added a full ChangeLog. Would anyone like to have a look at it?
> >
> > Thanks,
> > Yuan
>
> Forgot to attach the patch, here it is:
Thanks. I skimmed this, and it looks in sufficiently good shape. Do
you consider this feature-complete enough to make one more step
towards merging it? If so, I'd like first to have this on a feature
branch in our repository, so that people could build it and try it.
Then we could land it on master after some time.
One thing that we should consider right now is the name-space. You
used tree-sitter-* names for all the symbols, and I'm asking whether
we don't want something shorter, like ts-*. This is a decision we
must make now, because once we start using the code, there will be no
way back. Lars, WDYT?
One other thing that I don't think I have a clear idea about is the
deployment. Do we expect end-users (or downstream package
maintainers) to download and install the language definition libraries
they need? If so, I think we should have our own load-path for these
libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
enough (although we should support that as well). I envision that at
least in some cases users will not want to have these libraries in the
public places, or maybe even won't have the requisite access rights to
do so. We should provide Emacs-style alternatives, like some
subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
*.eln files).
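
The lookup order suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy_* function names, the libtree-sitter-LANG.so file naming, and the idea of passing the Emacs-specific directories as an array are all hypothetical, not something the patch does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Build "DIR/libtree-sitter-LANG.so" in BUF.  Return false if the
   result does not fit in SIZE bytes.  The file naming scheme is an
   assumption made for this sketch.  */
static bool
toy_library_file (char *buf, size_t size, const char *dir, const char *lang)
{
  int n = snprintf (buf, size, "%s/libtree-sitter-%s.so", dir, lang);
  return n >= 0 && (size_t) n < size;
}

/* Return true if FILE can be opened for reading.  */
static bool
toy_file_exists_p (const char *file)
{
  FILE *f = fopen (file, "rb");
  if (!f)
    return false;
  fclose (f);
  return true;
}

/* Search DIRS in order for a language library for LANG.  Return the
   index of the first directory that has one, leaving its file name
   in BUF, or -1 if none does.  On -1 the caller would fall back to
   the system's standard library search.  */
static int
toy_find_library (const char *const *dirs, size_t ndirs, const char *lang,
                  char *buf, size_t size)
{
  for (size_t i = 0; i < ndirs; i++)
    if (toy_library_file (buf, size, dirs[i], lang)
        && toy_file_exists_p (buf))
      return (int) i;
  return -1;
}
```

Searching the Emacs-specific directories first and the standard locations last lets a user without write access to the public library directories shadow or supply a language definition.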
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-29 16:40 ` Eli Zaretskii
@ 2022-03-30 0:35 ` Po Lu
2022-03-30 0:49 ` Yuan Fu
` (2 subsequent siblings)
3 siblings, 0 replies; 370+ messages in thread
From: Po Lu @ 2022-03-30 0:35 UTC (permalink / raw)
To: Eli Zaretskii
Cc: Yuan Fu, Lars Ingebrigtsen, ubolonton, yoavm448, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> One thing that we should consider right now is the name-space. You
> used tree-sitter-* names for all the symbols, and I'm asking whether
> we don't want something shorter, like ts-*. This is a decision we
> must make now, because once we start using the code, there will be no
> way back. Lars, WDYT?
I'm not Lars but the former seems fine to me, while I think the `ts'
namespace is already used by a Typescript package (or some third party
tree-sitter bindings, I forgot which).
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-29 16:40 ` Eli Zaretskii
2022-03-30 0:35 ` Po Lu
@ 2022-03-30 0:49 ` Yuan Fu
2022-03-30 0:51 ` Yuan Fu
` (2 more replies)
2022-03-31 16:35 ` Yuan Fu
2022-03-31 23:00 ` Yuan Fu
3 siblings, 3 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-30 0:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, Yoav Marco, ubolonton, Emacs developers
>
> Thanks. I skimmed this, and it looks in sufficiently good shape. Do
> you consider this feature-complete enough to make one more step
> towards merging it? If so, I'd like first to have this on a feature
> branch in our repository, so that people could build it and try it.
> Then we could land it on master after some time.
Yes. I can do that once we figure out the namespace and new load path.
>
> One thing that we should consider right now is the name-space. You
> used tree-sitter-* names for all the symbols, and I'm asking whether
> we don't want something shorter, like ts-*. This is a decision we
> must make now, because once we start using the code, there will be no
> way back. Lars, WDYT?
As Po said, there is a package already using the ts prefix, and that package is popular enough. Maybe tsr? Anyway, I’d love a prefix shorter than tree-sitter; it is a real pain to type this long prefix.
>
> One other thing that I don't think I have a clear idea about is the
> deployment. Do we expect end-users (or downstream package
> maintainers) to download and install the language definition libraries
> they need? If so, I think we should have our own load-path for these
> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
> enough (although we should support that as well). I envision that at
> least in some cases users will not want to have these libraries in the
> public places, or maybe even won't have the requisite access rights to
> do so. We should provide Emacs-style alternatives, like some
> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
> *.eln files).
We expect end-users or package maintainers to distribute the language definitions, because a language library must match the version of the tree-sitter library, and we already expect them to distribute tree-sitter itself. I don’t have an opinion on where the load path should be.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 0:49 ` Yuan Fu
@ 2022-03-30 0:51 ` Yuan Fu
2022-03-30 2:13 ` Po Lu
2022-03-30 2:31 ` Eli Zaretskii
2 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-30 0:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, Yoav Marco, ubolonton, Emacs developers
> On Mar 29, 2022, at 5:49 PM, Yuan Fu <casouri@gmail.com> wrote:
>
>>
>> Thanks. I skimmed this, and it looks in sufficiently good shape. Do
>> you consider this feature-complete enough to make one more step
>> towards merging it? If so, I'd like first to have this on a feature
>> branch in our repository, so that people could build it and try it.
>> Then we could land it on master after some time.
>
> Yes. I can do that once we figure out the namespace and new load path.
>
>>
>> One thing that we should consider right now is the name-space. You
>> used tree-sitter-* names for all the symbols, and I'm asking whether
>> we don't want something shorter, like ts-*. This is a decision we
>> must make now, because once we start using the code, there will be no
>> way back. Lars, WDYT?
>
> As Po said, there is a package already using the ts prefix and that package is popular enough. Maybe tsr? Anyway, I’d love a shorter prefix other than tree-sitter, it is a real pain to type this long prefix.
I should clarify: I meant that we could perhaps use tsr as the prefix for tree-sitter functions. The package I mentioned that uses the ts prefix is ts.el, a timestamp library on MELPA.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 0:49 ` Yuan Fu
2022-03-30 0:51 ` Yuan Fu
@ 2022-03-30 2:13 ` Po Lu
2022-03-30 3:01 ` Yuan Fu
2022-03-30 2:31 ` Eli Zaretskii
2 siblings, 1 reply; 370+ messages in thread
From: Po Lu @ 2022-03-30 2:13 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Lars Ingebrigtsen, Yoav Marco, ubolonton,
Emacs developers
Yuan Fu <casouri@gmail.com> writes:
> As Po said, there is a package already using the ts prefix and that
> package is popular enough. Maybe tsr? Anyway, I’d love a shorter
> prefix other than tree-sitter, it is a real pain to type this long
> prefix.
Aren't shorthands supposed to fix that problem?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 0:49 ` Yuan Fu
2022-03-30 0:51 ` Yuan Fu
2022-03-30 2:13 ` Po Lu
@ 2022-03-30 2:31 ` Eli Zaretskii
2022-03-30 2:59 ` Yuan Fu
2 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2022-03-30 2:31 UTC (permalink / raw)
To: Yuan Fu; +Cc: larsi, yoavm448, ubolonton, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Tue, 29 Mar 2022 17:49:55 -0700
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
> Yoav Marco <yoavm448@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> ubolonton@gmail.com
>
> > One thing that we should consider right now is the name-space. You
> > used tree-sitter-* names for all the symbols, and I'm asking whether
> > we don't want something shorter, like ts-*. This is a decision we
> > must make now, because once we start using the code, there will be no
> > way back. Lars, WDYT?
>
> As Po said, there is a package already using the ts prefix and that package is popular enough. Maybe tsr? Anyway, I’d love a shorter prefix other than tree-sitter, it is a real pain to type this long prefix.
Then tsitter, perhaps?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 2:31 ` Eli Zaretskii
@ 2022-03-30 2:59 ` Yuan Fu
2022-03-30 9:04 ` Lars Ingebrigtsen
2022-03-31 4:27 ` Richard Stallman
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-30 2:59 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: larsi, yoavm448, ubolonton, emacs-devel
> On Mar 29, 2022, at 7:31 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 29 Mar 2022 17:49:55 -0700
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> Yoav Marco <yoavm448@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> ubolonton@gmail.com
>>
>>> One thing that we should consider right now is the name-space. You
>>> used tree-sitter-* names for all the symbols, and I'm asking whether
>>> we don't want something shorter, like ts-*. This is a decision we
>>> must make now, because once we start using the code, there will be no
>>> way back. Lars, WDYT?
>>
>> As Po said, there is a package already using the ts prefix and that package is popular enough. Maybe tsr? Anyway, I’d love a shorter prefix other than tree-sitter, it is a real pain to type this long prefix.
>
> Then tsitter, perhaps?
LGTM
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 2:13 ` Po Lu
@ 2022-03-30 3:01 ` Yuan Fu
2022-03-30 3:10 ` Vitaly Ankh
2022-03-30 3:39 ` Po Lu
0 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-30 3:01 UTC (permalink / raw)
To: Po Lu
Cc: Eli Zaretskii, Yoav Marco, Emacs developers, Lars Ingebrigtsen,
ubolonton
> On Mar 29, 2022, at 7:13 PM, Po Lu <luangruo@yahoo.com> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> As Po said, there is a package already using the ts prefix and that
>> package is popular enough. Maybe tsr? Anyway, I’d love a shorter
>> prefix other than tree-sitter, it is a real pain to type this long
>> prefix.
>
> Aren't shorthands supposed to fix that problem?
I was saying we can’t use ts as a prefix, but I agree with the idea of using a shorter prefix.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 3:01 ` Yuan Fu
@ 2022-03-30 3:10 ` Vitaly Ankh
2022-03-30 3:24 ` Yuan Fu
2022-03-30 3:39 ` Po Lu
1 sibling, 1 reply; 370+ messages in thread
From: Vitaly Ankh @ 2022-03-30 3:10 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, Emacs developers, Po Lu, Eli Zaretskii, Yoav Marco,
Lars Ingebrigtsen
Why not just tree-sitter? It looks intuitive and tsitter only saves
you four letters...
On Wed, Mar 30, 2022 at 11:03 AM Yuan Fu <casouri@gmail.com> wrote:
>
>
>
> > On Mar 29, 2022, at 7:13 PM, Po Lu <luangruo@yahoo.com> wrote:
> >
> > Yuan Fu <casouri@gmail.com> writes:
> >
> >> As Po said, there is a package already using the ts prefix and that
> >> package is popular enough. Maybe tsr? Anyway, I’d love a shorter
> >> prefix other than tree-sitter, it is a real pain to type this long
> >> prefix.
> >
> > Aren't shorthands supposed to fix that problem?
>
> I was saying we can’t use ts as a prefix, but I agree with the idea of using a shorter prefix.
>
> Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 3:10 ` Vitaly Ankh
@ 2022-03-30 3:24 ` Yuan Fu
0 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-30 3:24 UTC (permalink / raw)
To: Vitaly Ankh
Cc: ubolonton, Emacs developers, Po Lu, Eli Zaretskii, Yoav Marco,
Lars Ingebrigtsen
> On Mar 29, 2022, at 8:10 PM, Vitaly Ankh <vitalyankh@gmail.com> wrote:
>
> Why not just tree-sitter? It looks intuitive and tsitter only saves
> you four letters…
I’d much rather use ts-, but that is taken. tree-sitter- doesn’t look long, but it is really annoying when you actually write code using the tree-sitter API. Completion doesn’t help, because you have to type the full prefix plus a few characters before completion gets you what you want; otherwise it completes to whatever is at the top of the candidate list. If there are better prefixes that are short and mnemonic, I’ll gladly take one. I don’t have an opinion on which exact prefix we should use.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 3:01 ` Yuan Fu
2022-03-30 3:10 ` Vitaly Ankh
@ 2022-03-30 3:39 ` Po Lu
2022-03-30 4:29 ` Yuan Fu
2022-03-30 13:46 ` João Távora
1 sibling, 2 replies; 370+ messages in thread
From: Po Lu @ 2022-03-30 3:39 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Yoav Marco, Emacs developers, Lars Ingebrigtsen,
ubolonton
Yuan Fu <casouri@gmail.com> writes:
> I was saying we can’t use ts as a prefix, but I agree with the idea of
> using a shorter prefix.
But you can have the cake and eat it too, by making `ts' (or some other
short prefix) a shorthand for `tree-sitter' in the files where you want
to use it.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 3:39 ` Po Lu
@ 2022-03-30 4:29 ` Yuan Fu
2022-03-30 5:19 ` Phil Sainty
2022-03-30 13:46 ` João Távora
1 sibling, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-03-30 4:29 UTC (permalink / raw)
To: Po Lu
Cc: Eli Zaretskii, Yoav Marco, Lars Ingebrigtsen, ubolonton,
Emacs developers
> On Mar 29, 2022, at 8:39 PM, Po Lu <luangruo@yahoo.com> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> I was saying we can’t use ts as a prefix, but I agree with the idea of
>> using a shorter prefix.
>
> But you can have the cake and eat it too, by making `ts' (or some other
> short prefix) a shorthand for `tree-sitter' in the files where you want
> to use it.
That’s true. I have some worries, like whether people know about this feature and whether they will use it, but perhaps these aren’t real problems. I don’t have any case against the tree-sitter- prefix apart from it being tedious to type. (It also fills up a 70-char line rather quickly, but that’s not too big of a problem.)
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 4:29 ` Yuan Fu
@ 2022-03-30 5:19 ` Phil Sainty
2022-03-30 5:39 ` Phil Sainty
0 siblings, 1 reply; 370+ messages in thread
From: Phil Sainty @ 2022-03-30 5:19 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, Emacs developers, Po Lu, Eli Zaretskii, Yoav Marco,
Lars Ingebrigtsen
On 2022-03-30 17:29, Yuan Fu wrote:
> On Mar 29, 2022, at 8:39 PM, Po Lu <luangruo@yahoo.com> wrote:
>> But you can have the cake and eat it too, by making `ts' (or some
>> other short prefix) a shorthand for `tree-sitter' in the files
>> where you want to use it.
>
> That’s true. I have some worries like whether people know about this
> feature and whether they will use it, but perhaps these aren’t real
> problems.
My recollection is that shorthands are not to be used in Emacs core.
(I believe it was confirmed that they will be used only for certain
pre-existing cases where renaming was not an option, and would not be
allowed in core Emacs code.)
tree-sitter- doesn't seem very long to me, and I'm not sure there's
any acronym which replaces it suitably (it's a slightly weird name,
after all).
Developers can always define an abbrev "ts" to expand to "tree-sitter-"
if they are typing it a great deal?
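For illustration, a minimal sketch of such an abbrev. The table and
function names below are hypothetical; in practice one would more
likely add the abbrev to the table of the mode being edited and turn
on abbrev-mode. The demo only shows the expansion itself.

```elisp
;; Hypothetical table name, used here so the sketch does not touch
;; any real mode's abbrev table.
(define-abbrev-table 'my-treesit-abbrev-table
  '(("ts" "tree-sitter-")))

(defun my-treesit-abbrev-demo ()
  "Expand the abbrev \"ts\" in a scratch buffer and return the result."
  (with-temp-buffer
    ;; Make the hypothetical table the buffer's local abbrev table.
    (setq local-abbrev-table my-treesit-abbrev-table)
    (insert "ts")
    ;; Expand the abbrev before point explicitly; with abbrev-mode
    ;; enabled this would happen automatically while typing.
    (expand-abbrev)
    (buffer-string)))
```

With abbrev-mode on, the character typed after "ts" is still inserted
after the expansion, so the expansion text may need adjusting to taste.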
-Phil
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 5:19 ` Phil Sainty
@ 2022-03-30 5:39 ` Phil Sainty
0 siblings, 0 replies; 370+ messages in thread
From: Phil Sainty @ 2022-03-30 5:39 UTC (permalink / raw)
To: Yuan Fu
Cc: ubolonton, Emacs developers, Po Lu, Lars Ingebrigtsen, Yoav Marco,
Eli Zaretskii
On 2022-03-30 18:19, Phil Sainty wrote:
> tree-sitter- doesn't seem very long to me, and I'm not sure
> there's any acronym which replaces it suitably.
That said, if the expectation is that this feature will be(come)
used so pervasively that a shorter name is going to be a genuine
benefit on a large scale, then I'd suggest "tsit-" as being
sufficiently shorter to make a difference, while still giving some
hint at the meaning.
(I was going to say that or just "sit-" but there's "sit-for" to
conflict with that.)
-Phil
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 2:59 ` Yuan Fu
@ 2022-03-30 9:04 ` Lars Ingebrigtsen
2022-03-30 11:48 ` Daniel Martín
2022-03-31 4:27 ` Richard Stallman
1 sibling, 1 reply; 370+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-30 9:04 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, yoavm448, ubolonton, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
>>>> One thing that we should consider right now is the name-space. You
>>>> used tree-sitter-* names for all the symbols, and I'm asking whether
>>>> we don't want something shorter, like ts-*. This is a decision we
>>>> must make now, because once we start using the code, there will be no
>>>> way back. Lars, WDYT?
A shorter name would be nice, yes.
>>> As Po said, there is a package already using the ts prefix and that
>>> package is popular enough. Maybe tsr? Anyway, I’d love a shorter
>>> prefix other than tree-sitter, it is a real pain to type this long
>>> prefix.
>>
>> Then tsitter, perhaps?
>
> LGTM
Fine by me. Or... is tsit taken?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 9:04 ` Lars Ingebrigtsen
@ 2022-03-30 11:48 ` Daniel Martín
2022-03-30 15:00 ` [External] : " Drew Adams
0 siblings, 1 reply; 370+ messages in thread
From: Daniel Martín @ 2022-03-30 11:48 UTC (permalink / raw)
To: Lars Ingebrigtsen
Cc: Yuan Fu, Eli Zaretskii, yoavm448, ubolonton, emacs-devel
Lars Ingebrigtsen <larsi@gnus.org> writes:
>
> Fine by me. Or... is tsit taken?
IMO, tsitter is much more descriptive than tsit, and it's only three
letters longer.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 3:39 ` Po Lu
2022-03-30 4:29 ` Yuan Fu
@ 2022-03-30 13:46 ` João Távora
1 sibling, 0 replies; 370+ messages in thread
From: João Távora @ 2022-03-30 13:46 UTC (permalink / raw)
To: Po Lu
Cc: Yuan Fu, ubolonton, Emacs developers, Lars Ingebrigtsen,
Yoav Marco, Eli Zaretskii
On Wed, Mar 30, 2022 at 4:40 AM Po Lu <luangruo@yahoo.com> wrote:
> Yuan Fu <casouri@gmail.com> writes:
>
> > I was saying we can’t use ts as a prefix, but I agree with the idea of
> > using a shorter prefix.
>
> But you can have the cake and eat it too, by making `ts' (or some other
> short prefix) a shorthand for `tree-sitter' in the files where you want
> to use it.
Exactly. And just to highlight this fact: shorthands are per-file. In one
file `tree-sitter-foo` can be shorthanded to `ts-foo` but not in others.
If the relevant files are in Emacs core and some official policy states
that no shorthands should be used in versioned files there, then maybe
your typing/reading aches can still be somewhat solved by temporarily
using the shorthand in your file, reading and writing `ts-foo` wherever
you like (knowing it will be interned `tree-sitter-foo`) and then renaming
everything to `tree-sitter-` before committing. Personally, I think I'd do
this at least in test/scratch files that exercise the API or in non-core
packages.
João
^ permalink raw reply [flat|nested] 370+ messages in thread
* RE: [External] : Re: Tree-sitter api
2022-03-30 11:48 ` Daniel Martín
@ 2022-03-30 15:00 ` Drew Adams
0 siblings, 0 replies; 370+ messages in thread
From: Drew Adams @ 2022-03-30 15:00 UTC (permalink / raw)
To: Daniel Martín, Lars Ingebrigtsen
Cc: Yuan Fu, yoavm448@gmail.com, emacs-devel@gnu.org, Eli Zaretskii,
ubolonton@gmail.com
> IMO, tsitter is much more descriptive than tsit,
> and it's only three letters longer.
I don't really care about this question, but
I'll mention that `treesit-' is the same
length as `tsitter-', and the former speaks
more to what's involved than the latter, I
think.
Isn't "tree" more important to the meaning
than "sit" or "sitter"? (Is there any real
"sitting" involved?)
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-30 2:59 ` Yuan Fu
2022-03-30 9:04 ` Lars Ingebrigtsen
@ 2022-03-31 4:27 ` Richard Stallman
2022-03-31 5:36 ` Eli Zaretskii
1 sibling, 1 reply; 370+ messages in thread
From: Richard Stallman @ 2022-03-31 4:27 UTC (permalink / raw)
To: Yuan Fu; +Cc: eliz, yoavm448, emacs-devel, larsi, ubolonton
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Then tsitter, perhaps?
Maybe treesit? It is the same length as tsitter, but clearer.
--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 4:27 ` Richard Stallman
@ 2022-03-31 5:36 ` Eli Zaretskii
2022-03-31 11:13 ` Lars Ingebrigtsen
` (2 more replies)
0 siblings, 3 replies; 370+ messages in thread
From: Eli Zaretskii @ 2022-03-31 5:36 UTC (permalink / raw)
To: rms; +Cc: casouri, yoavm448, larsi, ubolonton, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> Date: Thu, 31 Mar 2022 00:27:08 -0400
> Cc: eliz@gnu.org, yoavm448@gmail.com, emacs-devel@gnu.org, larsi@gnus.org,
> ubolonton@gmail.com
>
> > > Then tsitter, perhaps?
>
> Maybe treesit? It is the same length as tsitter, but clearer.
"treesit" is fine with me, thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 5:36 ` Eli Zaretskii
@ 2022-03-31 11:13 ` Lars Ingebrigtsen
2022-03-31 12:46 ` John Yates
2022-03-31 16:23 ` [External] : " Drew Adams
2022-03-31 19:33 ` Filipp Gunbin
2 siblings, 1 reply; 370+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-31 11:13 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, yoavm448, emacs-devel, rms, ubolonton
Eli Zaretskii <eliz@gnu.org> writes:
> "treesit" is fine with me, thanks.
Yeah, that's a good name.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 11:13 ` Lars Ingebrigtsen
@ 2022-03-31 12:46 ` John Yates
2022-03-31 17:37 ` Phil Sainty
2022-03-31 17:58 ` Stefan Monnier
0 siblings, 2 replies; 370+ messages in thread
From: John Yates @ 2022-03-31 12:46 UTC (permalink / raw)
To: Lars Ingebrigtsen
Cc: ubolonton, Richard Stallman, Yuan Fu, Emacs developers,
Eli Zaretskii, yoavm448
On Thu, Mar 31, 2022 at 7:15 AM Lars Ingebrigtsen <larsi@gnus.org> wrote:
> Yeah, that's a good name.
As pointed out previously, the important concept
here is "tree" not "sitting". "trees" is shorter and
perhaps more suggestive.
^ permalink raw reply [flat|nested] 370+ messages in thread
* RE: [External] : Re: Tree-sitter api
2022-03-31 5:36 ` Eli Zaretskii
2022-03-31 11:13 ` Lars Ingebrigtsen
@ 2022-03-31 16:23 ` Drew Adams
2022-03-31 19:33 ` Filipp Gunbin
2 siblings, 0 replies; 370+ messages in thread
From: Drew Adams @ 2022-03-31 16:23 UTC (permalink / raw)
To: Eli Zaretskii, rms@gnu.org
Cc: casouri@gmail.com, yoavm448@gmail.com, emacs-devel@gnu.org,
larsi@gnus.org, ubolonton@gmail.com
> > > > Then tsitter, perhaps?
> >
> > Maybe treesit? It is the same length as tsitter, but clearer.
>
> "treesit" is fine with me, thanks.
Great minds think alike... ;-) (I suggested the same.)
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-29 16:40 ` Eli Zaretskii
2022-03-30 0:35 ` Po Lu
2022-03-30 0:49 ` Yuan Fu
@ 2022-03-31 16:35 ` Yuan Fu
2022-03-31 23:00 ` Yuan Fu
3 siblings, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-31 16:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, yoavm448, ubolonton, emacs-devel
>
> If so, I think we should have our own load-path for these
> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
> enough (although we should support that as well). I envision that at
> least in some cases users will not want to have these libraries in the
> public places, or maybe even won't have the requisite access rights to
> do so. We should provide Emacs-style alternatives, like some
> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
> *.eln files).
Does anyone have thoughts on this?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 12:46 ` John Yates
@ 2022-03-31 17:37 ` Phil Sainty
2022-04-01 1:56 ` Po Lu
2022-03-31 17:58 ` Stefan Monnier
1 sibling, 1 reply; 370+ messages in thread
From: Phil Sainty @ 2022-03-31 17:37 UTC (permalink / raw)
To: John Yates
Cc: Yuan Fu, Richard Stallman, ubolonton, Emacs developers,
Eli Zaretskii, yoavm448, Lars Ingebrigtsen
On 2022-04-01 01:46, John Yates wrote:
> As pointed out previously, the important concept
> here is "tree" not "sitting". "trees" is shorter and
> perhaps more suggestive.
Unfortunately "tree" is an extremely generic term and on
its own doesn't suggest the very specific functionality
provided by tree-sitter.
"trees" just looks like the plural of "tree" rather than
"tree"+"s"[itter], and so I would expect that to be for
manipulating some general-purpose tree data structures
before I thought of "an incremental parsing library" for
programming languages.
Conversely the "sit" part of the name, whilst not
suggestive of being connected with trees of some kind,
*is* very suggestive of the actual library which is being
integrated.
I.e. tree-sitter itself may not* be about "sitting"; but
the Emacs integration is very specifically about a thing
called "tree-sitter" rather than just about trees.
(*) Or maybe it is. I don't understand that part of the
name (and I can't find any documentation explaining it)
-- but it *is* the name.
-Phil
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 12:46 ` John Yates
2022-03-31 17:37 ` Phil Sainty
@ 2022-03-31 17:58 ` Stefan Monnier
2022-04-04 10:29 ` Jostein Kjønigsen
1 sibling, 1 reply; 370+ messages in thread
From: Stefan Monnier @ 2022-03-31 17:58 UTC (permalink / raw)
To: John Yates
Cc: Lars Ingebrigtsen, ubolonton, Richard Stallman, Yuan Fu,
Emacs developers, Eli Zaretskii, yoavm448
I heard there's a bikeshedding opportunity here, so I'll throw in
another option if short is really important: `TS-`
Stefan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 5:36 ` Eli Zaretskii
2022-03-31 11:13 ` Lars Ingebrigtsen
2022-03-31 16:23 ` [External] : " Drew Adams
@ 2022-03-31 19:33 ` Filipp Gunbin
2 siblings, 0 replies; 370+ messages in thread
From: Filipp Gunbin @ 2022-03-31 19:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 31/03/2022 08:36 +0300, Eli Zaretskii wrote:
>> From: Richard Stallman <rms@gnu.org>
>> Date: Thu, 31 Mar 2022 00:27:08 -0400
>> Cc: eliz@gnu.org, yoavm448@gmail.com, emacs-devel@gnu.org, larsi@gnus.org,
>> ubolonton@gmail.com
>>
>> > > Then tsitter, perhaps?
>>
>> Maybe treesit? It is the same length as tsitter, but clearer.
>
> "treesit" is fine with me, thanks.
+1 for "treesit"
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-29 16:40 ` Eli Zaretskii
` (2 preceding siblings ...)
2022-03-31 16:35 ` Yuan Fu
@ 2022-03-31 23:00 ` Yuan Fu
2022-03-31 23:53 ` Yuan Fu
2022-04-01 6:20 ` Eli Zaretskii
3 siblings, 2 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-31 23:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, yoavm448, ubolonton, emacs-devel
>
> If so, I think we should have our own load-path for these
> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
> enough (although we should support that as well). I envision that at
> least in some cases users will not want to have these libraries in the
> public places, or maybe even won't have the requisite access rights to
> do so. We should provide Emacs-style alternatives, like some
> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
> *.eln files).
Anyone have thoughts on this?
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 23:00 ` Yuan Fu
@ 2022-03-31 23:53 ` Yuan Fu
2022-04-01 6:20 ` Eli Zaretskii
1 sibling, 0 replies; 370+ messages in thread
From: Yuan Fu @ 2022-03-31 23:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, Yoav Marco, ubolonton, Emacs developers
> On Mar 31, 2022, at 4:00 PM, Yuan Fu <casouri@gmail.com> wrote:
>
>>
>> If so, I think we should have our own load-path for these
>> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
>> enough (although we should support that as well). I envision that at
>> least in some cases users will not want to have these libraries in the
>> public places, or maybe even won't have the requisite access rights to
>> do so. We should provide Emacs-style alternatives, like some
>> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
>> *.eln files).
>
> Anyone have thoughts on this?
>
> Yuan
Sorry for the duplicate, didn’t mean to repeat myself.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 17:37 ` Phil Sainty
@ 2022-04-01 1:56 ` Po Lu
2022-04-01 6:36 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Po Lu @ 2022-04-01 1:56 UTC (permalink / raw)
To: Phil Sainty
Cc: John Yates, Yuan Fu, Richard Stallman, ubolonton,
Emacs developers, Eli Zaretskii, yoavm448, Lars Ingebrigtsen
Phil Sainty <psainty@orcon.net.nz> writes:
> Unfortunately "tree" is an extremely generic term and on
> its own doesn't suggest the very specific functionality
> provided by tree-sitter.
>
> "trees" just looks like the plural of "tree" rather than
> "tree"+"s"[itter], and so I would expect that to be for
> manipulating some general-purpose tree data structures
> before I thought of "an incremental parsing library" for
> programming languages.
>
> Conversely the "sit" part of the name, whilst not
> suggestive of being connected with trees of some kind,
> *is* very suggestive of the actual library which is being
> integrated.
>
> I.e. tree-sitter itself may not* be about "sitting"; but
> the Emacs integration is very specifically about a thing
> called "tree-sitter" rather than just about trees.
>
> (*) Or maybe it is. I don't understand that part of the
> name (and I can't find any documentation explaining it)
> -- but it *is* the name.
>
>
> -Phil
"treesit" sounds rather ambiguous and ugly, while "treesitt" is only 3
letters away from "tree-sitter". Instead of bikeshedding over a name,
why not use shorthands to refer to "tree-sitter" in files where the
column number limit is actually important? Isn't that what shorthands
were intended to solve?
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 23:00 ` Yuan Fu
2022-03-31 23:53 ` Yuan Fu
@ 2022-04-01 6:20 ` Eli Zaretskii
2022-04-01 16:48 ` Yuan Fu
1 sibling, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2022-04-01 6:20 UTC (permalink / raw)
To: Yuan Fu; +Cc: larsi, yoavm448, ubolonton, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 31 Mar 2022 16:00:28 -0700
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
> yoavm448@gmail.com,
> emacs-devel@gnu.org,
> ubolonton@gmail.com
>
> >
> > If so, I think we should have our own load-path for these
> > libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
> > enough (although we should support that as well). I envision that at
> > least in some cases users will not want to have these libraries in the
> > public places, or maybe even won't have the requisite access rights to
> > do so. We should provide Emacs-style alternatives, like some
> > subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
> > *.eln files).
>
> Anyone have thoughts on this?
What kind of thoughts? Whether or not to provide this feature (I
think we should), or how best to implement that? Or something else?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-01 1:56 ` Po Lu
@ 2022-04-01 6:36 ` Eli Zaretskii
2022-04-01 7:56 ` Po Lu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2022-04-01 6:36 UTC (permalink / raw)
To: Po Lu; +Cc: casouri, rms, psainty, ubolonton, emacs-devel, larsi, yoavm448,
john
> From: Po Lu <luangruo@yahoo.com>
> Cc: John Yates <john@yates-sheets.org>, Yuan Fu <casouri@gmail.com>,
> Richard Stallman <rms@gnu.org>, ubolonton@gmail.com, Emacs developers
> <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org>, yoavm448@gmail.com,
> Lars Ingebrigtsen <larsi@gnus.org>
> Date: Fri, 01 Apr 2022 09:56:21 +0800
>
> "treesit" sounds rather ambiguous and ugly, while "treesitt" is only 3
> letters away from "tree-sitter". Instead of bikeshedding over a name,
> why not use shorthands to refer to "tree-sitter" in files where the
> column number limit is actually important? Isn't that what shorthands
> were intended to solve?
As mentioned before, we don't want to use shorthands for core
features, and I'm not even sure there's a good way of doing that when
some of the feature is in primitives written in C.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-01 6:36 ` Eli Zaretskii
@ 2022-04-01 7:56 ` Po Lu
2022-04-01 10:45 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Po Lu @ 2022-04-01 7:56 UTC (permalink / raw)
To: Eli Zaretskii
Cc: casouri, rms, psainty, ubolonton, emacs-devel, larsi, yoavm448,
john
Eli Zaretskii <eliz@gnu.org> writes:
> As mentioned before, we don't want to use shorthands for core
> features
Ah, okay, I must've missed that.
> and I'm not even sure there's a good way of doing that when some of
> the feature is in primitives written in C.
Why would shorthands behave specially with C primitives, since
shorthands work on the reader-level?
Thanks.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-01 7:56 ` Po Lu
@ 2022-04-01 10:45 ` Eli Zaretskii
0 siblings, 0 replies; 370+ messages in thread
From: Eli Zaretskii @ 2022-04-01 10:45 UTC (permalink / raw)
To: Po Lu; +Cc: casouri, rms, psainty, ubolonton, emacs-devel, larsi, yoavm448,
john
> From: Po Lu <luangruo@yahoo.com>
> Cc: casouri@gmail.com, rms@gnu.org, psainty@orcon.net.nz,
> ubolonton@gmail.com, emacs-devel@gnu.org, larsi@gnus.org,
> yoavm448@gmail.com, john@yates-sheets.org
> Date: Fri, 01 Apr 2022 15:56:40 +0800
>
> > and I'm not even sure there's a good way of doing that when some of
> > the feature is in primitives written in C.
>
> Why would shorthands behave specially with C primitives, since
> shorthands work on the reader-level?
Because C primitives aren't read by the Lisp reader?
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-01 6:20 ` Eli Zaretskii
@ 2022-04-01 16:48 ` Yuan Fu
2022-04-01 17:59 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-04-01 16:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, Yoav Marco, ubolonton, Emacs developers
> On Mar 31, 2022, at 11:20 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 31 Mar 2022 16:00:28 -0700
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> yoavm448@gmail.com,
>> emacs-devel@gnu.org,
>> ubolonton@gmail.com
>>
>>>
>>> If so, I think we should have our own load-path for these
>>> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
>>> enough (although we should support that as well). I envision that at
>>> least in some cases users will not want to have these libraries in the
>>> public places, or maybe even won't have the requisite access rights to
>>> do so. We should provide Emacs-style alternatives, like some
>>> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
>>> *.eln files).
>>
>> Anyone have thoughts on this?
>
> What kind of thoughts? Whether or not to provide this feature (I
> think we should), or how best to implement that? Or something else?
Thoughts on what paths we should use, whether we want to allow for custom load-paths (I think we should), and if so, the name for the load path variable ({tree-sitter/treesit/...}-language-definition-load-path?)
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-01 16:48 ` Yuan Fu
@ 2022-04-01 17:59 ` Eli Zaretskii
2022-04-02 6:26 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2022-04-01 17:59 UTC (permalink / raw)
To: Yuan Fu; +Cc: larsi, yoavm448, ubolonton, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 1 Apr 2022 09:48:09 -0700
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
> Yoav Marco <yoavm448@gmail.com>,
> Emacs developers <emacs-devel@gnu.org>,
> ubolonton@gmail.com
>
> >>> If so, I think we should have our own load-path for these
> >>> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
> >>> enough (although we should support that as well). I envision that at
> >>> least in some cases users will not want to have these libraries in the
> >>> public places, or maybe even won't have the requisite access rights to
> >>> do so. We should provide Emacs-style alternatives, like some
> >>> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
> >>> *.eln files).
> >>
> >> Anyone have thoughts on this?
> >
> > What kind of thoughts? Whether or not to provide this feature (I
> > think we should), or how best to implement that? Or something else?
>
> Thoughts on what paths should we use, whether we want to allow for custom load-paths (I think we should), and if so, the name for the load path variable ({tree-sitter/treesit/...}-language-definition-load-path?)
I thought I answered all those questions, with the single exception of
the name of the path variable.
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-01 17:59 ` Eli Zaretskii
@ 2022-04-02 6:26 ` Yuan Fu
2022-04-04 7:38 ` Robert Pluim
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-04-02 6:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, Yoav Marco, ubolonton, Emacs developers
> On Apr 1, 2022, at 10:59 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 1 Apr 2022 09:48:09 -0700
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> Yoav Marco <yoavm448@gmail.com>,
>> Emacs developers <emacs-devel@gnu.org>,
>> ubolonton@gmail.com
>>
>>>>> If so, I think we should have our own load-path for these
>>>>> libraries; relying on the standard LD_LIBRARY_PATH etc. is not good
>>>>> enough (although we should support that as well). I envision that at
>>>>> least in some cases users will not want to have these libraries in the
>>>>> public places, or maybe even won't have the requisite access rights to
>>>>> do so. We should provide Emacs-style alternatives, like some
>>>>> subdirectory of ~/.emacs.d/ and/or under ${prefix}/lib/ (similar to
>>>>> *.eln files).
>>>>
>>>> Anyone have thoughts on this?
>>>
>>> What kind of thoughts? Whether or not to provide this feature (I
>>> think we should), or how best to implement that? Or something else?
>>
>> Thoughts on what paths should we use, whether we want to allow for custom load-paths (I think we should), and if so, the name for the load path variable ({tree-sitter/treesit/...}-language-definition-load-path?)
>
> I thought I answered all those questions, with the single exception of
> the name of the path variable.
Ah, I thought you were inviting suggestions. I’ll use ~/.emacs.d/tree-sitter and ${prefix}/lib as the directories and ${tree-sitter-prefix}-load-path as the variable name, if there are no further suggestions.
The treesit prefix seems to have received the most votes (including mine), so if there are no further objections I’ll change tree-sitter to treesit.
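A rough sketch of what searching such a list of directories could look
like from Lisp. Everything named `my-...` is hypothetical, and so are
the `libtree-sitter-LANG` file naming and the suffix list; the actual
lookup may well be done in C and differ in the details. The sketch only
illustrates "first match in an ordered list of directories" using the
existing function `locate-file`.

```elisp
(defun my-treesit-find-library (lang path &optional suffixes)
  "Return the file name of the grammar library for LANG, or nil.
PATH is an ordered list of directories to search; earlier entries
win.  SUFFIXES defaults to a few common shared-library extensions.
The \"libtree-sitter-LANG\" naming is an assumption of this sketch."
  (locate-file (format "libtree-sitter-%s" lang)
               path
               (or suffixes '(".so" ".dylib" ".dll"))))

;; Example call: look in the user's directory first, then in a
;; system-wide one (both directories are only examples).
;;
;;   (my-treesit-find-library
;;    'c
;;    (list (expand-file-name "tree-sitter" user-emacs-directory)
;;          "/usr/local/lib"))
```

Keeping the directory list as an ordinary Lisp list means users could
prepend their own directories without touching LD_LIBRARY_PATH.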
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-02 6:26 ` Yuan Fu
@ 2022-04-04 7:38 ` Robert Pluim
2022-04-04 20:41 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Robert Pluim @ 2022-04-04 7:38 UTC (permalink / raw)
To: Yuan Fu
Cc: Eli Zaretskii, Yoav Marco, Emacs developers, Lars Ingebrigtsen,
ubolonton
>>>>> On Fri, 1 Apr 2022 23:26:25 -0700, Yuan Fu <casouri@gmail.com> said:
Yuan> Ah, I thought you are inviting for suggestions. I’ll use
Yuan> ~/.emacs.d/tree-sitter and ${prefix}/lib and use
You should use `locate-user-emacs-file' rather than hard-coding
"~/.emacs.d"
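A small sketch of the difference, with a hypothetical function name.
`locate-user-emacs-file` resolves the name relative to
`user-emacs-directory`, so it keeps working when the user's Emacs
directory is not ~/.emacs.d (for example under ~/.config/emacs).

```elisp
(defun my-treesit-user-directory ()
  "Return the per-user directory for tree-sitter grammar libraries.
The subdirectory name \"tree-sitter\" follows the proposal in this
thread; the function name is made up for this sketch."
  ;; Resolved against `user-emacs-directory' instead of a
  ;; hard-coded "~/.emacs.d/tree-sitter".
  (locate-user-emacs-file "tree-sitter"))
```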
Robert
--
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-03-31 17:58 ` Stefan Monnier
@ 2022-04-04 10:29 ` Jostein Kjønigsen
0 siblings, 0 replies; 370+ messages in thread
From: Jostein Kjønigsen @ 2022-04-04 10:29 UTC (permalink / raw)
To: Stefan Monnier, John Yates
Cc: Yuan Fu, Richard Stallman, ubolonton, Emacs developers,
Eli Zaretskii, yoavm448, Lars Ingebrigtsen
On 31.03.2022 19:58, Stefan Monnier wrote:
> Hi heard there's a bikeshedding opportunity here, so I'll throw in
> another option if short is really important: `TS-`
>
> Stefan
While I see it's already established that we don't want to add too short
names, TS is also a common abbreviation for TypeScript, a very common and
popular JavaScript (JS!) dialect.
TS could definitely become a source of confusion.
--
Kind regards
*Jostein Kjønigsen*
jostein@kjonigsen.net 🍵 jostein@gmail.com
https://jostein.kjønigsen.no <https://jostein.kjønigsen.no>
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-04 7:38 ` Robert Pluim
@ 2022-04-04 20:41 ` Yuan Fu
2022-04-20 20:14 ` Theodor Thornhill
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-04-04 20:41 UTC (permalink / raw)
To: Robert Pluim
Cc: Eli Zaretskii, Yoav Marco, Emacs developers, Lars Ingebrigtsen,
ubolonton
> On Apr 4, 2022, at 12:38 AM, Robert Pluim <rpluim@gmail.com> wrote:
>
>>>>>> On Fri, 1 Apr 2022 23:26:25 -0700, Yuan Fu <casouri@gmail.com> said:
> Yuan> Ah, I thought you are inviting for suggestions. I’ll use
> Yuan> ~/.emacs.d/tree-sitter and ${prefix}/lib and use
>
> You should use `locate-user-emacs-file' rather than hard-coding
> "~/.emacs.d”
Thanks. I’ll make sure to use that.
Yuan
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-04 20:41 ` Yuan Fu
@ 2022-04-20 20:14 ` Theodor Thornhill
2022-04-21 1:36 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2022-04-20 20:14 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Emacs developers, Lars Ingebrigtsen
>> On Apr 4, 2022, at 12:38 AM, Robert Pluim <rpluim@gmail.com> wrote:
>>
>>>>>>> On Fri, 1 Apr 2022 23:26:25 -0700, Yuan Fu <casouri@gmail.com> said:
>> Yuan> Ah, I thought you are inviting for suggestions. I’ll use
>> Yuan> ~/.emacs.d/tree-sitter and ${prefix}/lib and use
>>
>> You should use `locate-user-emacs-file' rather than hard-coding
>> "~/.emacs.d”
>
> Thanks. I’ll make sure to use that.
Sorry for losing track of the rather long email correspondence about
this package and possibly missing out on some details.
Is there anything I can do to help further this feature? I'm
starting work on tree sitter integration for typescript these days, but
I'd rather like to support the emacs proper version. Let me know if
there is something I can do to help!
All the best,
Theodor
^ permalink raw reply [flat|nested] 370+ messages in thread
* Re: Tree-sitter api
2022-04-20 20:14 ` Theodor Thornhill
@ 2022-04-21 1:36 ` Yuan Fu
2022-04-21 5:48 ` Eli Zaretskii
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-04-21 1:36 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: Eli Zaretskii, Lars Ingebrigtsen, Emacs developers
>
> Sorry for losing track of the rather long email correspondence about
> this package and possibly missing out on some details.
>
> Is there anything I can do to help furthering this feature? I'm
> starting work on tree sitter integration for typescript these days, but
> I'd rather like to support the emacs proper version. Let me know if
> there is something I can do to help!
>
> All the best,
> Theodor
Hey Theodor,
I would follow this article to start playing with tree-sitter in Emacs: https://archive.casouri.cat/note/2021/emacs-tree-sitter/index.html
You can help by using it and letting me know where the integration could do better, e.g., the font-lock and indentation engines might be hard to use.
Yuan
* Re: Tree-sitter api
2022-04-21 1:36 ` Yuan Fu
@ 2022-04-21 5:48 ` Eli Zaretskii
2022-04-21 11:37 ` Lars Ingebrigtsen
0 siblings, 1 reply; 370+ messages in thread
From: Eli Zaretskii @ 2022-04-21 5:48 UTC (permalink / raw)
To: Yuan Fu; +Cc: larsi, theo, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Wed, 20 Apr 2022 18:36:09 -0700
> Cc: Eli Zaretskii <eliz@gnu.org>,
> Emacs developers <emacs-devel@gnu.org>,
> Lars Ingebrigtsen <larsi@gnus.org>
>
> >
> > Sorry for losing track of the rather long email correspondence about
> > this package and possibly missing out on some details.
> >
> > Is there anything I can do to help furthering this feature? I'm
> > starting work on tree sitter integration for typescript these days, but
> > I'd rather like to support the emacs proper version. Let me know if
> > there is something I can do to help!
> >
> > All the best,
> > Theodor
>
> Hey Theodor,
>
> I would follow this article to start playing with tree-sitter in Emacs: https://archive.casouri.cat/note/2021/emacs-tree-sitter/index.html
>
> You can help by using it and let me know where the integration can do better, eg, the font-lock and indentation engine might be hard to use, etc.
I hope we will soon see a feature branch in the Emacs Git repository
that people could try and provide feedback. That will move us closer
to merging this important feature, hopefully before Emacs 29 is
released.
* Re: Tree-sitter api
2022-04-21 5:48 ` Eli Zaretskii
@ 2022-04-21 11:37 ` Lars Ingebrigtsen
2022-04-21 12:10 ` Theodor Thornhill
0 siblings, 1 reply; 370+ messages in thread
From: Lars Ingebrigtsen @ 2022-04-21 11:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Yuan Fu, theo, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> I hope we will soon see a feature branch in the Emacs Git repository
> that people could try and provide feedback. That will move us closer
> to merging this important feature, hopefully before Emacs 29 is
> released.
Yes, indeed. Having tree-sitter in Emacs 29 should be a priority.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* Re: Tree-sitter api
2022-04-21 11:37 ` Lars Ingebrigtsen
@ 2022-04-21 12:10 ` Theodor Thornhill
2022-04-22 2:54 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2022-04-21 12:10 UTC (permalink / raw)
To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: Yuan Fu, emacs-devel
Lars Ingebrigtsen <larsi@gnus.org> writes:
> Eli Zaretskii <eliz@gnu.org> writes:
>
>> I hope we will soon see a feature branch in the Emacs Git repository
>> that people could try and provide feedback. That will move us closer
>> to merging this important feature, hopefully before Emacs 29 is
>> released.
>
> Yes, indeed. Having tree-sitter in Emacs 29 should be a priority.
>
Great! I've started work on using tree-sitter provided by Yuan Fu in
typescript-mode, and it is working rather well. I'm struggling a little
with indentation, but I'm sure it's user error.
Making a feature branch for this would indeed help a _lot_ as setup
right now is very finicky. I'll report back any bugs I can find while
digging.
All the best,
Theodor
* Re: Tree-sitter api
2022-04-21 12:10 ` Theodor Thornhill
@ 2022-04-22 2:54 ` Yuan Fu
2022-04-22 4:58 ` Theodor Thornhill
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-04-22 2:54 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel
>
> Great! I've started work on using tree-sitter provided by Yuan Fu in
> typescript-mode, and it is working rather well. I'm struggling a little
> with indentation, but I'm sure it's user error.
I’d love to know how you are struggling, because ideally we’d want the indentation engine to be very easy to understand and use. Maybe we can improve the documentation or something?
>
> Making a feature branch for this would indeed help a _lot_ as setup
> right now is very finicky. I'll report back any bugs I can find while
> digging.
Do you think there can be any improvements in this regard? (Except for moving to a feature branch, of course; that’s already on the list.)
Yuan
* Re: Tree-sitter api
2022-04-22 2:54 ` Yuan Fu
@ 2022-04-22 4:58 ` Theodor Thornhill
2022-04-22 7:08 ` Yuan Fu
0 siblings, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2022-04-22 4:58 UTC (permalink / raw)
To: Yuan Fu; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel
Yuan Fu <casouri@gmail.com> writes:
>> Great! I've started work on using tree-sitter provided by Yuan Fu in
>> typescript-mode, and it is working rather well. I'm struggling a little
>> with indentation, but I'm sure it's user error.
>
> I’d love to know how are you struggling, because ideally we’d want the
> indentation engine to be very easy to understand and use. Maybe we can
> improve the documentation or something?
>
I find the MATCHER -> ANCHOR -> OFFSET technique a little hard to
parse. Ideally I'd like to say something like: "Every direct child of
node FOO should be indented 2 spaces, including the null node". This is
the general case of indentation as far as I can tell. I'm thinking such
a rule could look like:
((child-of-and-null "function_declaration") (node-is "function_declaration") 2)
This could perhaps be abstracted yet again into a shorthand such as
this:
(scope-openers '("function_declaration" "class_declaration" "try_statement"))
My goal is that I get a typing experience where openers always indent:
```typescript
function foo() {
  | <-- point is here
}
```
```typescript
function foo() {
  try {
    | <-- point is here
  }
}
```
```typescript
foo(() => {
  | <-- point is here
});
```
Does this make any sense?
I find that most of the time emacs cannot find the anchor (that's at
least what it is logging), and I assume that means it at least matched
something.
In addition, one trouble I've had with indentation using the libraries
from MELPA is that accumulating offsets along a parentwise path add up
to too big an indent. Here's an example:
```typescript
const foo = someFunction(() => ({
  prop: "arst", // <-- indented by two spaces
}))
```
```typescript
const foo = someFunction(
  () => ({
    prop: "arst", // <-- indented by four spaces
  })
)
```
This is the expected indentation. What I'd get is:
```typescript
const foo = someFunction(() => ({
    prop: "arst",
}))
```
What happens is that the arguments list triggers an indentation step,
but it should only do so when it is on its own line. I believe this is
what SMIE calls "hanging-p" in its engine.
>> Making a feature branch for this would indeed help a _lot_ as setup
>> right now is very finicky. I'll report back any bugs I can find while
>> digging.
>
> So you think there can be any improvements in this regard? (Except for
> moving to a feature branch, of course, that’s already on the list.)
>
The hardest part apart from a feature branch is getting hold of the
definitions. I think your script-package should be added to elpa so
that putting them in a directory emacs can see can be automated.
I _really_ think we should distribute a function to get these libraries
when emacs ships, as every editor does this.
Sorry for the long post, hope some of it makes sense :)
Theodor
* Re: Tree-sitter api
2022-04-22 4:58 ` Theodor Thornhill
@ 2022-04-22 7:08 ` Yuan Fu
2022-04-22 8:02 ` Theodor Thornhill
0 siblings, 1 reply; 370+ messages in thread
From: Yuan Fu @ 2022-04-22 7:08 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel
Thanks, your feedback is very valuable.
> I find the way MATCHER -> ANCHOR -> OFFSET technique a little hard to
> parse. Ideally I'd like to say something like: "Every direct child of
> node FOO should be indented 2 spaces, including the null node". This is
> the general case of indentation as far as I can tell. I'm thinking such
> a rule could look like:
>
> ((child-of-and-null "function_declaration") (node-is "function_declaration") 2)
IIUC, you are thinking of something like the following, which matches any node that is a child of function_declaration and indents it 2 columns from that parent. Is that right?
((parent-is "function_declaration")
 parent
 2)
I am indeed guilty of throwing that wall of text when introducing the indent engine. I’ll see if I can make it more approachable. I could probably start by showing an example and how to use it, rather than explaining all the details.
>
> This could perhaps be abstracted yet again into a shorthand such as
> this:
>
> (scope-openers '("function_declaration" "class_declaration" "try_statement"))
I see what you mean, but that is yet another concept to learn. I’ll try to add something like that without adding too much complexity.
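As a rough illustration of how such a shorthand could stay thin, the sketch below expands a list of node types into ordinary rules of the ((parent-is TYPE) parent OFFSET) form shown above. The name `my-scope-opener-rules` is hypothetical and no such helper exists in the branch; this is one possible shape, not a proposal for the actual API.

```elisp
;; Sketch only: `my-scope-opener-rules' is a hypothetical helper.
(defun my-scope-opener-rules (types offset)
  "Return one (MATCHER ANCHOR OFFSET) rule for each node type in TYPES.
Each rule indents children of that node type by OFFSET columns from
the parent."
  (mapcar (lambda (type)
            (list (list 'parent-is type) 'parent offset))
          types))

;; Expands to three plain rules that could be spliced into a rule list.
(my-scope-opener-rules
 '("function_declaration" "class_declaration" "try_statement")
 2)
```

Since the result is just a list of regular rules, a mode author would not need to learn a new concept to read what the shorthand produces.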
> My goal is that I get a typing experience where openers always indent:
>
> ```typescript
> function foo() {
> | <-- point is here
> }
> ```
>
> ```typescript
> function foo() {
> try {
> | <-- point is here
> }
> }
> ```
>
> ```typescript
> foo(() => {
> | <-- point is here
> });
> ```
>
> Does this make any sense?
>
> I find that most of the time emacs cannot find the anchor (that's at
> least what it is logging), and I assume that means it at least matched
> something.
I don’t quite understand. What do you mean by “Emacs cannot find the anchor”? To make sure we are on the same page: in a rule (MATCHER ANCHOR OFFSET), MATCHER determines whether the rule applies to the current line, ANCHOR gives the position to indent from, and OFFSET gives how far to indent from ANCHOR.
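The rule semantics just described can be sketched in a few lines of self-contained Emacs Lisp. This is a toy model under stated assumptions, not the engine's implementation: nodes are plain plists rather than tree-sitter nodes, the anchor is reduced to a column number instead of a buffer position, only the (parent-is TYPE) matcher and the `parent` anchor are understood, and every `my-` name is hypothetical.

```elisp
;; Sketch only: a toy model of (MATCHER ANCHOR OFFSET) evaluation.
(defun my-rule-matches-p (matcher parent)
  "Return non-nil if MATCHER accepts PARENT.
Only the form (parent-is TYPE) is understood in this sketch."
  (and (eq (car-safe matcher) 'parent-is)
       (equal (cadr matcher) (plist-get parent :type))))

(defun my-anchor-column (anchor parent)
  "Return the column that ANCHOR refers to, or nil.
Only the symbol `parent' is understood in this sketch."
  (and (eq anchor 'parent)
       (plist-get parent :column)))

(defun my-indent-column (rules parent)
  "Return the column for a line whose enclosing node is PARENT.
RULES is a list of (MATCHER ANCHOR OFFSET).  The first rule whose
MATCHER succeeds wins; return nil when no rule matches."
  (catch 'done
    (dolist (rule rules)
      (when (my-rule-matches-p (nth 0 rule) parent)
        (throw 'done (+ (my-anchor-column (nth 1 rule) parent)
                        (nth 2 rule)))))
    nil))

;; A line directly inside a function_declaration that starts at column 4:
(my-indent-column '(((parent-is "function_declaration") parent 2))
                  '(:type "function_declaration" :column 4))
;; => 6
```

The point of the model is the order of operations: the matcher only decides whether a rule applies, and the anchor and offset are consulted afterwards to compute the column.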
>
> In addition - one trouble I've had with indentation using the libraries
> from melpa is that accumulating offsets in a parentwise path add to to
> too big of an indent. Here's an example:
>
> ```typescript
> const foo = someFunction(() => ({
> prop: "arst", // <-- indented by two spaces
> }))
> ```
>
> ```typescript
> const foo = someFunction(
> () => ({
> prop: "arst", // <-- indented by four spaces
> })
> )
> ```
>
> This is the expected indentation. What I'd get is:
>
> ```typescript
> const foo = someFunction(() => ({
> prop: "arst",
> }))
> ```
>
> What happens is that the arguments list triggers as an indentation step,
> but it should only do so when when on its own line. I believe this is
> what SMIE calls "hanging-p" in its engine.
I think an anchor preset that finds the parent that’s at the beginning of a line should solve this. I’ll definitely add that one.
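One way such an anchor could work is sketched below, assuming the ancestors' start positions are already known. The function name `my-first-ancestor-at-bol` and its list-of-positions argument are hypothetical; a real preset would walk tree-sitter nodes instead.

```elisp
;; Sketch only: `my-first-ancestor-at-bol' is a hypothetical helper that
;; works on plain buffer positions rather than tree-sitter nodes.
(defun my-first-ancestor-at-bol (ancestor-starts)
  "Return the first position in ANCESTOR-STARTS that begins its line.
ANCESTOR-STARTS lists the start positions of the ancestor nodes,
innermost first.  A position begins its line when only whitespace
precedes it on that line.  Return nil if no position qualifies."
  (catch 'found
    (dolist (pos ancestor-starts)
      (save-excursion
        (goto-char pos)
        (back-to-indentation)
        (when (= (point) pos)
          (throw 'found pos))))
    nil))
```

In the single-line `someFunction(() => ({` case, neither the arguments list nor the arrow function starts a line, so the anchor falls through to the statement itself and only one indentation step is applied. When the arrow function sits on its own line, it becomes the anchor and contributes the extra step.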
>
> The hardest part apart from a feature branch is getting hold of the
> definitions. I think your script-package should be added to elpa so
> that putting them in a directory emacs can see can be automated.
>
> I _really_ think we should distribute a function to get these libraries
> when emacs ships, as every editor does this.
We thought about it. The main problem is that the tree-sitter library and the tree-sitter language definitions need to be in sync in terms of version. Since Emacs doesn’t distribute the tree-sitter library, if we distribute language definitions we can’t make sure they are the correct version for the tree-sitter library on the system. Dynamic libraries, being system-dependent, aren’t something ELPA can easily distribute either. I could cook up a function that automatically downloads language definitions like my script does, but that feels hacky and incomplete, so it isn’t something I want to put into core Emacs. Maybe I can put such a function on NonGNU ELPA? I’m open to ideas.
Ideally distributions just distribute tree-sitter with all the language definitions, and we just use that.
Yuan
* Re: Tree-sitter api
2022-04-22 7:08 ` Yuan Fu
@ 2022-04-22 8:02 ` Theodor Thornhill
2022-04-22 11:41 ` Theodor Thornhill
0 siblings, 1 reply; 370+ messages in thread
From: Theodor Thornhill @ 2022-04-22 8:02 UTC (permalink / raw)
To: Yuan Fu; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3570 bytes --]
Yuan Fu <casouri@gmail.com> writes:
>
> IIUC, you are thinking about the following, which matches with whatever point that is a child of function_declaration and indents 2 columns, is that true?
>
> ((parent-is "function_declaration")
> parent
> 2)
>
> I am indeed guilty of throwing that wall of text when introducing the
> indent engine. I’ll see if I can make it more approachable. I could
> probably start by showing an example and how to use it, rather than
> explaining all the details.
>
The wall of text is nice, but for some reason I am still struggling a
little. I've used the 'ts-c-mode' supplied with the RFC, but that seems
to be a little incomplete. I'll see if that can work, but I think
there's a caveat here as well: in TypeScript at least there are lots of
'wrapper-nodes' that cause the relevant parent to be further up in the
tree. I _think_ 'tree-sitter-parent-until' is interesting here, but I
haven't gotten it to work yet.
Anyways, I'll play more with it and report.
>>
>> This could perhaps be abstracted yet again into a shorthand such as
>> this:
>>
>> (scope-openers '("function_declaration" "class_declaration" "try_statement"))
>
> I see what you mean, but that it yet another concept to learn. I’ll try to add something like that without adding too much complexity.
>
>
Yeah, I agree. This is something I can implement myself if it serves a
need. I'm not either.
>
> I don’t quite understand. What do you mean Emacs cannot find the
> anchor? To make sure we are on the same page, in a rule (MATCHER
> ANCHOR OFFSET), MATCHER determines whether this rule applies to the
> current line, ANCHOR tells you indent from this position, and OFFSET
> tells you indent this much from ANCHOR.
>
I meant that when that specific message is sent inside the
indent-line-function, it makes it look like we got a match for
MATCHER, but didn't get one for ANCHOR. I think, though, that in many
of the cases we don't even have a match for MATCHER. So the message is
a little confusing.
Just to try to make it even more clear - The MATCHER tries to match the
provided node type at the point where the cursor is located as the user
sees it in the buffer, the ANCHOR does the search to some _other_ place
and returns that indentation. In the end we add offset to that
indentation. Is that correct?
>>
>> What happens is that the arguments list triggers as an indentation step,
>> but it should only do so when when on its own line. I believe this is
>> what SMIE calls "hanging-p" in its engine.
>
> I think an anchor preset that finds the parent that’s at the beginning
> of a line should solve this. I’ll definitely add that one.
>
Great!
> Ideally distributions just distribute tree-sitter with all the
> language definitions, and we just use that.
>
This is probably the smartest, yes.
While you're here - I've noticed that tree-sitter in some cases doesn't
handle errors in the AST very well, messing up indentation. See
provided screenshots for the details. What happens here is that when I
press 'enter' to add a newline at the top, everything shifts, and the
font-locking is wrong. If I undo it, it goes back to normal. If I save
the buffer and revert the buffer, it goes back to normal. Not sure what
is happening, but maybe we can add a shield to not change font-locking
should the parser return error for some reason?
Or is it returning error because of Emacs sending the wrong thing?
Thank you for your patience.
Theo
[-- Attachment #2: correct.png --]
[-- Type: image/png, Size: 66991 bytes --]
[-- Attachment #3: wrong.png --]
[-- Type: image/png, Size: 65645 bytes --]
* Re: Tree-sitter api
2022-04-22 8:02 ` Theodor Thornhill
@ 2022-04-22 11:41 ` Theodor Thornhill
0 siblings, 0 replies; 370+ messages in thread
From: Theodor Thornhill @ 2022-04-22 11:41 UTC (permalink / raw)
To: Yuan Fu; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel
>>>
>>> What happens is that the arguments list triggers as an indentation step,
>>> but it should only do so when when on its own line. I believe this is
>>> what SMIE calls "hanging-p" in its engine.
>>
>> I think an anchor preset that finds the parent that’s at the beginning
>> of a line should solve this. I’ll definitely add that one.
So I made this rule which is a good starting point for this, I think.
It's a pretty naive implementation still, but I wanted to share it
nonetheless.
```elisp
(defun ts-parent-until (type)
  (lambda (node parent bol &rest _)
    (when-let ((found-node
                (tree-sitter-parent-until
                 node
                 (lambda (parent)
                   (equal type (tree-sitter-node-type parent))))))
      (save-excursion
        (goto-char (tree-sitter-node-start found-node))
        (back-to-indentation)
        (tree-sitter-node-start
         (tree-sitter-node-at (point) (point) 'tree-sitter-tsx))))))

(defvar ts-tsx-tree-sitter-indent-rules
  `((tree-sitter-tsx
     ((ts-parent-until "statement_block") (ts-parent-until "statement_block") 2)
     (no-node prev-line 0))))
```
This at least marks the start of me understanding how it works, hehe...
Theodor