From: "Augustin Chéneau (BTuin)" <btuin@mailo.com>
To: Yuan Fu <casouri@gmail.com>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Questions about tree-sitter
Date: Sat, 9 Sep 2023 18:39:45 +0200
Message-ID: <a0e63d70-14bf-498a-8e06-510a64ab7911@mailo.com>
In-Reply-To: <1F227B69-6195-4115-A7B6-BD2F7EA08E1F@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]
On 08/09/2023 at 18:43, Yuan Fu wrote:
>
>
>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>
>> On 06/09/2023 at 06:07, Yuan Fu wrote:
>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>> Yuan
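A minimal sketch of the `:local t` setup described above, assuming the host grammar is `bison`, the embedded grammar is `c`, and C code lives in `undelimited_code_block` nodes (the node type named later in this thread). It only sets `treesit-range-settings`; the rest of the major mode is omitted, and the query is an assumption about the bison grammar, not code taken from bison-ts-mode.el.

```elisp
(require 'treesit)

;; Assumes the bison and c grammars are installed.
;; With :local t, each captured undelimited_code_block gets its own
;; C parser, instead of one C parser whose ranges span every block.
(setq-local treesit-range-settings
            (treesit-range-rules
             :embed 'c
             :host 'bison
             :local t
             '((undelimited_code_block) @capture)))
```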
>>
>> Thanks a lot!
>>
>> I did some tests and it's working pretty well.
>
> Awesome!
>
It seems I spoke a bit too soon :(
When I edit the buffer, the nodes sometimes end up offset from the
text, or the syntax highlighting breaks in the C code.
I have attached an example Bison file in case it helps.
>
>> I have a few issues though:
>>
>> - I first defined `treesit-language-at-point-function` using
>> `treesit-node-at`. However, `treesit-node-at` itself calls
>> `treesit-language-at`, which calls `treesit-language-at-point-function`,
>> which results in infinite recursion.
>> I used `treesit-local-parsers-at` instead, to check whether a local parser is in use. Is that a good solution?
>
> No, no: you should use the host language’s parser (bison), check whether point is in an undelimited_code_block, and return c or bison accordingly. I’ll highlight this in the docstring, thanks.
So I need to call `treesit-node-at` with `'bison` as the value of
PARSER-OR-LANG to find out which node I am in?
If so, there is a problem with `treesit-node-at`: it always calls
`treesit-language-at`, even when PARSER-OR-LANG is provided, so the
recursion still happens.
I propose a fix in the attached patch.
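A minimal sketch of the approach suggested above; the function name `bison-ts--language-at-point` is hypothetical. It finds the smallest bison node covering POS with `treesit-node-descendant-for-range` and walks up through the parents, so it does not go through `treesit-node-at` or `treesit-language-at` and should not recurse even without the patch below. Whether point on a code block's braces counts as C depends on the grammar and is not handled specially here.

```elisp
(require 'treesit)

(defun bison-ts--language-at-point (pos)
  "Return `c' if POS is inside an undelimited_code_block, else `bison'.
Hypothetical helper: only the host (bison) parser is queried."
  (let ((node (treesit-node-descendant-for-range
               (treesit-buffer-root-node 'bison) pos pos)))
    ;; Climb from the innermost node until a code block or the root.
    (while (and node
                (not (equal (treesit-node-type node)
                            "undelimited_code_block")))
      (setq node (treesit-node-parent node)))
    (if node 'c 'bison)))

;; In the major mode body (e.g. bison-ts-mode), something like:
;; (setq-local treesit-language-at-point-function
;;             #'bison-ts--language-at-point)
```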
[-- Attachment #2: 0001-Do-not-always-call-treesit-language-at-in-treesit-no.patch --]
[-- Type: text/x-patch, Size: 1512 bytes --]
From dda0b7a9cd5f8b325b401aa7ba44c6fbe103fb6a Mon Sep 17 00:00:00 2001
From: Augustin Chéneau <btuin@mailo.com>
Date: Sat, 9 Sep 2023 15:35:49 +0200
Subject: [PATCH] Do not always call `treesit-language-at` in
 `treesit-node-at`

---
 lisp/treesit.el | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 1711446b40b..3d1ceda6d06 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -190,15 +190,14 @@ treesit-node-at
 is nil, try to guess the language at POS using `treesit-language-at'.
 
 If there's a local parser at POS, try to use that parser first."
-  (let* ((lang-at-point (treesit-language-at pos))
-         (root (if (treesit-parser-p parser-or-lang)
+  (let* ((root (if (treesit-parser-p parser-or-lang)
                    (treesit-parser-root-node parser-or-lang)
                  (or (when-let ((parser (car (treesit-local-parsers-at
                                               pos (or parser-or-lang
-                                                      lang-at-point)))))
+                                                      (treesit-language-at pos))))))
                        (treesit-parser-root-node parser))
                      (treesit-buffer-root-node
-                      (or parser-or-lang lang-at-point)))))
+                      (or parser-or-lang (treesit-language-at pos))))))
          (node root)
          (node-before root)
          (pos-1 (max (1- pos) (point-min)))
-- 
2.42.0
[-- Attachment #3: bison-example.y --]
[-- Type: text/plain, Size: 8044 bytes --]
/* -*- C -*-
   Copyright (C) 2020-2022 Free Software Foundation, Inc.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program. If not, see <https://www.gnu.org/licenses/>.
*/

/* Simplified C++ Type and Expression Grammar.
   Written by Paul Hilfinger for Bison's test suite. */

%define api.pure
%header
%define api.header.include {"c++-types.h"}
%locations
%debug

/* Nice error messages with details. */
%define parse.error detailed

%code requires
{
  union node {
    struct {
      int is_nterm;
      int parents;
    } node_info;
    struct {
      int is_nterm; /* 1 */
      int parents;
      char const *form;
      union node *children[3];
    } nterm;
    struct {
      int is_nterm; /* 0 */
      int parents;
      char *text;
    } term;
  };
  typedef union node node_t;
}

%define api.value.type union

%code
{
/* Portability issues for strdup. */
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 600
#endif

#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

  static node_t *new_nterm (char const *, node_t *, node_t *, node_t *);
  static node_t *new_term (char *);
  static void free_node (node_t *);
  static char *node_to_string (const node_t *);
  static void node_print (FILE *, const node_t *);
  static node_t *stmt_merge (YYSTYPE x0, YYSTYPE x1);

  static void yyerror (YYLTYPE const * const loc, const char *msg);
  static yytoken_kind_t yylex (YYSTYPE *lval, YYLTYPE *lloc);
}

%expect-rr 1

%token
  TYPENAME "typename"
  ID "identifier"

%right '='
%left '+'

%glr-parser

%type <node_t *> stmt expr decl declarator TYPENAME ID
%destructor { free_node ($$); } <node_t *>
%printer { node_print (yyo, $$); } <node_t *>

%%

prog : %empty
     | prog stmt  {
                    YYLOCATION_PRINT (stdout, &@2);
                    fputs (": ", stdout);
                    node_print (stdout, $2);
                    putc ('\n', stdout);
                    fflush (stdout);
                    free_node ($2);
                  }
     ;

stmt : expr ';'  %merge <stmt_merge>  { $$ = $1; }
     | decl      %merge <stmt_merge>
     | error ';'  { $$ = new_nterm ("<error>", NULL, NULL, NULL); }
     ;

expr : ID
     | TYPENAME '(' expr ')'
                  { $$ = new_nterm ("<cast>(%s, %s)", $3, $1, NULL); }
     | expr '+' expr  { $$ = new_nterm ("+(%s, %s)", $1, $3, NULL); }
     | expr '=' expr  { $$ = new_nterm ("=(%s, %s)", $1, $3, NULL); }
     ;

decl : TYPENAME declarator ';'
                  { $$ = new_nterm ("<declare>(%s, %s)", $1, $2, NULL); }
     | TYPENAME declarator '=' expr ';'
                  { $$ = new_nterm ("<init-declare>(%s, %s, %s)", $1,
                                    $2, $4); }
     ;

declarator
  : ID
  | '(' declarator ')'  { $$ = $2; }
  ;

%%

/* A C error reporting function. */
static void
yyerror (YYLTYPE const * const loc, const char *msg)
{
  YYLOCATION_PRINT (stderr, loc);
  fprintf (stderr, ": %s\n", msg);
}

/* The input file. */
FILE * input = NULL;

yytoken_kind_t
yylex (YYSTYPE *lval, YYLTYPE *lloc)
{
  static int line_num = 1;
  static int col_num = 0;

  while (1)
    {
      int c;
      assert (!feof (input));
      c = getc (input);
      switch (c)
        {
        case EOF:
          return 0;
        case '\t':
          col_num = (col_num + 7) & ~7;
          break;
        case ' ': case '\f':
          col_num += 1;
          break;
        case '\n':
          line_num += 1;
          col_num = 0;
          break;
        default:
          {
            yytoken_kind_t tok;
            lloc->first_line = lloc->last_line = line_num;
            lloc->first_column = col_num;
            if (isalpha (c))
              {
                char buffer[256];
                unsigned i = 0;

                do
                  {
                    buffer[i++] = (char) c;
                    col_num += 1;
                    assert (i != sizeof buffer - 1);
                    c = getc (input);
                  }
                while (isalnum (c) || c == '_');

                ungetc (c, input);
                buffer[i++] = 0;
                if (isupper ((unsigned char) buffer[0]))
                  {
                    tok = TYPENAME;
                    lval->TYPENAME = new_term (strdup (buffer));
                  }
                else
                  {
                    tok = ID;
                    lval->ID = new_term (strdup (buffer));
                  }
              }
            else
              {
                col_num += 1;
                tok = c;
              }
            lloc->last_column = col_num;
            return tok;
          }
        }
    }
}

static node_t *
new_nterm (char const *form, node_t *child0, node_t *child1, node_t *child2)
{
  node_t *res = malloc (sizeof *res);
  res->nterm.is_nterm = 1;
  res->nterm.parents = 0;
  res->nterm.form = form;
  res->nterm.children[0] = child0;
  if (child0)
    child0->node_info.parents += 1;
  res->nterm.children[1] = child1;
  if (child1)
    child1->node_info.parents += 1;
  res->nterm.children[2] = child2;
  if (child2)
    child2->node_info.parents += 1;
  return res;
}

static node_t *
new_term (char *text)
{
  node_t *res = malloc (sizeof *res);
  res->term.is_nterm = 0;
  res->term.parents = 0;
  res->term.text = text;
  return res;
}

static void
free_node (node_t *node)
{
  if (!node)
    return;
  node->node_info.parents -= 1;
  /* Free only if 0 (last parent) or -1 (no parents). */
  if (node->node_info.parents > 0)
    return;
  if (node->node_info.is_nterm == 1)
    {
      free_node (node->nterm.children[0]);
      free_node (node->nterm.children[1]);
      free_node (node->nterm.children[2]);
    }
  else
    free (node->term.text);
  free (node);
}

static char *
node_to_string (const node_t *node)
{
  char *res;
  if (!node)
    res = strdup ("");
  else if (node->node_info.is_nterm)
    {
      char *child0 = node_to_string (node->nterm.children[0]);
      char *child1 = node_to_string (node->nterm.children[1]);
      char *child2 = node_to_string (node->nterm.children[2]);
      res = malloc (strlen (node->nterm.form) + strlen (child0)
                    + strlen (child1) + strlen (child2) + 1);
      sprintf (res, node->nterm.form, child0, child1, child2);
      free (child2);
      free (child1);
      free (child0);
    }
  else
    res = strdup (node->term.text);
  return res;
}

static void
node_print (FILE *out, const node_t *n)
{
  char *str = node_to_string (n);
  fputs (str, out);
  free (str);
}

static node_t *
stmt_merge (YYSTYPE x0, YYSTYPE x1)
{
  return new_nterm ("<OR>(%s, %s)", x0.stmt, x1.stmt, NULL);
}

static int
process (const char *file)
{
  int is_stdin = !file || strcmp (file, "-") == 0;
  if (is_stdin)
    input = stdin;
  else
    input = fopen (file, "r");
  assert (input);
  int status = yyparse ();
  if (!is_stdin)
    fclose (input);
  return status;
}

int
main (int argc, char **argv)
{
  if (getenv ("YYDEBUG"))
    yydebug = 1;

  int ran = 0;
  for (int i = 1; i < argc; ++i)
    // Enable parse traces on option -p.
    if (strcmp (argv[i], "-p") == 0)
      yydebug = 1;
    else
      {
        int status = process (argv[i]);
        ran = 1;
        if (!status)
          return status;
      }

  if (!ran)
    {
      int status = process (NULL);
      if (!status)
        return status;
    }
  return 0;
}