From: "Augustin Chéneau (BTuin)" <btuin@mailo.com>
To: Yuan Fu <casouri@gmail.com>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Questions about tree-sitter
Date: Sat, 9 Sep 2023 18:39:45 +0200
Message-ID: <a0e63d70-14bf-498a-8e06-510a64ab7911@mailo.com>
In-Reply-To: <1F227B69-6195-4115-A7B6-BD2F7EA08E1F@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]

On 08/09/2023 at 18:43, Yuan Fu wrote:
> 
> 
>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>
>> On 06/09/2023 at 06:07, Yuan Fu wrote:
>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>> Yuan
>>
>> Thanks a lot!
>>
>> I did some tests and it's working pretty well.
> 
> Awesome!
> 


It seems I spoke a bit too soon :(
After I edit the buffer, the nodes are sometimes offset from the text,
or the syntax highlighting breaks in the C code.

I have attached an example Bison file in case it helps.

> 
>> I have a few issues though:
>>
>> - I first defined `treesit-language-at-point-function` using
>> `treesit-node-at`.  However, `treesit-node-at` itself uses
>> `treesit-language-at-point-function` which causes an infinite recursion.
>> So I instead used `treesit-local-parsers-at` to check if a local parser is used.  Is it a good solution?
> 
> No no, you should use the host language’s parser (bison) and see if point is in an undelimited_code_block, and return c or bison accordingly. I’ll highlight this in the docstring, thanks.

So I need to call `treesit-node-at` with `'bison` as the value for
PARSER-OR-LANG to see which node I am in?
If so, I think there is a problem with `treesit-node-at`: it always
calls `treesit-language-at`, even when PARSER-OR-LANG is provided.
I propose a fix in the attached patch.
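
For reference, here is a minimal sketch of what such a language-at-point
function might look like. It assumes the attached patch is applied, so
that `treesit-node-at` no longer calls `treesit-language-at` when
PARSER-OR-LANG is given; without it, the call below would recurse. The
function name is hypothetical, and the node type
`undelimited_code_block` is taken from the suggestion quoted above
rather than checked against the grammar.

```elisp
;; Sketch only: `my-bison--language-at-point' is a hypothetical name.
;; Assumes the Bison grammar wraps embedded C code in nodes of type
;; "undelimited_code_block", as suggested in the quoted reply.
(defun my-bison--language-at-point (pos)
  "Return `c' if POS is inside embedded C code, otherwise `bison'."
  ;; Always query the host (bison) parser explicitly.
  (let ((node (treesit-node-at pos 'bison)))
    ;; Walk up from the leaf until an embedded-code node is found,
    ;; or until we run past the root (NODE becomes nil).
    (while (and node
                (not (equal (treesit-node-type node)
                            "undelimited_code_block")))
      (setq node (treesit-node-parent node)))
    (if node 'c 'bison)))

;; In the major mode's body (buffer-local setting):
(setq-local treesit-language-at-point-function
            #'my-bison--language-at-point)
```

The parent walk is written by hand with `treesit-node-parent` so that
the sketch does not depend on the optional arguments of helper
functions, which may differ between Emacs versions.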



[-- Attachment #2: 0001-Do-not-always-call-treesit-language-at-in-treesit-no.patch --]
[-- Type: text/x-patch, Size: 1512 bytes --]

From dda0b7a9cd5f8b325b401aa7ba44c6fbe103fb6a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Sat, 9 Sep 2023 15:35:49 +0200
Subject: [PATCH] Do not always call `treesit-language-at` in 
 `treesit-node-at`

---
 lisp/treesit.el | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 1711446b40b..3d1ceda6d06 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -190,15 +190,14 @@ treesit-node-at
 is nil, try to guess the language at POS using `treesit-language-at'.
 
 If there's a local parser at POS, try to use that parser first."
-  (let* ((lang-at-point (treesit-language-at pos))
-         (root (if (treesit-parser-p parser-or-lang)
+  (let* ((root (if (treesit-parser-p parser-or-lang)
                    (treesit-parser-root-node parser-or-lang)
                  (or (when-let ((parser (car (treesit-local-parsers-at
                                               pos (or parser-or-lang
-                                                      lang-at-point)))))
+                                                      (treesit-language-at pos))))))
                        (treesit-parser-root-node parser))
                      (treesit-buffer-root-node
-                      (or parser-or-lang lang-at-point)))))
+                      (or parser-or-lang (treesit-language-at pos))))))
          (node root)
          (node-before root)
          (pos-1 (max (1- pos) (point-min)))
-- 
2.42.0


[-- Attachment #3: bison-example.y --]
[-- Type: text/plain, Size: 8044 bytes --]

/*                                                       -*- C -*-
  Copyright (C) 2020-2022 Free Software Foundation, Inc.

  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see <https://www.gnu.org/licenses/>.
*/

/* Simplified C++ Type and Expression Grammar.
   Written by Paul Hilfinger for Bison's test suite.  */

%define api.pure
%header
%define api.header.include {"c++-types.h"}
%locations
%debug

/* Nice error messages with details. */
%define parse.error detailed

%code requires
{
  union node {
    struct {
      int is_nterm;
      int parents;
    } node_info;
    struct {
      int is_nterm; /* 1 */
      int parents;
      char const *form;
      union node *children[3];
    } nterm;
    struct {
      int is_nterm; /* 0 */
      int parents;
      char *text;
    } term;
  };
  typedef union node node_t;
}

%define api.value.type union

%code
{
  /* Portability issues for strdup. */
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 600
#endif

#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

  static node_t *new_nterm (char const *, node_t *, node_t *, node_t *);
  static node_t *new_term (char *);
  static void free_node (node_t *);
  static char *node_to_string (const node_t *);
  static void node_print (FILE *, const node_t *);
  static node_t *stmt_merge (YYSTYPE x0, YYSTYPE x1);

  static void yyerror (YYLTYPE const * const loc, const char *msg);
  static yytoken_kind_t yylex (YYSTYPE *lval, YYLTYPE *lloc);
}

%expect-rr 1

%token
  TYPENAME "typename"
  ID "identifier"

%right '='
%left '+'

%glr-parser

%type <node_t *> stmt expr decl declarator TYPENAME ID
%destructor { free_node ($$); } <node_t *>
%printer { node_print (yyo, $$); } <node_t *>

%%

prog : %empty
     | prog stmt   {
                     YYLOCATION_PRINT (stdout, &@2);
                     fputs (": ", stdout);
                     node_print (stdout, $2);
                     putc ('\n', stdout);
                     fflush (stdout);
                     free_node ($2);
                   }
     ;

stmt : expr ';'  %merge <stmt_merge>     { $$ = $1; }
     | decl      %merge <stmt_merge>
     | error ';'        { $$ = new_nterm ("<error>", NULL, NULL, NULL); }
     ;

expr : ID
     | TYPENAME '(' expr ')'
                        { $$ = new_nterm ("<cast>(%s, %s)", $3, $1, NULL); }
     | expr '+' expr    { $$ = new_nterm ("+(%s, %s)", $1, $3, NULL); }
     | expr '=' expr    { $$ = new_nterm ("=(%s, %s)", $1, $3, NULL); }
     ;

decl : TYPENAME declarator ';'
                        { $$ = new_nterm ("<declare>(%s, %s)", $1, $2, NULL); }
     | TYPENAME declarator '=' expr ';'
                        { $$ = new_nterm ("<init-declare>(%s, %s, %s)", $1,
                                          $2, $4); }
     ;

declarator
     : ID
     | '(' declarator ')' { $$ = $2; }
     ;

%%

/* A C error reporting function.  */
static void
yyerror (YYLTYPE const * const loc, const char *msg)
{
  YYLOCATION_PRINT (stderr, loc);
  fprintf (stderr, ": %s\n", msg);
}

/* The input file. */
FILE * input = NULL;

yytoken_kind_t
yylex (YYSTYPE *lval, YYLTYPE *lloc)
{
  static int line_num = 1;
  static int col_num = 0;

  while (1)
    {
      int c;
      assert (!feof (input));
      c = getc (input);
      switch (c)
        {
        case EOF:
          return 0;
        case '\t':
          col_num = (col_num + 7) & ~7;
          break;
        case ' ': case '\f':
          col_num += 1;
          break;
        case '\n':
          line_num += 1;
          col_num = 0;
          break;
        default:
          {
            yytoken_kind_t tok;
            lloc->first_line = lloc->last_line = line_num;
            lloc->first_column = col_num;
            if (isalpha (c))
              {
                char buffer[256];
                unsigned i = 0;

                do
                  {
                    buffer[i++] = (char) c;
                    col_num += 1;
                    assert (i != sizeof buffer - 1);
                    c = getc (input);
                  }
                while (isalnum (c) || c == '_');

                ungetc (c, input);
                buffer[i++] = 0;
                if (isupper ((unsigned char) buffer[0]))
                  {
                    tok = TYPENAME;
                    lval->TYPENAME = new_term (strdup (buffer));
                  }
                else
                  {
                    tok = ID;
                    lval->ID = new_term (strdup (buffer));
                  }
              }
            else
              {
                col_num += 1;
                tok = c;
              }
            lloc->last_column = col_num;
            return tok;
          }
        }
    }
}

static node_t *
new_nterm (char const *form, node_t *child0, node_t *child1, node_t *child2)
{
  node_t *res = malloc (sizeof *res);
  res->nterm.is_nterm = 1;
  res->nterm.parents = 0;
  res->nterm.form = form;
  res->nterm.children[0] = child0;
  if (child0)
    child0->node_info.parents += 1;
  res->nterm.children[1] = child1;
  if (child1)
    child1->node_info.parents += 1;
  res->nterm.children[2] = child2;
  if (child2)
    child2->node_info.parents += 1;
  return res;
}

static node_t *
new_term (char *text)
{
  node_t *res = malloc (sizeof *res);
  res->term.is_nterm = 0;
  res->term.parents = 0;
  res->term.text = text;
  return res;
}

static void
free_node (node_t *node)
{
  if (!node)
    return;
  node->node_info.parents -= 1;
  /* Free only if 0 (last parent) or -1 (no parents).  */
  if (node->node_info.parents > 0)
    return;
  if (node->node_info.is_nterm == 1)
    {
      free_node (node->nterm.children[0]);
      free_node (node->nterm.children[1]);
      free_node (node->nterm.children[2]);
    }
  else
    free (node->term.text);
  free (node);
}

static char *
node_to_string (const node_t *node)
{
  char *res;
  if (!node)
    res = strdup ("");
  else if (node->node_info.is_nterm)
    {
      char *child0 = node_to_string (node->nterm.children[0]);
      char *child1 = node_to_string (node->nterm.children[1]);
      char *child2 = node_to_string (node->nterm.children[2]);
      res = malloc (strlen (node->nterm.form) + strlen (child0)
                    + strlen (child1) + strlen (child2) + 1);
      sprintf (res, node->nterm.form, child0, child1, child2);
      free (child2);
      free (child1);
      free (child0);
    }
  else
    res = strdup (node->term.text);
  return res;
}

static void
node_print (FILE *out, const node_t *n)
{
  char *str = node_to_string (n);
  fputs (str, out);
  free (str);
}


static node_t *
stmt_merge (YYSTYPE x0, YYSTYPE x1)
{
  return new_nterm ("<OR>(%s, %s)", x0.stmt, x1.stmt, NULL);
}

static int
process (const char *file)
{
  int is_stdin = !file || strcmp (file, "-") == 0;
  if (is_stdin)
    input = stdin;
  else
    input = fopen (file, "r");
  assert (input);
  int status = yyparse ();
  if (!is_stdin)
    fclose (input);
  return status;
}

int
main (int argc, char **argv)
{
  if (getenv ("YYDEBUG"))
    yydebug = 1;

  int ran = 0;
  for (int i = 1; i < argc; ++i)
    // Enable parse traces on option -p.
    if (strcmp (argv[i], "-p") == 0)
      yydebug = 1;
    else
      {
        int status = process (argv[i]);
        ran = 1;
        if (!status)
          return status;
      }

  if (!ran)
    {
      int status = process (NULL);
      if (!status)
        return status;
    }
  return 0;
}

