unofficial mirror of emacs-devel@gnu.org 
* Treesitter injection support
@ 2025-01-02 14:48 Pranshu Sharma via Emacs development discussions.
  2025-01-04  8:21 ` Yuan Fu
  0 siblings, 1 reply; 6+ messages in thread
From: Pranshu Sharma via Emacs development discussions. @ 2025-01-02 14:48 UTC (permalink / raw)
  To: emacs-devel; +Cc: casouri


I'm writing a cperl-mode clone based on tree-sitter, and have done all of
the highlighting apart from regexps and POD.

For regexps, I need a different grammar to highlight them, and using
treesit-parser-set-included-ranges doesn't work.  An example:

Prerequisite knowledge:

's/bi?g/small/' replaces a match of 'bg' or 'big' with 'small', and
's/([0-9]+)/$1 + 1/e' increments a number (the 'e' at the end tells
Perl to evaluate the replacement as code).
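
To make the two forms concrete, here is a minimal sketch.  The sample
strings and variable names are made up for illustration, and the /g flag
is added so that every match is replaced rather than only the first:

```perl
use strict;
use warnings;

# Plain substitution: the replacement is an ordinary string.
my $plain = 'bg big bag';
$plain =~ s/bi?g/small/g;       # 'bg' and 'big' match; 'bag' does not

# With /e the replacement is Perl code, and its value is substituted.
my $counts = 'a1 b22 c333';
$counts =~ s/([0-9]+)/$1 + 1/ge;

print "$plain\n";               # small small bag
print "$counts\n";              # a2 b23 c334
```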

the parse tree of 's/([0-9]+)/$1 + 1/e' is:
(substitution_regexp operator: s '
     content: (regexp_content not-interpolated not-interpolated) '
     (replacement
      (scalar $ (varname)))
     ' modifiers: (substitution_regexp_modifiers))

(replacement) needs to be conditionally parsed as Perl here because of
the 'e' modifier.  I cannot use ranges for this, because suppose I
had:

's/(([0-9]+),)+/s#([0-9]+)#$1 + 1#e/e;'
                           ^^^^^^          Perl code
                ^^^^^^^^^^^^^^^^^^^        Perl code
                             

The replacement contains another replacement, which itself contains Perl
code, so the two ranges overlap.
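
A runnable sketch of this kind of nesting, under stated assumptions: the
input string and the temporary variable $list are hypothetical, and the
inner substitution works on a copy so that the outer target is not
modified while it is being matched.  The point is only that the outer
replacement is Perl code which in turn contains a second piece of Perl
code ($1 + 1):

```perl
use strict;
use warnings;

my $text = 'ids: 1,2,3, end';

# Outer s///e: the replacement is Perl code.  That code contains an
# inner s###e whose own replacement ($1 + 1) is Perl code as well.
$text =~ s/((?:[0-9]+,)+)/ my $list = $1; $list =~ s#([0-9]+)#$1 + 1#ge; $list /e;

print "$text\n";                # ids: 2,3,4, end
```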

So I have no way to highlight this.  It seems this could be made to work
using nested parsers, each with its own local treesit-range-settings,
but that seems really hard with treesit-range-settings being a
buffer-local variable.

-- 
Pranshu Sharma <https://p.bauherren.ovh>




* Re: Treesitter injection support
  2025-01-02 14:48 Treesitter injection support Pranshu Sharma via Emacs development discussions.
@ 2025-01-04  8:21 ` Yuan Fu
  2025-01-04 16:33   ` Pranshu Sharma via Emacs development discussions.
  0 siblings, 1 reply; 6+ messages in thread
From: Yuan Fu @ 2025-01-04  8:21 UTC (permalink / raw)
  To: Pranshu Sharma; +Cc: emacs-devel



Ok, so the problem is nested parsers. I don’t think the overlap would cause any problem. Right now treesit-range-settings can only give you one nested layer. I’ll need to make it support nesting a parser inside a local parser of the same language. I’ll work on that once I wrap up the thing I’m working on right now :-)

Yuan



* Re: Treesitter injection support
  2025-01-04  8:21 ` Yuan Fu
@ 2025-01-04 16:33   ` Pranshu Sharma via Emacs development discussions.
  2025-01-04 19:23     ` Yuan Fu
  0 siblings, 1 reply; 6+ messages in thread
From: Pranshu Sharma via Emacs development discussions. @ 2025-01-04 16:33 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

Yuan Fu <casouri@gmail.com> writes:

> Ok, so the problem is nested parsers. I don’t think the overlap would
> cause any problem. Right now treesit-range-settings can only give you
> one nested layer. I’ll need to make it support nesting a parser inside
> a local parser of the same language. I’ll work on that once I wrap up
> the thing I’m working on right now :-)

Thanks, this definitely seems like the problem.  Also,
treesit-range-settings seems somewhat unstable: for example, when I
purposely leave a string unclosed before it and then close the string,
it doesn't reparse.

-- 
Pranshu Sharma <https://p.bauherren.ovh>




* Re: Treesitter injection support
  2025-01-04 16:33   ` Pranshu Sharma via Emacs development discussions.
@ 2025-01-04 19:23     ` Yuan Fu
  2025-01-07  9:36       ` Pranshu Sharma via Emacs development discussions.
  0 siblings, 1 reply; 6+ messages in thread
From: Yuan Fu @ 2025-01-04 19:23 UTC (permalink / raw)
  To: Pranshu Sharma; +Cc: emacs-devel



> On Jan 4, 2025, at 8:33 AM, Pranshu Sharma <pranshu@bauherren.ovh> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> Ok, so the problem is nested parsers. I don’t think the overlap would
>> cause any problem. Right now treesit-range-settings can only give you
>> one nested layer. I’ll need to make it support nesting a parser inside
>> a local parser of the same language. I’ll work on that once I wrap up
>> the thing I’m working on right now :-)
> 
> Thanks, this definitely seems like the problem.  Also,
> treesit-range-settings seems somewhat unstable: for example, when I
> purposely leave a string unclosed before it and then close the string,
> it doesn't reparse.
> 
> -- 
> Pranshu Sharma <https://p.bauherren.ovh>

Can you show me a concrete example (reproduce recipe)? I can look into it.

Yuan





* Re: Treesitter injection support
  2025-01-04 19:23     ` Yuan Fu
@ 2025-01-07  9:36       ` Pranshu Sharma via Emacs development discussions.
  2025-01-12  7:52         ` Yuan Fu
  0 siblings, 1 reply; 6+ messages in thread
From: Pranshu Sharma via Emacs development discussions. @ 2025-01-07  9:36 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]

Yuan Fu <casouri@gmail.com> writes:

>>> Ok, so the problem is nested parsers. I don’t think the overlap would
>>> cause any problem. Right now treesit-range-settings can only give you
>>> one nested layer. I’ll need to make it support nesting a parser inside
>>> a local parser of the same language. I’ll work on that once I wrap up
>>> the thing I’m working on right now :-)
>> 
>> Thanks, this definitely seems like the problem.  Also,
>> treesit-range-settings seems somewhat unstable: for example, when I
>> purposely leave a string unclosed before it and then close the string,
>> it doesn't reparse.
>
> Can you show me a concrete example (reproduce recipe)? I can look into
> it.
>


I've attached a video that demonstrates it.  It also shows a second bug,
in which performance is extremely bad because of
treesit-font-lock-settings.

I've attached all the relevant files; note that the long file with the
horrible performance was
https://github.com/git/git/blob/master/gitweb/gitweb.perl.


[-- Attachment #2: Bug --]
[-- Type: video/x-matroska, Size: 1092164 bytes --]

[-- Attachment #3: perl-ts-mode.el --]
[-- Type: application/emacs-lisp, Size: 16705 bytes --]

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: test.pl --]
[-- Type: text/x-perl, Size: 2813 bytes --]


# Comment

use hello qw(he owq);
require few;

my $var = <<"BLAH";
Test thing this is string $vlah
BLAH

my $var = "thing";
my $newvar = 'thing' . $var;
# ew
($var, my @arr) = (1,2,3,4,5);

my $sarr = join '-', @arr;

my $teacher = Person->new; 

# my @txt_files  =  <ProgramFiles/*.tx>;

$teacher->name('Foo');

print for qw[1 2 3 4];

$var =~ m#h(el)lo#;

$thign =~ /^eg([2-1])regex$/;

sub thing {
  say for (1,2,3,4)
}

=head1 First level heading

Here's a line of code that won't execute:

        print "How'd you see this!?\n";
	say "hello" for (1,2,3);
	
=over 4

Hello

=item First item

=item Second item

=back

=cut

sub func {
  $thing =~ s/edw/s/;
  $thing =~ s/(([0-9]+),)+\+/ join ",", "hello" for @_ /e;
  #                          ^^^^^^^^^^^^^^^^^^^^^^
  #                                  |
  #                         Here watch how it loses its colour
  $thing =~ m/thing/;
  $thing =~ y/thing/about/c;
}

my $thing = abs(/1/, 1);

if (1) {
  2;
}
# $thing =~ s/(([0-9]+),)+/ "$_" for @_ /e;

my %hash = (
  thing => 1 + 2,
  other => 2,
  blah => 2
  );

my @ls = (
  1, 3,
  2, 3, 3
  );

if (3) {
  while (my ($a, $b) = each %hash) {
    my $file = do {local $/ = undef;};
    chomp;
    abs 2;
    sort @thing;
    map { $_ + 1 } qw(1 2 43 4 );
    unpack "thing";
    return "string";
  }
}

print "hello $we wow @thing, re";

$thing->whe;

class My::Example 1.234 {
  field $x;
  
  ADJUST {
    $x = "Hello, world";
  }
  ADJUST {
    $x = "Hello, world";
  }
  
  method print_message {
    say $x;
  }
  sub thing {
    "thing"
  }
}


class New::Example 1.234 {
  field $x;
  ADJUST {
    $x = "Hello, world";
  }
  ADJUST {
    $x = "Hello, world";
  }
  method print_message {
    say $x;
  }
}

package PDate;

sub new {
  my $class = shift;
  my $self = { year => 0 + shift,
	       month => 0 + shift,
	       day => 0 + shift,
  };
  bless $self, $class;
  return $self;
}

# $d1 is greater than $d2
sub cmp  {
  my ($d1, $d2) = @_;
  for ($d1->{year} <=> $d2->{year},
       $d1->{month} <=> $d2->{month},
       $d1->{day} <=> $d2->{day}) {
    return $_ unless $_ == 0
  }
  0
}
use overload '<=>' => \&cmp;

sub fmt {
  my $self = shift;
  my @months =
    qw(January February March April May June July August September October November December);
  my $n = $self->{day};
  if ($n == 1) { $n = '1st' }
    elsif (($n - 2) % 10 == 0) { $n = "${n}nd" }
      elsif (($n - 3) % 10 == 0) { $n = "${n}rd" }
	else { $n = "${n}th" }
  $months[$self->{month} - 1] . " $n, " . $self->{year}
}

sub short_fmt {
    my $self = shift;
    join "-", ($self->{year}, $self->{month}, $self->{day});
  }

sub text_easy {
    my $self = shift;
    join "-", ($self->{year}, $self->{month}, $self->{day});
}

say %latest_commit = %{$commitlist[0]};
say Dumper %hello;
say $thing{hello};

[-- Attachment #5: Type: text/plain, Size: 47 bytes --]



-- 
Pranshu Sharma <https://p.bauherren.ovh>


* Re: Treesitter injection support
  2025-01-07  9:36       ` Pranshu Sharma via Emacs development discussions.
@ 2025-01-12  7:52         ` Yuan Fu
  0 siblings, 0 replies; 6+ messages in thread
From: Yuan Fu @ 2025-01-12  7:52 UTC (permalink / raw)
  To: Pranshu Sharma; +Cc: Emacs Devel



> On Jan 7, 2025, at 1:36 AM, Pranshu Sharma <pranshu@bauherren.ovh> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>>> Ok, so the problem is nested parsers. I don’t think the overlap would
>>>> cause any problem. Right now treesit-range-settings can only give you
>>>> one nested layer. I’ll need to make it support nesting a parser inside
>>>> a local parser of the same language. I’ll work on that once I wrap up
>>>> the thing I’m working on right now :-)
>>> 
>>> Thanks, this definitely seems like the problem.  Also,
>>> treesit-range-settings seems somewhat unstable: for example, when I
>>> purposely leave a string unclosed before it and then close the string,
>>> it doesn't reparse.
>> 
>> Can you show me a concrete example (reproduce recipe)? I can look into
>> it.
>> 
> 
> 
> I've attached a video that demonstrates it.  It also shows a second bug,
> in which performance is extremely bad because of
> treesit-font-lock-settings.
> 
> I've attached all the relevant files; note that the long file with the
> horrible performance was
> https://github.com/git/git/blob/master/gitweb/gitweb.perl.
> 
> <simplescreenrecorder-2025-01-07_19.25.54.mkv><perl-ts-mode.el><test.pl>

Thanks! I’m a bit overwhelmed with todo’s right now but I’ll come back to this.

Yuan

