* Treesitter injection support
@ 2025-01-02 14:48 Pranshu Sharma via Emacs development discussions.
2025-01-04 8:21 ` Yuan Fu
0 siblings, 1 reply; 6+ messages in thread
From: Pranshu Sharma via Emacs development discussions. @ 2025-01-02 14:48 UTC (permalink / raw)
To: emacs-devel; +Cc: casouri
I'm writing a clone of cperl-mode using tree-sitter, and have finished
all of the highlighting apart from regexps and POD.
For regexps, I need a different grammar to highlight them, and using
treesit-parser-set-included-ranges doesn't work. An example:
Prerequisite knowledge:
's/bi?g/small/' replaces instances of 'bg' and 'big' with 'small', and
's/([0-9]+)/$1 + 1/e' increments every number (the 'e' at the end tells
Perl to evaluate the replacement as code).
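For readers less familiar with Perl, the two substitutions can be tried with a short script. This is only an illustrative sketch: the sample strings and variable names are made up, and the 'g' modifier is added here (it is not in the examples above) so that every match is replaced rather than just the first.

```perl
use strict;
use warnings;

# Plain substitution: the replacement part is an ordinary string.
my $text = "a bg and a big dog";
(my $plain = $text) =~ s/bi?g/small/g;
print "$plain\n";    # a small and a small dog

# With /e the replacement part is Perl code, evaluated once per match.
my $nums = "3 apples, 10 pears";
(my $bumped = $nums) =~ s/([0-9]+)/$1 + 1/ge;
print "$bumped\n";   # 4 apples, 11 pears
```

The second case is the one that matters for highlighting: the text between the last two delimiters is code, not a string.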
The parse tree of 's/([0-9]+)/$1 + 1/e' is:
(substitution_regexp operator: s '
content: (regexp_content not-interpolated not-interpolated) '
(replacement
(scalar $ (varname)))
' modifiers: (substitution_regexp_modifiers))
(replacement) needs to be conditionally parsed as Perl here because
of the 'e' modifier. Now I cannot use ranges for this, because say I
had:
's/(([0-9]+),)+/s#([0-9]+)#$1 + 1#e/e;'
^^^^^^ Perl code
^^^^^^^^^^^^^^^^^^^ Perl code
The replacement contains another replacement, which itself contains Perl
code, so the ranges overlap.
So I won't have any way to highlight it. It seems this could be made to
work with nested parsers, each with its own local
treesit-range-settings, but that seems really hard while
treesit-range-settings is a buffer-local variable.
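To make the nesting concrete, here is a small Perl sketch that computes the character ranges of the two "Perl code" regions in the example above and checks that the inner one lies inside the outer one. It is not part of the mode; the variable names are hypothetical and the regions are located by plain string search rather than by a parser.

```perl
use strict;
use warnings;

# The nested example, held as a plain (non-interpolated) string.
my $src   = 's/(([0-9]+),)+/s#([0-9]+)#$1 + 1#e/e;';
my $outer = 's#([0-9]+)#$1 + 1#e';   # replacement of the outer s///e
my $inner = '$1 + 1';                # replacement of the inner s###e

# Half-open [beg, end) character ranges of each region within $src.
my $o_beg = index($src, $outer);
my $o_end = $o_beg + length($outer);
my $i_beg = index($src, $inner);
my $i_end = $i_beg + length($inner);

# The inner range is nested if it starts and ends within the outer one.
my $nested = ($o_beg <= $i_beg && $i_end <= $o_end) ? "yes" : "no";

printf "outer: [%d, %d)\n", $o_beg, $o_end;
printf "inner: [%d, %d)\n", $i_beg, $i_end;
print  "nested: $nested\n";
```

A single flat list of ranges for one embedded parser cannot express this containment, which is why each level would need its own parser.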
--
Pranshu Sharma <https://p.bauherren.ovh>
* Re: Treesitter injection support
From: Yuan Fu @ 2025-01-04 8:21 UTC (permalink / raw)
To: Pranshu Sharma; +Cc: emacs-devel
> On Jan 2, 2025, at 6:48 AM, Pranshu Sharma <pranshu@bauherren.ovh> wrote:
>
>
> I'm making cperl clone using treesitter, and have done all of
> highlighting apart from regex and pod.
>
> For regexp, I need different grammer to highlight it, and using the
> treesit-parser-set-included-ranges doesn't work. An example:
>
> preq knowledge:
>
> 's/bi?g/small/' replaces instances of 'bg' and 'big' with 'small', and
> 's/([0-9]+)/$1 + 1/e' incrimental all number (the 'e' at the end tells
> perl to evaluate the code).
>
> the parse tree of 's/([0-9]+)/$1 + 1/e' is:
> (substitution_regexp operator: s '
> content: (regexp_content not-interpolated not-interpolated) '
> (replacement
> (scalar $ (varname)))
> ' modifiers: (substitution_regexp_modifiers))
>
> (replacement) needs to be conditionally parsed as perl over here because
> of the 'e' modifier. Now I cannot use range for this, because say if I
> had:
>
> 's/(([0-9]+),)+/s#([0-9]+)#$1 + 1#e/e;'
> ^^^^^^ Perl code
> ^^^^^^^^^^^^^^^^^^^ Perl code
>
>
> The replacement contains another replacment which contains perl code, so
> it overlaps
>
> So I won't have any way to highlight. It seems making this work could
> be possible using nested parsers with their own setting each using own
> local treesit-range-settings, but this seems really hard with
> treesit-range-settings being a buffer local variable.
>
> --
> Pranshu Sharma <https://p.bauherren.ovh>
Ok, so the problem is nested parsers. I don’t think the overlap would cause any problem. Right now treesit-range-settings can only give you one nested layer. I’ll need to make it support nesting a parser inside a local parser of the same language. I’ll work on that once I wrap up the thing I’m working on right now :-)
Yuan
* Re: Treesitter injection support
From: Pranshu Sharma via Emacs development discussions. @ 2025-01-04 16:33 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
Yuan Fu <casouri@gmail.com> writes:
>
> Ok, so the problem is nested parsers. I don’t think the overlap would
> cause any problem. Right now treesit-range-settings can only give you
> one nested layer. I’ll need to make it support nesting a parser inside
> a local parser of the same language. I’ll work on that once I wrap up
> the thing I’m working on right now :-)
Thanks, this definitely seems like the problem. Also,
treesit-range-settings seems kind of unstable: for example, when I
purposely leave an unclosed string before the embedded range and then
close the string, it doesn't reparse.
--
Pranshu Sharma <https://p.bauherren.ovh>
* Re: Treesitter injection support
From: Yuan Fu @ 2025-01-04 19:23 UTC (permalink / raw)
To: Pranshu Sharma; +Cc: emacs-devel
> On Jan 4, 2025, at 8:33 AM, Pranshu Sharma <pranshu@bauherren.ovh> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>
>> Ok, so the problem is nested parsers. I don’t think the overlap would
>> cause any problem. Right now treesit-range-settings can only give you
>> one nested layer. I’ll need to make it support nesting a parser inside
>> a local parser of the same language. I’ll work on that once I wrap up
>> the thing I’m working on right now :-)
>
> Thanks, this definetly seems like the problem. Also the
> treesit-range-settings seems kind of unstable, example when I purposly
> leave closed string before it, and close the string, it doesn't reparse.
>
> --
> Pranshu Sharma <https://p.bauherren.ovh>
Can you show me a concrete example (a recipe to reproduce it)? I can look into it.
Yuan
* Re: Treesitter injection support
From: Pranshu Sharma via Emacs development discussions. @ 2025-01-07 9:36 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]
Yuan Fu <casouri@gmail.com> writes:
>>> Ok, so the problem is nested parsers. I don’t think the overlap
>>> would
>>> cause any problem. Right now treesit-range-settings can only give
>>> you
>>> one nested layer. I’ll need to make it support nesting a parser
>>> inside
>>> a local parser of the same language. I’ll work on that once I wrap
>>> up
>>> the thing I’m working on right now :-)
>>
>> Thanks, this definetly seems like the problem. Also the
>> treesit-range-settings seems kind of unstable, example when I
>> purposly
>> leave closed string before it, and close the string, it doesn't
>> reparse.
>
> Can you show me a concrete example (reproduce recipe)? I can look into
> it.
>
I attached a video that demonstrates it. It also shows a second bug, in
which performance is exponentially bad because of
treesit-font-lock-settings.
I've attached all the relevant files; note that the long file with
horrible performance was
https://github.com/git/git/blob/master/gitweb/gitweb.perl.
[-- Attachment #2: Bug --]
[-- Type: video/x-matroska, Size: 1092164 bytes --]
[-- Attachment #3: perl-ts-mode.el --]
[-- Type: application/emacs-lisp, Size: 16705 bytes --]
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: test.pl --]
[-- Type: text/x-perl, Size: 2813 bytes --]
# Comment
use hello qw(he owq);
require few;
my $var = <<"BLAH";
Test thing this is string $vlah
BLAH
my $var = "thing";
my $newvar = 'thing' . $var;
# ew
($var, my @arr) = (1,2,3,4,5);
my $sarr = join '-', @arr;
my $teacher = Person->new;
# my @txt_files = <ProgramFiles/*.tx>;
$teacher->name('Foo');
print for qw[1 2 3 4];
$var =~ m#h(el)lo#;
$thign =~ /^eg([2-1])regex$/;
sub thing {
say for (1,2,3,4)
}
=head1 First level heading
Here's a line of code that won't execute:
print "How'd you see this!?\n";
say "hello" for (1,2,3);
=over 4
Hello
=item First item
=item Second item
=back
=cut
sub func {
$thing =~ s/edw/s/;
$thing =~ s/(([0-9]+),)+\+/ join ",", "hello" for @_ /e;
# ^^^^^^^^^^^^^^^^^^^^^^
# |
# Here watch how it loses it's colour
$thing =~ m/thing/;
$thing =~ y/thing/about/c;
}
my $thing = abs(/1/, 1);
if (1) {
2;
}
# $thing =~ s/(([0-9]+),)+/ "$_" for @_ /e;
my %hash = (
thing => 1 + 2,
other => 2,
blah => 2
);
my @ls = (
1, 3,
2, 3, 3
);
if (3) {
while (my ($a, $b) = each %hash) {
my $file = do {local $/ = undef;};
chomp;
abs 2;
sort @thing;
map { $_ + 1 } qw(1 2 43 4 );
unpack "thing";
return "string";
}
}
print "hello $we wow @thing, re";
$thing->whe;
class My::Example 1.234 {
field $x;
ADJUST {
$x = "Hello, world";
}
ADJUST {
$x = "Hello, world";
}
method print_message {
say $x;
}
sub thing {
"thing"
}
}
class New::Example 1.234 {
field $x;
ADJUST {
$x = "Hello, world";
}
ADJUST {
$x = "Hello, world";
}
method print_message {
say $x;
}
}
package PDate;
sub new {
my $class = shift;
my $self = { year => 0 + shift,
month => 0 + shift,
day => 0 + shift,
};
bless $self, $class;
return $self;
}
# $d1 is greater than $d2
sub cmp {
my ($d1, $d2) = @_;
for ($d1->{year} <=> $d2->{year},
$d1->{month} <=> $d2->{month},
$d1->{day} <=> $d2->{day}) {
return $_ unless $_ == 0
}
0
}
use overload '<=>' => \&cmp;
sub fmt {
my $self = shift;
my @months =
qw(January Febuary March April May June July August September November October December);
my $n = $self->{day};
if ($n == 1) { $n = '1st' }
elsif (($n - 2) % 10 == 0) { $n = "${n}nd" }
elsif (($n - 3) % 10 == 0) { $n = "${n}rd" }
else { $n = "${n}th" }
$months[$self->{month} - 1] . " $n, " . $self->{year}
}
sub short_fmt {
my $self = shift;
join "-", ($self->{year}, $self->{month}, $self->{day});
}
sub text_easy {
my $self = shift;
join "-", ($self->{year}, $self->{month}, $self->{day});
}
say %latest_commit = %{$commitlist[0]};
say Dumper %hello;
say $thing{hello};
[-- Attachment #5: Type: text/plain, Size: 47 bytes --]
--
Pranshu Sharma <https://p.bauherren.ovh>
* Re: Treesitter injection support
From: Yuan Fu @ 2025-01-12 7:52 UTC (permalink / raw)
To: Pranshu Sharma; +Cc: Emacs Devel
> On Jan 7, 2025, at 1:36 AM, Pranshu Sharma <pranshu@bauherren.ovh> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>>> Ok, so the problem is nested parsers. I don’t think the overlap
>>>> would
>>>> cause any problem. Right now treesit-range-settings can only give
>>>> you
>>>> one nested layer. I’ll need to make it support nesting a parser
>>>> inside
>>>> a local parser of the same language. I’ll work on that once I wrap
>>>> up
>>>> the thing I’m working on right now :-)
>>>
>>> Thanks, this definetly seems like the problem. Also the
>>> treesit-range-settings seems kind of unstable, example when I
>>> purposly
>>> leave closed string before it, and close the string, it doesn't
>>> reparse.
>>
>> Can you show me a concrete example (reproduce recipe)? I can look into
>> it.
>>
>
>
> I attached a video that demostrated it. It also shows a second bug, in
> which perfomance is exponentially bad because of
> treesit-font-lock-settings.
>
> I've attached all the relevent fiels, and note the long file with
> horrible perfomance was
> https://github.com/git/git/blob/master/gitweb/gitweb.perl.
>
> <simplescreenrecorder-2025-01-07_19.25.54.mkv><perl-ts-mode.el><test.pl>
Thanks! I’m a bit overwhelmed with todo’s right now but I’ll come back to this.
Yuan