unofficial mirror of emacs-devel@gnu.org 
* Tree sitter: issue embedding HTML, CSS, JavaScript within a new php-ts-mode
From: Simon Pugnet @ 2023-02-09 12:45 UTC
  To: Emacs developers

Dear Emacs maintainers,

I have recently started work on a PHP tree-sitter major mode. Things 
are going well so far; however, I'm having trouble embedding multiple 
languages in the PHP buffer.

In case you're not familiar with PHP, here's a quick example (I'm 
using org-mode mark-up in this message which hopefully will help): -

#+begin_src php
  <html lang="en-gb">
    <head>
      <style type="text/css">
       body {
         background: url("/background.png");
         color: #ff0000;
       }
      </style>
    </head>

    <body>
      <?php
      $a = [1, 2, "3", 4.5];
      if (is_array($a)) {
        echo "$a is an array";
      } else {
        echo "$a is not an array";
      }
      ?>

      <div id="my-div">
        <h1>This is a test</h1>
      </div>

      <script type="text/javascript">
       const div = document.getElementById('my-div');
       // This is a JS comment
       /* This too */
       console.log("my-div is:", div);
      </script>

      <?php // Some more PHP here ?>

    </body>
  </html>
#+end_src

As you can see, a PHP file is usually an HTML document, with the PHP 
code enclosed within ~<?php ... ?>~ blocks.

The first block of HTML from the beginning of the buffer to the first 
~<?php~ is enclosed within a ~(program (text))~ node. The second 
(after ~?>~ and before the second ~<?php~) is enclosed within a 
~(text_interpolation (text))~ node. I have therefore defined the 
following ~treesit-range-settings~ in my major mode: -

#+begin_src emacs-lisp
  (setq-local treesit-range-settings
              (treesit-range-rules
               :embed 'html
               :host 'php
               '((program (text) @capture)
                 (text_interpolation (text) @capture))))
#+end_src

This seems to work; however, when I evaluate ~(treesit-language-at 
(point))~ anywhere in this buffer I get ='html= in response. This is 
of course expected within an HTML region, but not within a PHP region. 
Despite this, the font-locking I have defined for PHP appears to work 
correctly. I have also defined a custom face and applied it via 
font-locking to the above two nodes to confirm that those regions are 
indeed enclosed as expected, and they are.

My hope eventually is to use the following ~treesit-range-settings~: -

#+begin_src emacs-lisp
  (setq-local treesit-range-settings
                  (treesit-range-rules
                   :embed 'html
                   :host 'php
                   '((program (text) @capture)
                     (text_interpolation (text) @capture))

                   :embed 'css
                   :host 'html
                   '((style_element (raw_text) @capture))

                   :embed 'typescript
                   :host 'html
                   '((script_element (raw_text) @capture))))
#+end_src

As well as defining these rules, I require =css-mode= and 
=typescript-ts-mode= and append their font-locking rules to my own. 
My hope is that this will allow CSS and JavaScript embedded within 
HTML regions to be font-locked according to those separate major 
modes too. This appears to work for simple files but does not work 
reliably for more complex files. Also, when using the above I get 
='typescript= whenever I evaluate ~(treesit-language-at (point))~. I'm 
not sure if this is just a bug in the language grammars that I'm 
using or if it is because I'm not using the treesit library 
correctly. Because of the issue with ~treesit-language-at~ above I'm 
concerned that it's the latter.

So my questions are: -

1. Based on my rules for embedding ='html= within ='php= above, should 
I expect ~(treesit-language-at (point))~ to return ='php= when the 
point is within a PHP region?
2. Is my goal of embedding HTML within PHP, then embedding CSS and 
JavaScript/TypeScript within HTML feasible and if so am I going about 
this in the right way?

Thank you in advance for your help and thank you for all of your work 
on Emacs and the tree sitter integration.

Kind regards,

-- 
Simon Pugnet
https://www.polaris64.net/

* Re: Tree sitter: issue embedding HTML, CSS, JavaScript within a new php-ts-mode
From: Yuan Fu @ 2023-02-10  5:45 UTC
  To: Simon Pugnet; +Cc: Emacs developers

Hey Simon,

Thanks for trying this out! Feedback like this is very welcome.

> On Feb 9, 2023, at 4:45 AM, Simon Pugnet <simon@polaris64.net> wrote:
> 
> [...]
> 
> 1. Based on my rules for embedding ='html= within ='php= above, should I expect ~(treesit-language-at (point))~ to return ='php= when the point is within a PHP region?

Because we don’t have much experience with tree-sitter and its interfaces, I made treesit-language-at simply delegate its work to treesit-language-at-point-function, which can be an arbitrary function, giving developers maximum flexibility. You need to set that variable to a function; otherwise treesit-language-at simply returns the language of the first parser in the parser list.
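
The delegation described above can be sketched roughly as follows. This is only an illustration of the behaviour, not the actual Emacs source, and ~my-language-at~ is a hypothetical name. It assumes an Emacs 29 build in which the =treesit= library is available.

#+begin_src emacs-lisp
  (require 'treesit)

  ;; Hypothetical stand-in for illustration only; not the real
  ;; `treesit-language-at'.
  (defun my-language-at (pos)
    "Return the language at POS, mimicking the delegation described above."
    (if treesit-language-at-point-function
        ;; A major mode supplied its own function: defer to it.
        (funcall treesit-language-at-point-function pos)
      ;; Fallback: the language of the first parser in the parser list,
      ;; regardless of where POS is.
      (let ((parser (car (treesit-parser-list))))
        (and parser (treesit-parser-language parser)))))
#+end_src

With no function set, the result does not depend on the position at all, which would explain getting the same language everywhere in the buffer.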

> 2. Is my goal of embedding HTML within PHP, then embedding CSS and JavaScript/TypeScript within HTML feasible and if so am I going about this in the right way?

It should be. Although I didn’t think of having multiple layers of embedded languages (in this case PHP embedding HTML embedding CSS/JavaScript), if you order the entries in treesit-range-rules as you do now (outermost host language, then the embedded language, then the language embedded in that), it should work. Try setting treesit-language-at-point-function and it should work correctly. If not… then we need to look into it.

Yuan

* Re: Tree sitter: issue embedding HTML, CSS, JavaScript within a new php-ts-mode
From: Simon Pugnet @ 2023-02-10 11:40 UTC
  To: Yuan Fu; +Cc: Emacs developers

Yuan Fu <casouri@gmail.com> writes:

> Hey Simon,
>
> Thanks for trying this out! Feedback like this is very welcome.

Hi Yuan, you're very welcome! I'm glad to be able to help.


>> On Feb 9, 2023, at 4:45 AM, Simon Pugnet <simon@polaris64.net> wrote:
>> 
>> 1. Based on my rules for embedding ='html= within ='php= above, 
>> should I expect ~(treesit-language-at (point))~ to return ='php= 
>> when the point is within a PHP region?
>
> Because we don’t have much experience with tree-sitter and its
> interfaces, I made treesit-language-at simply delegate its work to
> treesit-language-at-point-function, which can be an arbitrary
> function, giving developers maximum flexibility. You need to set
> that variable to a function; otherwise treesit-language-at simply
> returns the language of the first parser in the parser list.

Ah, that makes sense, thank you. I must have missed that in the 
documentation as I thought this was just used when overriding the 
default behaviour.

I've added the following implementation and this appears to work 
nicely: -

  (defun php-ts--language-at-point (point)
    "Return the language at POINT.
  Used to determine which tree-sitter parser to use."
    (let* ((php-node-at-point (treesit-node-at point 'php))
           (parent-node (treesit-node-parent php-node-at-point)))
      ;; A `text' node directly under `program' or `text_interpolation'
      ;; is HTML; everything else is PHP.
      (if (and (string-equal "text" (treesit-node-type php-node-at-point))
               (or (string-equal "program" (treesit-node-type parent-node))
                   (string-equal "text_interpolation"
                                 (treesit-node-type parent-node))))
          'html
        'php)))
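
For reference, a minimal sketch of how such a function might be wired into the mode setup. ~my-php-ts-setup~ is a hypothetical name, and the sketch assumes the PHP and HTML grammars are installed; the range rules are the ones from the first message in this thread.

#+begin_src emacs-lisp
  (require 'treesit)

  ;; `my-php-ts-setup' is a hypothetical name used for illustration.
  (defun my-php-ts-setup ()
    "Set up tree-sitter in the current buffer for PHP with embedded HTML."
    (when (treesit-ready-p 'php)
      ;; Create the host-language parser.
      (treesit-parser-create 'php)
      ;; HTML is embedded in the PHP `text' nodes.
      (setq-local treesit-range-settings
                  (treesit-range-rules
                   :embed 'html
                   :host 'php
                   '((program (text) @capture)
                     (text_interpolation (text) @capture))))
      ;; Without this, `treesit-language-at' falls back to the first parser.
      (setq-local treesit-language-at-point-function
                  #'php-ts--language-at-point)
      (treesit-major-mode-setup)))
#+end_src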

The next step will be to run further tests on the current node in 
cases where the language is 'html in order to determine if the actual 
language is 'css or 'javascript.
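
One possible shape for that next step is sketched below. It is untested against real grammars. The ~my-php-ts--~ names are hypothetical, the node types (=raw_text=, =style_element=, =script_element=) are taken from the range rules earlier in this thread, and ='typescript= is returned for script content only because those rules embed TypeScript.

#+begin_src emacs-lisp
  (require 'treesit)

  ;; Hypothetical helper: pure function over node-type strings (or nil),
  ;; so the decision logic can be checked without any grammar installed.
  (defun my-php-ts--classify (php-type php-parent html-type html-parent)
    "Map PHP and HTML node types at a position to a language symbol."
    (cond
     ;; Anything other than a `text' node under `program' or
     ;; `text_interpolation' is PHP code.
     ((not (and (equal php-type "text")
                (member php-parent '("program" "text_interpolation"))))
      'php)
     ;; Inside HTML, `raw_text' under <style> is CSS.
     ((and (equal html-type "raw_text")
           (equal html-parent "style_element"))
      'css)
     ;; Inside HTML, `raw_text' under <script> is script code.
     ((and (equal html-type "raw_text")
           (equal html-parent "script_element"))
      'typescript)
     (t 'html)))

  (defun my-php-ts--language-at-point (pos)
    "Return the language at POS as a symbol."
    ;; Passing an explicit language avoids calling `treesit-language-at'.
    (let ((php-node (treesit-node-at pos 'php))
          (html-node (treesit-node-at pos 'html)))
      (my-php-ts--classify
       (treesit-node-type php-node)
       (treesit-node-type (treesit-node-parent php-node))
       (treesit-node-type html-node)
       (treesit-node-type (treesit-node-parent html-node)))))
#+end_src

Keeping the classification separate from the node lookups means the mapping can be exercised with plain strings, for example ~(my-php-ts--classify "text" "program" "raw_text" "style_element")~.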


>> 2. Is my goal of embedding HTML within PHP, then embedding CSS and 
>> JavaScript/TypeScript within HTML feasible and if so am I going 
>> about this in the right way?
>
> It should be. Although I didn’t think of having multiple layers of
> embedded languages (in this case PHP embedding HTML embedding
> CSS/JavaScript), if you order the entries in treesit-range-rules as
> you do now (outermost host language, then the embedded language,
> then the language embedded in that), it should work. Try setting
> treesit-language-at-point-function and it should work correctly. If
> not… then we need to look into it.

I'll try this next and I'll be sure to let you know how it goes.


Thanks for your advice, and kind regards,

-- 
Simon Pugnet
https://www.polaris64.net/
