Hi, Andrew Tropin writes: > [[PGP Signed Part:Undecided]] > On 2023-02-12 12:07, Pierre Langlois wrote: > >> Hi, >> >> Andrew Tropin writes: >> >>> [[PGP Signed Part:Undecided]] >>> On 2023-02-10 15:48, Pierre Langlois wrote: >>> >>>> Hi Andrew, thanks for pushing this along! It's great to see things >>>> getting merged. >>>> >>>> Andrew Tropin writes: >>>> >>>>> [[PGP Signed Part:Undecided]] >>>>> On 2023-02-09 18:04, Andrew Tropin wrote: >>>>> >>>>>> On 2023-02-09 13:39, zimoun wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> On Thu, 09 Feb 2023 at 14:11, Andrew Tropin wrote: >>>>>>> >>>>>>>> I applied tree-sitter and tree-sitter-cli patches, >>>>>>> >>>>>>> Just to be sure to understand, you have only applied 02/32 and 05/32, >>>>>>> right? >>>>>>> >>>>>>> >>>>>>> [bug#49946] [PATCH v7 02/32] gnu: tree-sitter: Update to 0.20.7. >>>>>>> id:20221125012142.22579-3-pierre.langlois@gmx.com >>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-3-pierre.langlois@gmx.com >>>>>>> >>>>>>> [bug#49946] [PATCH v7 05/32] gnu: Add tree-sitter-cli. >>>>>>> id:20221125012142.22579-6-pierre.langlois@gmx.com >>>>>>> http://issues.guix.gnu.org/msgid/20221125012142.22579-6-pierre.langlois@gmx.com >>>>>>> >>>>>>> Leaving out all the others, right? >>>>>> >>>>>> Merged first 5 patches from 01 to 05, also added one more commit, which >>>>>> addresses some things from reviews and one commit, which adds html >>>>>> grammar. >>>>>> >>>>>> The html grammar is added for the testing purposes. It relies on >>>>>> generated parser.c and scanner.c and we will need to repackage it using >>>>>> grammar.js instead. I'm not sure if a separate build system is needed >>>>>> for this, I guess we can just rewrite tree-sitter-grammar function, >>>>>> which generates packages as in example with tree-sitter-grammar-html: >>>>>> https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/tree-sitter.scm?h=53b00b91b73bd60412d5bd057e22e6d63194a7f7#n158 >>>>>> >>>>>> Anyway, I only skimmed tree-sitter-build-system source code, and plan to >>>>>> read it carefully, evaluate and either introduce new build system or >>>>>> just move all needed parts to tree-sitter-grammar function. WDYT? >>>>>> After we done with it we can package all other grammars. >>>>> >>>>> Ok, I realized that the proper build process for tree-sitter grammars is >>>>> a little harder than I expected, tree-sitter-build system make sense. I >>>>> reviewed it, made a small change: >>>> >>>> Ah great, I was going to comment to try and push for us to keep the >>>> build system. I originally went with a template package and inheritance, >>>> but Maxime suggested moving to a build-system which ended up making the >>>> package definitions a *lot* nicer IMO (see previous discussion here >>>> https://issues.guix.gnu.org/49946#144). It also allows us to deal with >>>> grammars that depend on each other more nicely I think. >>>> >>>>> >>>>> @@ -29,7 +29,7 @@ (define-module (guix build tree-sitter-build-system) >>>>> ;; Commentary: >>>>> ;; >>>>> ;; Build procedures for tree-sitter grammar packages. This is the >>>>> -;; builder-side code, which builds on top fo the node build-system. >>>>> +;; builder-side code, which builds on top of the node build-system. >>>>> ;; >>>>> ;; Tree-sitter grammars are written in JavaScript and compiled to a native >>>>> ;; shared object. The `tree-sitter generate' command invokes `node' in order >>>>> @@ -114,7 +114,7 @@ (define (compile-language dir) >>>>> "-fno-exceptions" >>>>> "-O2" >>>>> "-g" >>>>> - "-o" ,(string-append lib "/" lang ".so") >>>>> + "-o" ,(string-append lib "/libtree-sitter-" lang ".so") >>>>> ;; An additional `scanner.{c,cc}' file is sometimes >>>>> ;; provided. >>>>> ,@(cond >>>>> >>>>> >>>>> rewrote html grammar to use this build system and made it work with >>>>> built-in treesit package. Also, tried examples of c and cpp grammars >>>>> from patches in this thread. >>>>> >>>>> If you ok with it, I'll push the build system to master and update the >>>>> html grammar accordingly. >>>>> >>>>> The final result will look like this: >>>>> >>>>> (define tree-sitter-delete-generated-files >>>>> #~(begin >>>>> (delete-file "binding.gyp") >>>>> (delete-file-recursively "bindings") >>>>> (delete-file "src/grammar.json") >>>>> (delete-file "src/node-types.json") >>>>> (delete-file "src/parser.c") >>>>> (delete-file-recursively "src/tree_sitter"))) >>>>> >>>>> (define* (tree-sitter-grammar >>>>> language language-for-synopsis version commit hash >>>>> #:key >>>>> (repository-url >>>>> (format #f "https://github.com/tree-sitter/tree-sitter-~a" language)) >>>>> (inputs '())) >>>>> (let ((synopsis (string-append language-for-synopsis >>>>> " grammar for tree-sitter")) >>>>> (name (string-append "tree-sitter-grammar-" language))) >>>>> (package >>>>> (name name) >>>>> (version version) >>>>> (home-page repository-url) >>>>> (source (origin >>>>> (method git-fetch) >>>>> (uri (git-reference >>>>> (url repository-url) >>>>> (commit commit))) >>>>> (file-name (git-file-name name version)) >>>>> (sha256 (base32 hash)) >>>>> (modules '((guix build utils))) >>>>> (snippet tree-sitter-delete-generated-files))) >>>>> (build-system tree-sitter-build-system) >>>>> (inputs inputs) >>>>> (synopsis synopsis) >>>>> (description (string-append synopsis ".")) >>>>> (license license:expat)))) >>>>> >>>>> (define-public tree-sitter-grammar-html >>>>> (tree-sitter-grammar >>>>> "html" "HTML" >>>>> "0.19.0" "v0.19.0" >>>>> "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy")) >>>>> >>>>> After that we can bring the rest of the grammars. >>>> >>>> I would suggest to rmeove the `tree-sitter-grammar' function, and keep >>>> grammars as "regular" package records, even though it's a little bit >>>> more verbose: >>>> >>>> --8<---------------cut here---------------start------------->8--- >>>> (define-public tree-sitter-html >>>> (package >>>> (name "tree-sitter-html") >>> >>> It seems tree-sitter-html mimics upstream package name and probably make >>> more sense than tree-sitter-grammar-html used by me. >> >> Yeah, at some point I think I had named the packages with "grammar" as >> well, but thought it was a bit of a mouthful. I'm also thinking one day >> we may build language bindings as part of the build system (Rust and >> NodeJS I think ATM), so those packages could do more than ship the >> grammar in the future (although we don't know if we'll ever really need >> that). >> >>> >>>> (version "0.19.0") >>>> (source (origin >>>> (method git-fetch) >>>> (uri (git-reference >>>> (url "https://github.com/tree-sitter/tree-sitter-html") >>>> (commit (string-append "v" version)))) >>>> (file-name (git-file-name name version)) >>>> (sha256 >>>> (base32 >>>> "1hg7vbcy7bir6b8x11v0a4x0glvqnsqc3i2ixiarbxmycbgl3axy")) >>>> (modules '((guix build utils))) >>>> (snippet tree-sitter-delete-generated-files))) >>>> (build-system tree-sitter-build-system) >>>> (home-page "https://github.com/tree-sitter/tree-sitter-html") >>>> (synopsis "Tree-sitter HTML grammar") >>>> (description >>>> "This package provides a HTML grammar for the Tree-sitter library.") >>>> (license license:expat))) >>>> --8<---------------cut here---------------end--------------->8--- >>>> >>>> This way, they look like any other package in Guix, which makes it >>>> easier for us to apply automatic changes in the future if needed (for >>>> example like how the input format could be automically updated for all >>>> "simple" package definitions, but had to be manual whenever custom code >>>> refactoring was done). Does this make sense? >>> >>> Make sense, but on the other hand we already have hunspell, aspell >>> dictionaries and probably a few more others, which are very similiar in >>> spirit and we already have to keep in mind their existence on such >>> automatic code updates. >>> >>> It looks that the packages differ only in url for the source code, lang >>> name and sometimes in inputs. Having template package function can make >>> management of shared parts more centralized, reduce possibility of >>> copy-paste mistakes, when the description wasn't updated and so on and >>> can reduce the amount of the code overall (which also reduces the change >>> of introducing an error). >>> >>> I don't have a strong opinion on this topic, but leaning towards the >>> template function slightly more, however I'm completely ok with the >>> standalone package definitions as well. WDYT? >> >> I can think of both cost/benefits to the template so I don't have a >> strong opinion either :-). >> >> I do like the template to make sure people don't forget to delete >> generated files, that's quite important as it seems upstream packages >> often check-in the generated C code. Although, we could probably assert >> that with in the build-system phase? I'll think about that. >> >> On the other hand, I wonder how the template works for packages that >> provide multiple grammars (see ocaml and typescript for example). I >> guess we could use the template for trivial packages, and standalone >> definitions for more complex ones? In general, if we keep the template >> interface really simple, then I'm happy with it. > > Hi Pierre! > > I spend two days trying grammars with and without helper function and > found hepler quite helpful to reduce boilerplate and errors from > copypaste, so I went the way with helper. The logic inside is quite > trivial, the only downside I found so far is that in cases when > repository url constructed automatically I can't easily open the repo > url in the browser. > > I packaged all the grammars from this thread and a few more on top of > it. Updated them to usually latest versions, added some comments, when > needed. > > If I forgot to reply on something or you have any comments/ideas, let me > know! :) > > Kudos to Pierre and everyone, who helped with all the tree-sitter stuff. Thank you for landing all this work!