From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Wed, 11 May 2022 13:14:33 -0700
Message-ID: <8F6A43D1-D1EA-4602-A245-627DB7960FC2@gmail.com>
References: <87y1zabmbt.fsf@gmail.com>
 <5F186EBD-CD21-422B-8B4F-0D5424173334@gmail.com>
 <875ymdwf76.fsf@gmail.com>
 <011DA1A3-0FA8-4449-878A-FD6B336B0F1B@gmail.com>
 <8735hhw75p.fsf@gmail.com>
 <83czgks4ss.fsf@gnu.org>
 <87wnesuw63.fsf@gmail.com>
 <83pmkkqhft.fsf@gnu.org>
 <87tu9wukbt.fsf@gmail.com>
 <83ee10qbk7.fsf@gnu.org>
In-Reply-To: <83ee10qbk7.fsf@gnu.org>
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\))
Content-Type: multipart/mixed;
 boundary="Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C"
To: Eli Zaretskii
Cc: Yoav Marco, emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:289647

--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

>
> And the timings are in the table below?
>
> |   |                                      | no reuse (now) | reuse |
> | 1 | Fontify xdisp.c all at once          | 0.01s          | 0.01s |
> | 2 | Fontify 60 next lines of xdisp.c ×10 | 0.10s          | 0.00s |
> | 3 | Fontify 60 next lines till the end   | 6.06s          | 0.01s |
>
> If so, what is the significance of the last line in practical use
> cases?  JIT font-lock never fontifies such large chunks of source
> code, it does that in 512-character chunks, which is less than 60
> lines in most cases, and definitely not "till the end".

I think that’s just a way to run font-lock enough times without
repeatedly fontifying the same region?

>
> Also, how much time does it take to do the same with the current
> regexp- and syntax-based font-lock, for the same chunks of text?
>
> We need to examine the use cases and the absolute numbers carefully
> before we conclude that any kind of caching is needed and/or
> justified.
>

I redid the benchmark, but without his reuse patch, just to see how much
time is spent on creating query objects. Fontifying 40 lines 463 times
takes 6.92s (according to Emacs; 7.30s according to the profiler). That
comes to 0.0158s per call to font-lock-fontify-region, of which 0.0104s
is spent on creating the query object. That seems to tell me that if we
optimize away the query object creation, we can make font-locking very,
very fast? And not just font-locking, since using tree-sitter to do
anything useful basically means querying the parsed tree.

If we expose “compiled queries”, we don’t need to cache them either.

The regex-based font-lock is a lot slower. With or without the
optimization, tree-sitter is a win, but we know that already. I have no
idea why regex font-lock ran for 905 loops compared to 463 for
tree-sitter. Maybe I did something wrong there.

Benchmark 3: fontify all of xdisp.c, 40 lines at a time.
took 6.92, of which 1.00 is GC (0 gc runs), loop count: 463
font-lock: 7.30s -> 0.015766738660907127 / loop
ts_query_new: 4.80s -> 0.010367170626349892s / loop
Note: 7.30 is taken from external profiler.

Benchmark 3: fontify all of xdisp.c, 40 lines at a time.
took 88.28, of which 5.00 is GC (4 gc runs), loop count: 905
font-lock: 88.28s -> 0.1997285067873303 / loop

Yuan

--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Disposition: attachment;
 filename=tree-sitter-benchmark.el
Content-Type: application/octet-stream;
 x-unix-mode=0644;
 name="tree-sitter-benchmark.el"
Content-Transfer-Encoding: 7bit

;;; tree-sitter-benchmark.el -*- lexical-binding: t; -*-

(require 'cl-lib)   ; for `cl-incf' and `cl-destructuring-bind'
(require 'treesit)

;; Font-lock settings for C, built from the queries in highlights.scm.
(setq c-font-lock-settings-1
      `((c
         ,(with-temp-buffer
            (insert-file-contents-literally "./highlights.scm")
            ;; make capture names map to a face, any face
            (goto-char (point-min))
            (while (re-search-forward "@[a-z.]+" nil t)
              (replace-match "@font-lock-string-face" t))
            (buffer-substring (point-min) (point-max))))))

;; Tree-sitter font-lock: fontify xdisp.c in 40-line chunks.
(with-temp-buffer
  (treesit-get-parser-create 'c)
  (setq-local treesit-font-lock-defaults
              '((c-font-lock-settings-1)))
  (font-lock-mode)
  (treesit-font-lock-enable)
  (insert-file-contents "xdisp.c")
  (let ((count 0))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME); bind the
    ;; values by name so they reach the format string in the right order.
    (cl-destructuring-bind (elapsed gc-count gc-time)
        (benchmark-run 1
          (while (/= (point-max) (point))
            (font-lock-fontify-region (point) (line-end-position 40))
            (forward-line 40)
            (cl-incf count)))
      (message "Benchmark 3: fontify all of xdisp.c, 40 lines at a time.\n\
took %2.2f, of which %2.2f is GC (%d gc runs), loop count: %s"
               elapsed gc-time gc-count count))))

;; Regexp- and syntax-based font-lock (c-mode), same chunking.
(with-temp-buffer
  (treesit-get-parser-create 'c)
  (c-mode)
  (insert-file-contents "xdisp.c")
  (let ((count 0))
    (cl-destructuring-bind (elapsed gc-count gc-time)
        (benchmark-run 1
          (while (/= (point-max) (point))
            (font-lock-fontify-region (point) (line-end-position 40))
            (forward-line 40)
            (cl-incf count)))
      (message "Benchmark 3: fontify all of xdisp.c, 40 lines at a time.\n\
took %2.2f, of which %2.2f is GC (%d gc runs), loop count: %s"
               elapsed gc-time gc-count count))))

--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Disposition: attachment;
 filename=highlights.scm
Content-Type: application/octet-stream;
 x-unix-mode=0644;
 name="highlights.scm"
Content-Transfer-Encoding: 7bit

;; Copied from elisp-tree-sitter/langs/queries/c

["break" "case" "const" "continue" "default" "do" "else"
 "enum" "extern" "for" "if" "inline" "return" "sizeof" "static"
 "struct" "switch" "typedef" "union" "volatile" "while" "..."] @keyword

[(storage_class_specifier)
 (type_qualifier)] @keyword

["#define" "#else" "#endif" "#if" "#ifdef" "#ifndef" "#include"
 (preproc_directive)] @function.macro

((["#ifdef" "#ifndef"] (identifier) @constant))

["+" "-" "*" "/" "%"
 "~" "|" "&" "<<" ">>"
 "!" "||" "&&"
 "->"
 "==" "!=" "<" ">" "<=" ">="
 "=" "+=" "-=" "*=" "/=" "%=" "|=" "&="
 "++" "--"] @operator

(conditional_expression ["?" ":"] @operator)

["(" ")" "[" "]" "{" "}"] @punctuation.bracket

["." "," ";"] @punctuation.delimiter

;;; ----------------------------------------------------------------------------
;;; Functions.

(call_expression
 function: [(identifier) @function.call
            (field_expression field: (_) @method.call)])

(function_declarator
 declarator: [(identifier) @function
              (parenthesized_declarator
               (pointer_declarator (field_identifier) @function))])

(preproc_function_def
 name: (identifier) @function)

;;; ----------------------------------------------------------------------------
;;; Types.

[(primitive_type)
 (sized_type_specifier)] @type.builtin

(type_identifier) @type

;;; ----------------------------------------------------------------------------
;;; Variables.

(declaration
 declarator: [(identifier) @variable
              (_ (identifier) @variable)])

(parameter_declaration
 declarator: [(identifier) @variable.parameter
              (_ (identifier) @variable.parameter)])

(init_declarator
 declarator: [(identifier) @variable
              (_ (identifier) @variable)])

(assignment_expression
 left: [(identifier) @variable
        (field_expression field: (_) @variable)
        (subscript_expression argument: (identifier) @variable)
        (pointer_expression (identifier) @variable)])

(update_expression
 argument: (identifier) @variable)

(preproc_def name: (identifier) @variable.special)

(preproc_params
 (identifier) @variable.parameter)

;;; ----------------------------------------------------------------------------
;;; Properties.

(field_declaration
 declarator: [(field_identifier) @property.definition
              (pointer_declarator (field_identifier) @property.definition)
              (pointer_declarator
               (pointer_declarator (field_identifier) @property.definition))])

(enumerator name: (identifier) @property.definition)

(field_identifier) @property

;;; ----------------------------------------------------------------------------
;;; Misc.

;; Doesn't work right now: results in error Query pattern is malformed: "Cannot
;; find captured node", "^[A-Z_][A-Z_\\d]*$", "A predicate can only refer to
;; captured nodes in the same pattern"
;; ((identifier) @constant
;;  (.match @constant "^[A-Z_][A-Z_\\d]*$"))

[(null) (true) (false)] @constant.builtin

[(number_literal)
 (char_literal)] @number

(statement_identifier) @label

;;; ----------------------------------------------------------------------------
;;; Strings and comments.

(comment) @comment

[(string_literal)
 (system_lib_string)] @string

--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii


--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C--