From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Yuan Fu <casouri@gmail.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Wed, 11 May 2022 13:14:33 -0700
Message-ID: <8F6A43D1-D1EA-4602-A245-627DB7960FC2@gmail.com>
References: <87y1zabmbt.fsf@gmail.com>
 <5F186EBD-CD21-422B-8B4F-0D5424173334@gmail.com> <875ymdwf76.fsf@gmail.com>
 <011DA1A3-0FA8-4449-878A-FD6B336B0F1B@gmail.com> <8735hhw75p.fsf@gmail.com>
 <83czgks4ss.fsf@gnu.org> <87wnesuw63.fsf@gmail.com> <83pmkkqhft.fsf@gnu.org>
 <87tu9wukbt.fsf@gmail.com> <83ee10qbk7.fsf@gnu.org>
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.80.82.1.1\))
Content-Type: multipart/mixed;
 boundary="Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C"
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="36221"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: Yoav Marco <yoavm448@gmail.com>,
 emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Wed May 11 22:17:49 2022
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1nosmJ-00095G-6W
	for ged-emacs-devel@m.gmane-mx.org; Wed, 11 May 2022 22:17:47 +0200
Original-Received: from localhost ([::1]:56228 helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>)
	id 1nosmH-0006qw-Su
	for ged-emacs-devel@m.gmane-mx.org; Wed, 11 May 2022 16:17:45 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:60024)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <casouri@gmail.com>) id 1nosl5-0005oM-RO
 for emacs-devel@gnu.org; Wed, 11 May 2022 16:16:34 -0400
Original-Received: from mail-pl1-x634.google.com ([2607:f8b0:4864:20::634]:40575)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <casouri@gmail.com>)
 id 1nosl0-0000Rp-FV; Wed, 11 May 2022 16:16:31 -0400
Original-Received: by mail-pl1-x634.google.com with SMTP id i1so2935885plg.7;
 Wed, 11 May 2022 13:16:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=from:message-id:mime-version:subject:date:in-reply-to:cc:to
 :references; bh=Dr9grCpWfpdJKHJaMOIvaOSFtGbkUT5Vgrbe9xCZ7Ks=;
 b=NE+1AVZrYHOO+XV23y8CUyt+FQia/x/8IATzu+qGio0QAqt/B490LMCfMq8yudcnHU
 e51SLt8hEclJMxuO1nuIuAFmyF1v4ju2p4TEwK9sAXxvfeInFTUSfNCMXoBuU7kAkuED
 XjWXoFKQq+y0Ga1Ta+3wu69mfhYtTSy3I4Crzc3bSS6i06yizWoyYAJS4tDQtf6w9fpa
 UqgztwLt0nnSLwr2R/Z1Lh0KFvlAo/82uBYbnb8E4aDtZs/Fva7w7uyg92MRaGtrl+rh
 j+yD2lSfA+PuITliG6HFmIifcP9jISs60wbs9IbEVNfDToTpWBH8XiNxMwaZ8a0eG0Wg
 9HWw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:message-id:mime-version:subject:date
 :in-reply-to:cc:to:references;
 bh=Dr9grCpWfpdJKHJaMOIvaOSFtGbkUT5Vgrbe9xCZ7Ks=;
 b=dlAoXzs58nTQR7xhDUmTcTKgsrlcYzSohjrJvOf3X4st/W3AD3oztgiKsDeTPCfbkb
 2+s9QWEO7+uOTlzIeI+nLQq/3IYdOKcJzeYNWYpxp/62hC2jCkQEZxpOmzUFCU+vlDkL
 owwf2nZoON37U91R4FOJoPLmN4FeXRijmkUdK4nBWwkso12q7FDi6cnVxzu47PEwhua2
 MFs87Hk2sy6ysyJShoNhaoGygjmEEYNMAFgVLmydCKlioMf6AgGCKbAmknn3bDwYfxv/
 3RN2n2bCCwjX/W0mzLSidyji78Uc8yMiTAatHzzbDQVce3jglR3XOFdpvldhjY3Q+oIa
 +dFQ==
X-Gm-Message-State: AOAM5306QGf2iFwtY2USLOPp33wjy86cEiWSEww82ld2TMBf8nc6l0p+
 1l7DXrlfHq5+8p2YvSSkPdxaFZ8JSpb5Wg==
X-Google-Smtp-Source: ABdhPJz0XSS76/AR5NmI/xxW2nkdblWYpCEGDvUJscjPLvxaKwkwhEf8JWYIbB41Z+Ej6lVcGG2tiA==
X-Received: by 2002:a17:902:8c8f:b0:15e:ab1c:591b with SMTP id
 t15-20020a1709028c8f00b0015eab1c591bmr26925704plo.171.1652300184321; 
 Wed, 11 May 2022 13:16:24 -0700 (PDT)
Original-Received: from smtpclient.apple ([2600:1700:2ec7:8c90:7cb3:8483:26c4:aa26])
 by smtp.gmail.com with ESMTPSA id
 c6-20020a170902c2c600b0015e8d4eb267sm2279006pla.177.2022.05.11.13.14.34
 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 11 May 2022 13:15:27 -0700 (PDT)
In-Reply-To: <83ee10qbk7.fsf@gnu.org>
X-Mailer: Apple Mail (2.3696.80.82.1.1)
Received-SPF: pass client-ip=2607:f8b0:4864:20::634;
 envelope-from=casouri@gmail.com; helo=mail-pl1-x634.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
 <mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: "Emacs-devel"
 <emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.devel:289647
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/289647>


--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

>=20
> And the timings are in the table below?
>=20
>  |   |                                      | no reuse (now) | reuse |
>  | 1 | Fontify xdisp.c all at once          |          0.01s | 0.01s |
>  | 2 | Fontify 60 next lines of xdisp.c =C3=9710 |          0.10s | =
0.00s |
>  | 3 | Fontify 60 next lines till the end   |          6.06s | 0.01s |
>=20
> If so, what is the significance of the last line in practical use
> cases?  JIT font-lock never fontifies such large chunks of source
> code, it does that in 512-character chunks, which is less than 60
> lines in most cases, and definitely not "till the end".

I think that=E2=80=99s just a way to run font-lock enough times without =
repeatedly fontifying the same region?

>=20
> Also, how much time does it take to do the same with the current
> regexp- and syntax-based font-lock, for the same chunks of text?
>=20
> We need to examine the use cases and the absolute numbers carefully
> before we conclude that any kind of caching is needed and/or
> justified.
>=20

I redid the benchmark without his reuse patch, just to see how much =
time is spent on creating query objects. Fontifying 40 lines 463 times =
takes 6.92s (according to Emacs; 7.30s according to the profiler). That =
comes to 0.0158s per call to font-lock-fontify-region (using the =
profiler total), of which 0.0104s is spent on creating the query =
object. That suggests that if we optimize away the query object =
creation, we can make font-locking very, very fast. And not just =
font-locking, since using tree-sitter to do anything useful basically =
means querying the parsed tree.
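
A minimal sketch of that per-call arithmetic, using only the totals
reported in this message (463 loops, 7.30s overall per the external
profiler, 4.80s in ts_query_new). my-bench-per-loop is a hypothetical
helper, not part of the benchmark file:

```elisp
(defun my-bench-per-loop (total-seconds loops)
  "Return the average seconds per loop (hypothetical helper)."
  (/ total-seconds (float loops)))

(let ((font-lock (my-bench-per-loop 7.30 463))   ; external profiler total
      (query-new (my-bench-per-loop 4.80 463)))  ; time in ts_query_new
  (message "%.4fs per loop, %.4fs of it creating the query (%.0f%%)"
           font-lock query-new (* 100 (/ query-new font-lock))))
```

By these figures, roughly two thirds of each call goes to building the
query object.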

If we expose =E2=80=9Ccompiled queries=E2=80=9D, we don=E2=80=99t need =
to cache them either.

The regex-based font-lock is a lot slower. With or without the =
optimization, tree-sitter is a win, but we knew that already. I have no =
idea why regex font-lock ran for 905 loops compared to 463 for =
tree-sitter. Maybe I did something wrong there.
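
One thing that might be worth ruling out (an untested assumption, not
something confirmed in this thread) is that fontification moves point,
which would change how many chunks the loop visits. A sketch of a loop
that tracks the position itself, so both runs see the same chunk count;
my-bench-fontify-in-chunks is a hypothetical helper:

```elisp
(require 'cl-lib)

(defun my-bench-fontify-in-chunks (lines)
  "Fontify the current buffer LINES lines at a time.
Return the number of chunks fontified."
  (let ((count 0)
        (pos (point-min)))
    (while (< pos (point-max))
      (goto-char pos)
      (forward-line lines)
      (let ((end (point)))
        ;; POS is tracked explicitly, so it does not matter where
        ;; the fontification function leaves point.
        (font-lock-fontify-region pos end)
        (setq pos end)
        (cl-incf count)))
    count))
```

It could be called as (my-bench-fontify-in-chunks 40) inside each
with-temp-buffer form, in place of the while loop.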

Tree-sitter font-lock:

Benchmark 3: fontify all of xdisp.c, 40 lines at a time.
took 6.92, of which 1.00 is GC (0 gc runs), loop count: 463

font-lock:    7.30s -> 0.015766738660907127s / loop
ts_query_new: 4.80s -> 0.010367170626349892s / loop

Note: 7.30s is taken from an external profiler.

Regex font-lock (c-mode):

Benchmark 3: fontify all of xdisp.c, 40 lines at a time.
took 88.28, of which 5.00 is GC (4 gc runs), loop count: 905

font-lock: 88.28s -> 0.1997285067873303s / loop

Yuan


--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Disposition: attachment;
	filename=tree-sitter-benchmark.el
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name="tree-sitter-benchmark.el"
Content-Transfer-Encoding: 7bit

;;; tree-sitter-benchmark.el -*- lexical-binding: t; -*-

(require 'cl-lib)                       ; for `cl-incf'
(require 'treesit)
(setq c-font-lock-settings-1
      `((c
         ,(with-temp-buffer
            (insert-file-contents-literally "./highlights.scm")
            ;; make capture names map to a face, any face
            (goto-char (point-min))
            (while (re-search-forward "@[a-z.]+" nil t)
              (replace-match "@font-lock-string-face" t))
            (buffer-substring (point-min) (point-max))))))

;; Tree-sitter font-lock.
(with-temp-buffer
  (treesit-get-parser-create 'c)
  (setq-local treesit-font-lock-defaults
              '((c-font-lock-settings-1)))
  (font-lock-mode)
  (treesit-font-lock-enable)
  (insert-file-contents "xdisp.c")
  (let ((count 0))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); name the
    ;; fields so they reach the format string in the right order.
    (pcase-let ((`(,elapsed ,gc-count ,gc-elapsed)
                 (benchmark-run 1
                   (while (/= (point-max) (point))
                     (font-lock-fontify-region (point) (line-end-position 40))
                     (forward-line 40)
                     (cl-incf count)))))
      (message "Benchmark 3: fontify all of xdisp.c, 40 lines at a time.\
  took %2.2f, of which %2.2f is GC (%d gc runs), loop count: %s"
               elapsed gc-elapsed gc-count count))))

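;; Regex font-lock (c-mode), for comparison.
(with-temp-buffer
  (treesit-get-parser-create 'c)
  (c-mode)
  (insert-file-contents "xdisp.c")
  (let ((count 0))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); name the
    ;; fields so they reach the format string in the right order.
    (pcase-let ((`(,elapsed ,gc-count ,gc-elapsed)
                 (benchmark-run 1
                   (while (/= (point-max) (point))
                     (font-lock-fontify-region (point) (line-end-position 40))
                     (forward-line 40)
                     (cl-incf count)))))
      (message "Benchmark 3: fontify all of xdisp.c, 40 lines at a time.\
  took %2.2f, of which %2.2f is GC (%d gc runs), loop count: %s"
               elapsed gc-elapsed gc-count count))))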

--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C
Content-Disposition: attachment;
	filename=highlights.scm
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name="highlights.scm"
Content-Transfer-Encoding: 7bit

;; Copied from elisp-tree-sitter/langs/queries/c
["break"
 "case"
 "const"
 "continue"
 "default"
 "do"
 "else"
 "enum"
 "extern"
 "for"
 "if"
 "inline"
 "return"
 "sizeof"
 "static"
 "struct"
 "switch"
 "typedef"
 "union"
 "volatile"
 "while"
 "..."] @keyword

[(storage_class_specifier)
 (type_qualifier)] @keyword

["#define"
 "#else"
 "#endif"
 "#if"
 "#ifdef"
 "#ifndef"
 "#include"
 (preproc_directive)] @function.macro

((["#ifdef" "#ifndef"] (identifier) @constant))

["+" "-" "*" "/" "%"
 "~" "|" "&" "<<" ">>"
 "!" "||" "&&"
 "->"
 "==" "!=" "<" ">" "<=" ">="
 "=" "+=" "-=" "*=" "/=" "%=" "|=" "&="
 "++" "--"
] @operator

(conditional_expression ["?" ":"] @operator)

["(" ")" "[" "]" "{" "}"] @punctuation.bracket

["." "," ";"] @punctuation.delimiter

;;; ----------------------------------------------------------------------------
;;; Functions.

(call_expression
 function: [(identifier) @function.call
            (field_expression field: (_) @method.call)])

(function_declarator
 declarator: [(identifier) @function
              (parenthesized_declarator
               (pointer_declarator (field_identifier) @function))])

(preproc_function_def
 name: (identifier) @function)

;;; ----------------------------------------------------------------------------
;;; Types.

[(primitive_type)
 (sized_type_specifier)] @type.builtin

(type_identifier) @type

;;; ----------------------------------------------------------------------------
;;; Variables.

(declaration declarator: [(identifier) @variable
                          (_ (identifier) @variable)])

(parameter_declaration declarator: [(identifier) @variable.parameter
                                    (_ (identifier) @variable.parameter)])

(init_declarator declarator: [(identifier) @variable
                              (_ (identifier) @variable)])

(assignment_expression
 left: [(identifier) @variable
        (field_expression field: (_) @variable)
        (subscript_expression argument: (identifier) @variable)
        (pointer_expression (identifier) @variable)])

(update_expression
 argument: (identifier) @variable)

(preproc_def name: (identifier) @variable.special)

(preproc_params
 (identifier) @variable.parameter)

;;; ----------------------------------------------------------------------------
;;; Properties.

(field_declaration
 declarator: [(field_identifier) @property.definition
              (pointer_declarator (field_identifier) @property.definition)
              (pointer_declarator (pointer_declarator (field_identifier) @property.definition))])

(enumerator name: (identifier) @property.definition)

(field_identifier) @property

;;; ----------------------------------------------------------------------------
;;; Misc.

;; Doesn't work right now: results in error Query pattern is malformed: "Cannot
;; find captured node", "^[A-Z_][A-Z_\\d]*$", "A predicate can only refer to
;; captured nodes in the same pattern"
;; ((identifier) @constant
;;  (.match @constant "^[A-Z_][A-Z_\\d]*$"))

[(null) (true) (false)] @constant.builtin

[(number_literal)
 (char_literal)] @number

(statement_identifier) @label

;;; ----------------------------------------------------------------------------
;;; Strings and comments.

(comment) @comment

[(string_literal)
 (system_lib_string)] @string

--Apple-Mail=_600541BC-D99C-469F-BD84-9007362C012C--