From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jostein Kjønigsen
Newsgroups: gmane.emacs.devel
Subject: Tree-sitter integration on feature/tree-sitter (severe performance issues together with linum-mode)
Date: Mon, 15 Aug 2022 14:32:24 +0200
Message-ID: <7e24d0aa-9980-6204-5064-5a92963ae7bd@secure.kjonigsen.net>
Reply-To: jostein@kjonigsen.net
To: Yuan Fu, "Ergus via Emacs development discussions.", Tuấn-Anh Nguyễn, Markus Triska
Cc: Theodor Thornhill
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="------------bsriZ9b0MB0pVzYhv2cTJPyH"

--------------bsriZ9b0MB0pVzYhv2cTJPyH
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hey everyone.

Sorry for writing one of those long emails.

For those who want to cut to the chase: the executive summary is that using tree-sitter for fontification can give much lower performance than expected, and using it together with linum-mode for line numbering causes severe performance degradation.

How bad this issue is of course depends on the major mode involved and on how complex the tree-sitter grammars used by that mode are. These findings come from major modes that are not part of core Emacs, but I still think they can provide valuable feedback on the current state of the feature/tree-sitter code and what can be improved.

As a reference, I have two major modes that demonstrate this:

  • csharp-mode [1]
  • typescript-mode [2]

Both of these major modes are either implemented (or in the process of being implemented) for the following scenarios:

  • plain elisp (or cc-mode)
  • using the emacs-tree-sitter package/library written by Tuan-Anh Nguyen[3], compatible with "any" Emacs.
  • using native Emacs tree-sitter based on the git feature/tree-sitter branch by Yuan Fu, which requires building Emacs from git.

For csharp-mode we have been able to pivot from elisp/cc-mode to emacs-tree-sitter with great success. The code is simpler, performance is perfectly acceptable, and long-standing bugs were fixed in the process.

For typescript-mode we tried to do the same[4], but learnt about Yuan Fu's work before completing. The result instead was a new major-mode depending on native Emacs tree-sitter support[5]. This has also worked out well enough for me to use it as my "daily driver".

Motivated by that success, I've tried to rewrite csharp-mode to also use native Emacs tree-sitter support[6]. And while porting the code seems to work, performance for this mode has been VERY far from acceptable.

Even on a modern, fast Intel CPU, keystrokes are lagging several seconds behind and it's not really usable. You just have to stop typing and wait for your input to suddenly appear many, many seconds later.

This is in stark contrast to the csharp-mode implementation which uses Tuan-Anh's library, and quite the opposite of what I would expect. While perhaps somewhat naive, I honestly expected "native support" to perform better. Could there be optimizations in Tuan-Anh's library that we need to add to treesit.el in Emacs?

Another thing which made me really notice this issue is that by default I have linum-mode enabled for all prog-mode buffers.

And linum-mode -easily- reduces input performance in tree-sitter mode buffers by a factor of 4 (this has been measured using profiler-start, profiler-stop and profiler-report).

The following profiler report stems from enabling csharp-mode based on native Emacs tree-sitter support, enabling linum-mode, and then typing a long line of random letters (it need not be valid code).

    382,605,711  71% - linum-update-current
    382,605,711  71%  - linum-update
    382,605,711  71%   - mapc
    382,601,487  71%    - linum-update-window
    382,176,351  71%     - window-end
    382,176,351  71%      - jit-lock-function
    382,176,351  71%       - jit-lock-fontify-now
    382,176,351  71%        - jit-lock--run-functions
    382,176,351  71%         - run-hook-wrapped
    382,176,351  71%          - #<compiled -0x156ee8ca7e527443>
    382,176,351  71%           - font-lock-fontify-region
    382,127,055  71%            + treesit-font-lock-fontify-region
         49,296   0%            + font-lock-default-fontify-region
         30,616   0%       linum--face-width
    137,009,221  25% - command-execute
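
For anyone who wants to reproduce a report like the one above, here is a minimal sketch of how the measurement can be wrapped in a single command. The name my/profiler-toggle is hypothetical and not part of Emacs or of any mode mentioned in this mail, and the choice of CPU profiling is an assumption, since the mail does not say which profiler mode was used.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'profiler)

(defun my/profiler-toggle ()
  "Start the profiler, or stop it and show the report if it is running.
Hypothetical helper for reproducing the measurement; not part of Emacs."
  (interactive)
  (if (profiler-running-p)
      ;; Second invocation: stop sampling and open the report buffer.
      (progn
        (profiler-stop)
        (profiler-report))
    ;; First invocation: start sampling.  Assumption: CPU mode;
    ;; `mem' and `cpu+mem' are the other modes `profiler-start' accepts.
    (profiler-start 'cpu)))
```

One way to use it would be: open a file in the tree-sitter based mode with linum-mode enabled, run M-x my/profiler-toggle, type a long line of random letters, then run M-x my/profiler-toggle again and expand the entries in the report buffer.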

I realize linum-mode has been controversial with respect to performance in the past, but this kind of slow-down had me quite surprised. Disabling linum-mode makes the major mode borderline usable, but it's still much slower than I know it -can- be (based on Tuan-Anh's library).

Can something be done to Yuan's code to make it perform as well as Tuan-Anh's? Are there improvements which can be made to linum-mode to avoid these kinds of issues?

I know for sure I'm not qualified to answer those questions, but I think this is definitely something which needs to be looked into. If anyone has changes they want feedback on, I will be more than happy to test them and report back.

[1] https://github.com/emacs-csharp/csharp-mode
[2] https://github.com/emacs-typescript/typescript.el
[3] https://github.com/emacs-tree-sitter/elisp-tree-sitter
[4] https://github.com/emacs-typescript/typescript.el/blob/feature/tsx-support/typescript-tree-sitter.el
[5] https://git.sr.ht/~theo/tree-sitter-modes/tree/master/item/typescript-mode.el
[6] https://git.sr.ht/~jostein/tree-sitter-modes/tree/feature/csharp/item/csharp-mode.el


--------------bsriZ9b0MB0pVzYhv2cTJPyH--