From: Ergus
Newsgroups: gmane.emacs.devel
Subject: Re: cc-mode fontification feels random
Date: Sat, 12 Jun 2021 17:04:02 +0200
Message-ID: <20210612150402.5z2xpswor4dkxh7o@Ergus>
References: <837dj09p0e.fsf@gnu.org> <20210611232535.b4dyu3a2yxvdixys@Ergus>
 <87a6nw6jtf.fsf@telefonica.net> <20210612010844.45noqsg7wveeo3yw@Ergus>
 <83sg1n8t71.fsf@gnu.org> <20210612110103.u6kuh3d5vahxmxlt@Ergus>
 <83fsxn8gue.fsf@gnu.org>
In-Reply-To: <83fsxn8gue.fsf@gnu.org>
To: Eli Zaretskii
Cc: ofv@wanadoo.es, emacs-devel@gnu.org

On Sat, Jun 12, 2021 at 02:25:45PM +0300, Eli Zaretskii wrote:
>> Date: Sat, 12 Jun 2021 13:01:03 +0200
>> From: Ergus
>> Cc: ofv@wanadoo.es, emacs-devel@gnu.org
>>
>> If I understand something about our cc-mode functionalities (and many of
>> those functionalities we don't want to lose like indentation and code
>> navigation). Probably the "right" way to use tree-sitter (maybe Alan
>> wants to give a more precise technical description) is not only fontify but
>> use the tree information to add contextual information to the text
>> (something that I think cc-mode does.) And then let font-lock do the
>> magic.
>>
>> The tree-sitter tree is basically contextual information, and (for
>> example) if we have processed the whole buffer and we already have the
>> tree, then scrolling won't need to parse anything, adding or removing
>> text is a localized modification, so with the previous tree we can
>> re-parse only the modified region. The choice may be then if we
>> propertize the text of the whole buffer or just the visible region OR if
>> we want to "propertize on demand".
>>
>> This will save us from the hard parsing in cc-mode to fontify "on the
>> fly".
>
>I'm not sure I understand what you are suggesting.  Can you describe
>your suggestion in terms of 'face' text properties and the 'fontified'
>property, and explain how those should fit into the existing redisplay
>mechanisms?
>
cc-mode has something similar to the tree-sitter properties: the
information we get from c-syntactic-context or c-langelem-sym. I don't
actually know where cc-mode stores this information now, but right now
it is set on the text region by region (the visible ones), and those
regions are parsed on demand; that is why they impact commands like
scrolling. So there are two operations: 1) the parsing, and then 2)
setting these properties on the text (or wherever they are stored).

On the other hand, when we want something like
c-defun-name-and-limits, we also search on the fly with functions like
c-declaration-limits-1 or c-go-list-backward, which try to recognize or
find the contextual information.

With tree-sitter, on the other hand, suppose we have a buffer like:

int main()
{
  int i = 5;
  return 0;
}

The tree-sitter parser returns a tree that may be represented like:

(translation_unit
  (function_definition
    type: (primitive_type)
    declarator: (function_declarator
      declarator: (identifier)
      parameters: (parameter_list))
    body: (compound_statement
      (declaration
        type: (primitive_type)
        declarator: (init_declarator
          declarator: (identifier)
          value: (number_literal)))
      (return_statement (number_literal)))))

This tree can be traversed, accessed and recalculated very fast; and
after a change it can be updated even faster, section by section, if we
know the rest hasn't changed.

When we have a visible region (suppose that we only see the line
"int i = 5;" because our screen is very small for this example), we
know where that line starts in the buffer, so we can find the nearest
node that extends over this region using functions like:

ts_node_first_child_for_byte
ts_node_descendant_for_byte_range
ts_node_named_descendant_for_byte_range

The design choice comes here.

1) We can iterate over (or traverse) the "useful" subtree to convert
that information into text properties directly (using
ts_tree_cursor_current_field_id). But if I remember correctly, that
could have some implications for redisplay, right? Even when we modify
properties that are not visible or that belong to an outer node.

2) We never convert the tree information into properties (as we know
them in the text now), but just use the ts_tree_cursor_* set of
functions to access the information and tell the display engine to use
some faces for it. So on the Lisp side, instead of accessing
information stored in the properties, we just call a wrapper around the
tree-sitter C functions.

----

The first approach is probably simpler to implement, but less optimal:
translating between C and Lisp types and adding properties on every
update is extra work on the Lisp side.
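
The lookup of the node that covers a visible region, as described
above, can be sketched without the library. This is only a toy model
under stated assumptions: the Node struct, the hand-written byte
offsets and descendant_for_range are hypothetical stand-ins, not the
tree-sitter API, and the offsets assume the buffer text given in the
comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a syntax node (NOT tree-sitter's TSNode):
   a type name, a half-open byte range [start, end), and its children. */
typedef struct Node {
    const char *type;
    size_t start, end;
    const struct Node *children;
    size_t n_children;
} Node;

/* Byte offsets assume exactly this buffer text (2-space indentation):
   "int main()\n{\n  int i = 5;\n  return 0;\n}\n"                     */
const Node init_kids[] = {
    {"identifier", 19, 20, NULL, 0},
    {"number_literal", 23, 24, NULL, 0},
};
const Node decl_kids[] = {
    {"primitive_type", 15, 18, NULL, 0},
    {"init_declarator", 19, 24, init_kids, 2},
};
const Node ret_kids[] = {
    {"number_literal", 35, 36, NULL, 0},
};
const Node body_kids[] = {
    {"declaration", 15, 25, decl_kids, 2},
    {"return_statement", 28, 37, ret_kids, 1},
};
const Node fdecl_kids[] = {
    {"identifier", 4, 8, NULL, 0},
    {"parameter_list", 8, 10, NULL, 0},
};
const Node func_kids[] = {
    {"primitive_type", 0, 3, NULL, 0},
    {"function_declarator", 4, 10, fdecl_kids, 2},
    {"compound_statement", 11, 39, body_kids, 2},
};
const Node root_kids[] = {
    {"function_definition", 0, 39, func_kids, 3},
};
const Node example_tree = {"translation_unit", 0, 40, root_kids, 1};

/* Return the smallest node whose range covers [lo, hi), or NULL if the
   range does not fit inside `n`.  Sibling ranges are assumed disjoint. */
const Node *descendant_for_range(const Node *n, size_t lo, size_t hi)
{
    assert(lo <= hi);
    if (lo < n->start || hi > n->end)
        return NULL;
    for (size_t i = 0; i < n->n_children; i++) {
        const Node *hit = descendant_for_range(&n->children[i], lo, hi);
        if (hit != NULL)
            return hit;
    }
    return n;   /* no child covers the range, so `n` is the smallest */
}

/* Print which node would be the starting point for a visible region. */
void print_covering_node(const Node *tree, size_t lo, size_t hi)
{
    const Node *n = descendant_for_range(tree, lo, hi);
    if (n != NULL)
        printf("[%zu, %zu) -> %s [%zu, %zu)\n",
               lo, hi, n->type, n->start, n->end);
    else
        printf("[%zu, %zu) -> outside the tree\n", lo, hi);
}
```

With these offsets, the range [15, 25) that holds "int i = 5;"
resolves to the declaration node, while [13, 25), which also includes
the leading whitespace of the line, resolves to the enclosing
compound_statement. Either node is a starting point for approach 1
(walk the subtree and set properties) or approach 2 (query the tree
directly).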
The first approach may be optimized a bit using, for example,
ts_tree_get_changed_ranges.

The second approach may require a bit more work, but it will solve
indentation and code navigation for all the modes with a common pattern
and a single API, while the display engine could access all the
information directly, from C to C.

The key difference may be that, for example, a basic command like
up-list:

1) with the first approach will search the buffer for text property
changes, use syntax-ppss and so on;

2) with the second one will just call ts_node_parent and go to
ts_node_start_byte.

>> > I don't
>> >really care if TS actually processes a much larger chunk of text, if
>> >it does that quickly enough, but processing the resulting faces will
>> >take time on the Emacs side, and that is better avoided.
>>
>> But then we won't get all the contextual information we need for
>> indentation, code navigation or folding the code, right?
>
>Why not?
>
Translating that information as well may be a lot of work too.

>> I see two approaches here:
>>
>> 1) add the tree-sitter properties/faces to the buffer text (fully or
>> partially on the visible regions)
>>
>> 2) use the tree-sitter information directly from the tree and add the
>> visible properties from there.
>>
>> This second one will require a more complete api of tree-sitter
>> functions exposed to elisp, but in my opinion it is worth it in accuracy,
>> speed and simplicity (a single API to rule them all). And to support
>> many languages we don't actually have like rust or the fancy C++ > 11.
>
>Why can't we have both?  The information you are talking about, which
>is needed by Emacs features other than fontification, can be used by
>those other Emacs features when needed.  You seem to be saying that
>these two alternatives are mutually-exclusive, but you didn't explain
>why.
>
They are not exclusive, but redundant.
If we use the current infrastructure, we will spend a lot of time
translating properties and contextual information, and making sure
that none of them is outdated. Navigation and indentation will continue
to be based on properties that we need to set and update all the time
so that they match the tree one by one. Basically we will be
duplicating the information that is already in the tree, creating many
list objects, overloading the GC, and so on. So potentially we will
save only the parsing time.

The first one may work with a very primitive API to handle and iterate
over the tree-sitter tree. The second one will require cursors, finders
and some other features from the tree-sitter API; it will surely
improve performance, but it replaces a lot of the work Lisp is doing
now.

The second approach will probably make the C developers happier than
the Lisp ones.
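
The up-list difference mentioned earlier can be sketched in the same
toy style. Everything here is an assumption made for illustration:
up_list_by_scanning is a naive stand-in for searching the buffer text
(it ignores strings and comments, unlike the real Emacs machinery), and
SynNode with up_list_by_tree only mimics the idea of following a parent
link, as ts_node_parent and ts_node_start_byte would.

```c
#include <assert.h>
#include <stddef.h>

/* Approach 1 stand-in: scan backward from `point` for the nearest
   unmatched opening bracket.  Naive: ignores strings and comments.
   Returns the bracket's byte offset, or -1 if there is none. */
long up_list_by_scanning(const char *text, size_t point)
{
    int depth = 0;
    for (size_t i = point; i-- > 0; ) {
        char c = text[i];
        if (c == ')' || c == '}' || c == ']') {
            depth++;                /* skip over a nested, closed list */
        } else if (c == '(' || c == '{' || c == '[') {
            if (depth == 0)
                return (long)i;     /* the enclosing list starts here */
            depth--;
        }
    }
    return -1;
}

/* Approach 2 stand-in: a hypothetical node that already knows its
   parent, so no text has to be scanned. */
typedef struct SynNode {
    const char *type;
    size_t start_byte;
    const struct SynNode *parent;   /* NULL when there is no parent */
} SynNode;

/* Return the start byte of the enclosing node, or -1 if none. */
long up_list_by_tree(const SynNode *node)
{
    assert(node != NULL);
    return node->parent ? (long)node->parent->start_byte : -1;
}

/* Example data: the buffer text and two hand-built nodes (the tree is
   truncated at compound_statement to keep the sketch short). */
const char example_source[] =
    "int main()\n{\n  int i = 5;\n  return 0;\n}\n";
const SynNode compound = {"compound_statement", 11, NULL};
const SynNode declaration = {"declaration", 15, &compound};
```

From byte 19 (the "i" in "int i = 5;"), up_list_by_scanning walks back
over the text until it reaches the "{" at byte 11, while
up_list_by_tree(&declaration) reaches the same offset with a single
pointer dereference. The scanning cost grows with the distance to the
bracket; the tree lookup does not.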