From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree sitter support for C-like languages
Date: Mon, 14 Nov 2022 00:35:47 -0800
Message-ID: <09869DDB-2C3D-4064-81B0-0E6902C46396@gmail.com>
References: <87tu36em9t.fsf@thornhill.no> <45FD2F78-F15B-488B-9348-A8E298D8AD35@gmail.com> <87v8nmyqqp.fsf@thornhill.no> <834jv4nz2g.fsf@gnu.org> <871qq8hsj1.fsf@thornhill.no> <83iljklzmo.fsf@gnu.org> <87v8nkgcqj.fsf@thornhill.no> <87sfiogcbm.fsf@thornhill.no> <83pmdrkyj7.fsf@gnu.org> <87v8njw5th.fsf@thornhill.no> <83leofkwjm.fsf@gnu.org> <9E9244D3-2EFB-4621-91E0-FC8B8C1C2D52@gmail.com>
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\))
Content-Type: multipart/mixed; boundary="Apple-Mail=_1AD86A2C-D6D8-4DD0-AB3B-F46B5D7149CC"
Cc: Eli Zaretskii, Theodor Thornhill, emacs-devel, monnier@iro.umontreal.ca
To: Dmitry Gutov
Original-Received: from smtpclient.apple (cpe-172-117-161-177.socal.res.rr.com.
 [172.117.161.177]) by smtp.gmail.com with ESMTPSA id k25-20020a634b59000000b0046f7b0f504esm5444430pgl.58.2022.11.14.00.35.48 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 14 Nov 2022 00:35:49 -0800 (PST)
In-Reply-To:
X-Mailer: Apple Mail (2.3696.120.41.1.1)
Received-SPF: pass client-ip=2607:f8b0:4864:20::533; envelope-from=casouri@gmail.com; helo=mail-pg1-x533.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.devel:299791
Archived-At:

--Apple-Mail=_1AD86A2C-D6D8-4DD0-AB3B-F46B5D7149CC
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=utf-8

> On Nov 13, 2022, at 5:26 PM, Dmitry Gutov wrote:
>
> On 14.11.2022 02:22, Yuan Fu wrote:
>> So if we want the warning face to automatically disappear, we need
>> to record these warning faces and remember to come back to refontify
>> them later. We need to know when to refontify them, and know when to
>> stop trying to refontify them (maybe the error isn’t transient). For
>> now I think it’s best to just not fontify the error nodes.
>
> I'm guessing the situation could be the reverse as well: after the
> user types some chars, the warning would need to be *added* rather
> than removed, in some cases.

That’s a good perspective. But from what I see, I think it’s best not
to fontify these “errors”, at least for C and C++, because a lot of
things can be marked “error” in a C file, such as code around macros.
In extreme cases the whole file is marked “error”, even though
everything parses fine if we ignore the error. I guess tree-sitter
isn’t happy about some tiny thing in that file but can nevertheless
parse everything correctly. I attached that file below.

> Any chance tree-sitter gives you some info/callbacks to convey the
> earliest node (closest to bob) which has changed after the most
> recent buffer modification? So we'd refontify starting with its
> beginning position.

Yes and no; I explained this in more detail in another message.

--Apple-Mail=_1AD86A2C-D6D8-4DD0-AB3B-F46B5D7149CC
Content-Disposition: attachment; filename=subtree.h
Content-Type: application/octet-stream; x-unix-mode=0644; name="subtree.h"
Content-Transfer-Encoding: 7bit

#ifndef TREE_SITTER_SUBTREE_H_
#define TREE_SITTER_SUBTREE_H_

#ifdef __cplusplus
extern "C" {
#endif

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include "./length.h"
#include "./array.h"
#include "./error_costs.h"
#include "./host.h"
#include "tree_sitter/api.h"
#include "tree_sitter/parser.h"

#define TS_TREE_STATE_NONE USHRT_MAX
#define NULL_SUBTREE ((Subtree) {.ptr = NULL})

// The serialized state of an external scanner.
//
// Every time an external token subtree is created after a call to an
// external scanner, the scanner's `serialize` function is called to
// retrieve a serialized copy of its state. The bytes are then copied
// onto the subtree itself so that the scanner's state can later be
// restored using its `deserialize` function.
//
// Small byte arrays are stored inline, and long ones are allocated
// separately on the heap.
typedef struct {
  union {
    char *long_data;
    char short_data[24];
  };
  uint32_t length;
} ExternalScannerState;

// A compact representation of a subtree.
//
// This representation is used for small leaf nodes that are not
// errors, and were not created by an external scanner.
//
// The idea behind the layout of this struct is that the `is_inline`
// bit will fall exactly into the same location as the least significant
// bit of the pointer in `Subtree` or `MutableSubtree`, respectively.
// Because of alignment, for any valid pointer this will be 0, giving
// us the opportunity to make use of this bit to signify whether to use
// the pointer or the inline struct.
typedef struct SubtreeInlineData SubtreeInlineData;

#define SUBTREE_BITS    \
  bool visible : 1;     \
  bool named : 1;       \
  bool extra : 1;       \
  bool has_changes : 1; \
  bool is_missing : 1;  \
  bool is_keyword : 1;

#define SUBTREE_SIZE           \
  uint8_t padding_columns;     \
  uint8_t padding_rows : 4;    \
  uint8_t lookahead_bytes : 4; \
  uint8_t padding_bytes;       \
  uint8_t size_bytes;

#if TS_BIG_ENDIAN
#if TS_PTR_SIZE == 32

struct SubtreeInlineData {
  uint16_t parse_state;
  uint8_t symbol;
  SUBTREE_BITS
  bool unused : 1;
  bool is_inline : 1;
  SUBTREE_SIZE
};

#else

struct SubtreeInlineData {
  SUBTREE_SIZE
  uint16_t parse_state;
  uint8_t symbol;
  SUBTREE_BITS
  bool unused : 1;
  bool is_inline : 1;
};

#endif
#else

struct SubtreeInlineData {
  bool is_inline : 1;
  SUBTREE_BITS
  uint8_t symbol;
  uint16_t parse_state;
  SUBTREE_SIZE
};

#endif

#undef SUBTREE_BITS
#undef SUBTREE_SIZE

// A heap-allocated representation of a subtree.
//
// This representation is used for parent nodes, external tokens,
// errors, and other leaf nodes whose data is too large to fit into
// the inline representation.
typedef struct {
  volatile uint32_t ref_count;
  Length padding;
  Length size;
  uint32_t lookahead_bytes;
  uint32_t error_cost;
  uint32_t child_count;
  TSSymbol symbol;
  TSStateId parse_state;

  bool visible : 1;
  bool named : 1;
  bool extra : 1;
  bool fragile_left : 1;
  bool fragile_right : 1;
  bool has_changes : 1;
  bool has_external_tokens : 1;
  bool has_external_scanner_state_change : 1;
  bool depends_on_column: 1;
  bool is_missing : 1;
  bool is_keyword : 1;

  union {
    // Non-terminal subtrees (`child_count > 0`)
    struct {
      uint32_t visible_child_count;
      uint32_t named_child_count;
      uint32_t node_count;
      int32_t dynamic_precedence;
      uint16_t repeat_depth;
      uint16_t production_id;
      struct {
        TSSymbol symbol;
        TSStateId parse_state;
      } first_leaf;
    };

    // External terminal subtrees (`child_count == 0 && has_external_tokens`)
    ExternalScannerState external_scanner_state;

    // Error terminal subtrees (`child_count == 0 && symbol == ts_builtin_sym_error`)
    int32_t lookahead_char;
  };
} SubtreeHeapData;

// The fundamental building block of a syntax tree.
typedef union {
  SubtreeInlineData data;
  const SubtreeHeapData *ptr;
} Subtree;

// Like Subtree, but mutable.
typedef union {
  SubtreeInlineData data;
  SubtreeHeapData *ptr;
} MutableSubtree;

typedef Array(Subtree) SubtreeArray;
typedef Array(MutableSubtree) MutableSubtreeArray;

typedef struct {
  MutableSubtreeArray free_trees;
  MutableSubtreeArray tree_stack;
} SubtreePool;

void ts_external_scanner_state_init(ExternalScannerState *, const char *, unsigned);
const char *ts_external_scanner_state_data(const ExternalScannerState *);
bool ts_external_scanner_state_eq(const ExternalScannerState *a, const char *, unsigned);
void ts_external_scanner_state_delete(ExternalScannerState *self);

void ts_subtree_array_copy(SubtreeArray, SubtreeArray *);
void ts_subtree_array_clear(SubtreePool *, SubtreeArray *);
void ts_subtree_array_delete(SubtreePool *, SubtreeArray *);
void ts_subtree_array_remove_trailing_extras(SubtreeArray *, SubtreeArray *);
void ts_subtree_array_reverse(SubtreeArray *);

SubtreePool ts_subtree_pool_new(uint32_t capacity);
void ts_subtree_pool_delete(SubtreePool *);

Subtree ts_subtree_new_leaf(
  SubtreePool *, TSSymbol, Length, Length, uint32_t,
  TSStateId, bool, bool, bool, const TSLanguage *
);
Subtree ts_subtree_new_error(
  SubtreePool *, int32_t, Length, Length, uint32_t, TSStateId, const TSLanguage *
);
MutableSubtree ts_subtree_new_node(TSSymbol, SubtreeArray *, unsigned, const TSLanguage *);
Subtree ts_subtree_new_error_node(SubtreeArray *, bool, const TSLanguage *);
Subtree ts_subtree_new_missing_leaf(SubtreePool *, TSSymbol, Length, uint32_t, const TSLanguage *);
MutableSubtree ts_subtree_make_mut(SubtreePool *, Subtree);
void ts_subtree_retain(Subtree);
void ts_subtree_release(SubtreePool *, Subtree);
int ts_subtree_compare(Subtree, Subtree);
void ts_subtree_set_symbol(MutableSubtree *, TSSymbol, const TSLanguage *);
void ts_subtree_summarize(MutableSubtree, const Subtree *, uint32_t, const TSLanguage *);
void ts_subtree_summarize_children(MutableSubtree, const TSLanguage *);
void ts_subtree_balance(Subtree, SubtreePool *, const TSLanguage *);
Subtree ts_subtree_edit(Subtree, const TSInputEdit *edit, SubtreePool *);
char *ts_subtree_string(Subtree, const TSLanguage *, bool include_all);
void ts_subtree_print_dot_graph(Subtree, const TSLanguage *, FILE *);
Subtree ts_subtree_last_external_token(Subtree);
const ExternalScannerState *ts_subtree_external_scanner_state(Subtree self);
bool ts_subtree_external_scanner_state_eq(Subtree, Subtree);

#define SUBTREE_GET(self, name) (self.data.is_inline ? self.data.name : self.ptr->name)

static inline TSSymbol ts_subtree_symbol(Subtree self) { return SUBTREE_GET(self, symbol); }
static inline bool ts_subtree_visible(Subtree self) { return SUBTREE_GET(self, visible); }
static inline bool ts_subtree_named(Subtree self) { return SUBTREE_GET(self, named); }
static inline bool ts_subtree_extra(Subtree self) { return SUBTREE_GET(self, extra); }
static inline bool ts_subtree_has_changes(Subtree self) { return SUBTREE_GET(self, has_changes); }
static inline bool ts_subtree_missing(Subtree self) { return SUBTREE_GET(self, is_missing); }
static inline bool ts_subtree_is_keyword(Subtree self) { return SUBTREE_GET(self, is_keyword); }
static inline TSStateId ts_subtree_parse_state(Subtree self) { return SUBTREE_GET(self, parse_state); }
static inline uint32_t ts_subtree_lookahead_bytes(Subtree self) { return SUBTREE_GET(self, lookahead_bytes); }

#undef SUBTREE_GET

// Get the size needed to store a heap-allocated subtree with the given
// number of children.
static inline size_t ts_subtree_alloc_size(uint32_t child_count) {
  return child_count * sizeof(Subtree) + sizeof(SubtreeHeapData);
}

// Get a subtree's children, which are allocated immediately before the
// tree's own heap data.
#define ts_subtree_children(self) \
  ((self).data.is_inline ? NULL : (Subtree *)((self).ptr) - (self).ptr->child_count)

static inline void ts_subtree_set_extra(MutableSubtree *self, bool is_extra) {
  if (self->data.is_inline) {
    self->data.extra = is_extra;
  } else {
    self->ptr->extra = is_extra;
  }
}

static inline TSSymbol ts_subtree_leaf_symbol(Subtree self) {
  if (self.data.is_inline) return self.data.symbol;
  if (self.ptr->child_count == 0) return self.ptr->symbol;
  return self.ptr->first_leaf.symbol;
}

static inline TSStateId ts_subtree_leaf_parse_state(Subtree self) {
  if (self.data.is_inline) return self.data.parse_state;
  if (self.ptr->child_count == 0) return self.ptr->parse_state;
  return self.ptr->first_leaf.parse_state;
}

static inline Length ts_subtree_padding(Subtree self) {
  if (self.data.is_inline) {
    Length result = {self.data.padding_bytes, {self.data.padding_rows, self.data.padding_columns}};
    return result;
  } else {
    return self.ptr->padding;
  }
}

static inline Length ts_subtree_size(Subtree self) {
  if (self.data.is_inline) {
    Length result = {self.data.size_bytes, {0, self.data.size_bytes}};
    return result;
  } else {
    return self.ptr->size;
  }
}

static inline Length ts_subtree_total_size(Subtree self) {
  return length_add(ts_subtree_padding(self), ts_subtree_size(self));
}

static inline uint32_t ts_subtree_total_bytes(Subtree self) {
  return ts_subtree_total_size(self).bytes;
}

static inline uint32_t ts_subtree_child_count(Subtree self) {
  return self.data.is_inline ? 0 : self.ptr->child_count;
}

static inline uint32_t ts_subtree_repeat_depth(Subtree self) {
  return self.data.is_inline ? 0 : self.ptr->repeat_depth;
}

static inline uint32_t ts_subtree_node_count(Subtree self) {
  return (self.data.is_inline || self.ptr->child_count == 0) ? 1 : self.ptr->node_count;
}

static inline uint32_t ts_subtree_visible_child_count(Subtree self) {
  if (ts_subtree_child_count(self) > 0) {
    return self.ptr->visible_child_count;
  } else {
    return 0;
  }
}

static inline uint32_t ts_subtree_error_cost(Subtree self) {
  if (ts_subtree_missing(self)) {
    return ERROR_COST_PER_MISSING_TREE + ERROR_COST_PER_RECOVERY;
  } else {
    return self.data.is_inline ? 0 : self.ptr->error_cost;
  }
}

static inline int32_t ts_subtree_dynamic_precedence(Subtree self) {
  return (self.data.is_inline || self.ptr->child_count == 0) ? 0 : self.ptr->dynamic_precedence;
}

static inline uint16_t ts_subtree_production_id(Subtree self) {
  if (ts_subtree_child_count(self) > 0) {
    return self.ptr->production_id;
  } else {
    return 0;
  }
}

static inline bool ts_subtree_fragile_left(Subtree self) {
  return self.data.is_inline ? false : self.ptr->fragile_left;
}

static inline bool ts_subtree_fragile_right(Subtree self) {
  return self.data.is_inline ? false : self.ptr->fragile_right;
}

static inline bool ts_subtree_has_external_tokens(Subtree self) {
  return self.data.is_inline ? false : self.ptr->has_external_tokens;
}

static inline bool ts_subtree_has_external_scanner_state_change(Subtree self) {
  return self.data.is_inline ? false : self.ptr->has_external_scanner_state_change;
}

static inline bool ts_subtree_depends_on_column(Subtree self) {
  return self.data.is_inline ? false : self.ptr->depends_on_column;
}

static inline bool ts_subtree_is_fragile(Subtree self) {
  return self.data.is_inline ? false : (self.ptr->fragile_left || self.ptr->fragile_right);
}

static inline bool ts_subtree_is_error(Subtree self) {
  return ts_subtree_symbol(self) == ts_builtin_sym_error;
}

static inline bool ts_subtree_is_eof(Subtree self) {
  return ts_subtree_symbol(self) == ts_builtin_sym_end;
}

static inline Subtree ts_subtree_from_mut(MutableSubtree self) {
  Subtree result;
  result.data = self.data;
  return result;
}

static inline MutableSubtree ts_subtree_to_mut_unsafe(Subtree self) {
  MutableSubtree result;
  result.data = self.data;
  return result;
}

#ifdef __cplusplus
}
#endif

#endif  // TREE_SITTER_SUBTREE_H_

--Apple-Mail=_1AD86A2C-D6D8-4DD0-AB3B-F46B5D7149CC--
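
Note on the attachment: the comment in subtree.h says the `is_inline` bit is laid out to coincide with the least significant bit of the pointer, which is always 0 for an aligned heap object. The following is only a minimal sketch of that general idea (low-bit pointer tagging), not tree-sitter's code. Every name in it (DemoHeapNode, DemoHandle, and the demo_* functions) is hypothetical, and it uses an explicit uintptr_t word instead of the bit-field and union layout in the header, so that it does not depend on endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for a heap-allocated node. Its alignment is at
// least 4 bytes, so the low bit of any valid pointer to it is 0.
typedef struct {
  uint32_t child_count;
} DemoHeapNode;

// One machine word holding either an aligned pointer (low bit 0) or a
// small inline payload (low bit 1). Assumption: converting a pointer to
// uintptr_t and back preserves it, which holds on mainstream platforms.
typedef struct {
  uintptr_t bits;
} DemoHandle;

// Wrap a pointer to a heap node; the low bit stays 0.
static inline DemoHandle demo_from_heap(const DemoHeapNode *node) {
  assert(((uintptr_t)node & 1u) == 0);
  return (DemoHandle){ (uintptr_t)node };
}

// Store a small symbol inline and set the low bit as the tag.
static inline DemoHandle demo_from_inline(uint8_t symbol) {
  return (DemoHandle){ ((uintptr_t)symbol << 1) | 1u };
}

// The tag test: analogous in spirit to checking `is_inline`.
static inline bool demo_is_inline(DemoHandle h) {
  return (h.bits & 1u) != 0;
}

// Recover the inline payload by shifting the tag bit away.
static inline uint8_t demo_inline_symbol(DemoHandle h) {
  assert(demo_is_inline(h));
  return (uint8_t)(h.bits >> 1);
}

// Recover the pointer; only valid when the tag bit is clear.
static inline const DemoHeapNode *demo_heap_node(DemoHandle h) {
  assert(!demo_is_inline(h));
  return (const DemoHeapNode *)h.bits;
}

// Accessor in the style of the header's helpers: inline values are
// leaves, so they report zero children without touching memory.
static inline uint32_t demo_child_count(DemoHandle h) {
  return demo_is_inline(h) ? 0 : demo_heap_node(h)->child_count;
}
```

A caller would build a handle with demo_from_heap(&node) or demo_from_inline(42) and then go only through the accessors, which check the tag bit before deciding whether the word is a pointer. The accessors in the attached header follow the same pattern with `self.data.is_inline`.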