From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Po Lu Newsgroups: gmane.emacs.devel Subject: Re: master 361c5fc2d8e 3/3: Support more predicates in tree-sitter search functions Date: Fri, 14 Apr 2023 07:55:55 +0800 Message-ID: <87o7nrjq0k.fsf@yahoo.com> References: <168142374534.3804.10641447592850495150@vcs2.savannah.gnu.org> <20230413220905.E3C7BC13A84@vcs2.savannah.gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="24105"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: Yuan Fu To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Apr 14 01:56:44 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pn6nz-00067e-QK for ged-emacs-devel@m.gmane-mx.org; Fri, 14 Apr 2023 01:56:44 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pn6nU-0004Gb-2d; Thu, 13 Apr 2023 19:56:12 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pn6nR-0004GN-A1 for emacs-devel@gnu.org; Thu, 13 Apr 2023 19:56:09 -0400 Original-Received: from sonic316-21.consmr.mail.ne1.yahoo.com ([66.163.187.147]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pn6nO-0006XC-Vk for emacs-devel@gnu.org; Thu, 13 Apr 2023 19:56:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1681430165; bh=Uo3ldcdlSgNDBcmV2wpyhEHrtirTfjXIdE6Y58kT1lw=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From:Subject:Reply-To; b=X7EnK00bmRyvLu7iHM5EVWyHQuIGA645bgUZLVw9n6tH8OU1yyGUhcgoGro6fcwvhkXO4WISukDPQvqbRAaxsbp5V865M6g+GTdTKVtv1XwZW5ESukNFM8Ev99XLNfEUJ1iDWY9G4t1aei119RdNx1WMC5Y/lznlPuP0A3SY0Sgs/jehG/k68eTPud5y+OEMrpi/Uswt1OEpbn/h0/Xs2yKsrux1qEHH5Yl5PZcs639ZqSIEActhw3tCl34ZxfrtWGSTInJ5eFPSRJExdlMEDLdc0sWXlezMj9SLV57G9hhzChjDXKCR3xXpSI3jcPPm/Y3b3aM/HWSOxdK16xX7tw== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1681430165; bh=erwt/bqAV8I6Q4v+p4YryefuM9BC74K50cCXEQFRBnR=; h=X-Sonic-MF:From:To:Subject:Date:From:Subject; b=P0MIQ/GKPH4LX62ak2XLCanBns2thPSwTG8QuJV02+F1YSqSdUvrysK4fMa02pC5ZHYgbYdlZZsuTCC84JIRc4dSPt4KPfJMq/ybY1DV1uywPcVQ0xK/7rq0NuGUJ2tNFvpaQrksGQpJb4d/8PaHfc+lI+zLxMG8ctZ1EQcv+UR14JJm5cA6UJNriyJkiY4xKqQtoFCoLrP48zDucxz+6OZ2iI5GUf35dCvRwbuwmPvSp68qyoKt3DyDulkjVlud7P+gQPb55OYhe4Hl9+8TX+mmqWFDM11SozEIcT4VvHXzp3KnblfLB0bFZXLaMZE+xt1KJL/bhQqCoj4k+7Qe6Q== X-YMail-OSG: LvUnWQ4VM1k5NlfcGplb6HFFTQrfARjSOfu84G1lbVPxLjy3W4Q5.EmXmzL5NDr g5465WwtEKvbmSFdTTtt95_gPwvHjqVMIPVVmWcg6YxdrK_H4XD_lI7kN0ivY93vZUuY_5v246FC gjBJzMUS2p2zRNy.7AO81jwYWfgSDqvsKaq7hHOqfsyx39nG2EOi7unyZJiWDxO7N5fKCEOO5Iuu fmrwohGeCMMhCe4LBGU8ScILfxqOuFXr57kjWYi5kaqlMg_RmGCtDvs7Pi_hfABokADSFYYKVNB3 w9SM2fEWPoi1UU2xayms5.UiG9mZ0LkfbdkrjSmi66bZr2nWzCWHfjJrEej4xVRCB3FwAUT_cx7b 641ZvB_J8OL6ThZRV8rGB3RroX_5ma016hwP0HWJuM5KNc0CCyb1uv.VBVhac3vvOUo9IvOI4PpZ L1Qzmb1idHl3z.3HeAWFLrs5.jlpd4ttFt7DqyhNM17osp8THtJrtUg2S_yVvTQUCxesjsfw.OUj 7uTlpY8yxGM.uTv5hHaac.FFAfYltmgfseIlOpbVY0NgDecE.e9wWkjdFKc7J5DEtOyH7Rk25wi7 mMB4FJhXtFIkW4e1NwOuTfHFWhQzKb2di4WEpYlcWxhCLkC8kimy.sfteNyplnNI6hrR4_zIbUpk IwTTqyILdiso49pRnV9P8ztJOsB3xNezFrVPN9LHUWYZ4E_jPxfVooIHzqsTsY4Snj2FukXoFsqP 8vdiSFfTkcj.yub2qb0p5ik_ZzAUsKHAYHXgPzH3aOtnoxqWTgtBTlTGWlrJFPUQhkyaqdkdwqNW CKbiQ8hc29QOPPvB_JyKNyrHNa98VBoD5yIfHq0dDM X-Sonic-MF: X-Sonic-ID: 165bd382-db4f-4a73-9d49-419bf640e5bb Original-Received: from sonic.gate.mail.ne1.yahoo.com by sonic316.consmr.mail.ne1.yahoo.com with HTTP; Thu, 13 Apr 2023 23:56:05 +0000 Original-Received: by hermes--production-sg3-6d6fb994f6-2fxf8 (Yahoo Inc. Hermes SMTP Server) with ESMTPA ID 0bc9ae3138132e69ef476d7d08e79171; Thu, 13 Apr 2023 23:55:59 +0000 (UTC) In-Reply-To: <20230413220905.E3C7BC13A84@vcs2.savannah.gnu.org> (Yuan Fu's message of "Thu, 13 Apr 2023 18:09:05 -0400 (EDT)") X-Mailer: WebService/1.1.21365 mail.backend.jedi.jws.acl:role.jedi.acl.token.atz.jws.hermes.yahoo Received-SPF: pass client-ip=66.163.187.147; envelope-from=luangruo@yahoo.com; helo=sonic316-21.consmr.mail.ne1.yahoo.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:305300 Archived-At: Yuan Fu writes: > +/* Validate the PRED passed to treesit_traverse_match_predicate. If > + there's an error, set SIGNAL_DATA to something signal accepts, and > + return false, otherwise return true. */ > +static bool > +treesit_traverse_validate_predicate (Lisp_Object pred, > + Lisp_Object *signal_data) > +{ > + if (STRINGP (pred)) > + return true; > + /* We want to allow cl-labels-defined functions, so we allow > + symbols. */ > + else if (FUNCTIONP (pred) || SYMBOLP (pred)) > + return true; > + else if (CONSP (pred)) > + { > + Lisp_Object car = XCAR (pred); > + Lisp_Object cdr = XCDR (pred); > + if (EQ (car, Qnot)) > + { > + if (!CONSP (cdr)) > + { > + *signal_data = list2 (build_string ("Invalide `not' " > + "predicate"), > + pred); > + return false; > + } > + /* At this point CDR must be a cons. */ > + if (XFIXNUM (Flength (cdr)) != 1) > + { > + *signal_data = list2 (build_string ("`not' can only " > + "have one argument"), > + pred); > + return false; > + } > + return treesit_traverse_validate_predicate (XCAR (cdr), > + signal_data); > + } > + else if (EQ (car, Qor)) > + { > + if (!CONSP (cdr) || NILP (cdr)) > + { > + *signal_data = list2 (build_string ("`or' must have a list " > + "of patterns as " > + "arguments "), > + pred); > + return false; > + } > + FOR_EACH_TAIL (cdr) > + { > + if (!treesit_traverse_validate_predicate (XCAR (cdr), > + signal_data)) > + return false; > + } > + return true; > + } > + /* We allow the function to be a symbol to support cl-label. */ > + else if (STRINGP (car) && (FUNCTIONP (cdr) || SYMBOLP (cdr))) > + return true; > + } > + *signal_data = list2 (build_string ("Invalid predicate, see TODO for " > + "valid forms of predicate"), > + pred); > + return false; > +} How is this function robust against deeply nested or circular predicate structures? Shouldn't there be a recursion limit? > +/* Return true if the node at CURSOR matches PRED. PRED can be a lot > + of things: > + > + PRED := string | function | (string . function) > + | (or PRED...) | (not PRED) > + > + See docstring of treesit-search-forward and friends for the meaning > + of each shape. > + > + This function assumes PRED is in one of its valid forms. If NAMED > + is true, also check that the node is named. > + > + This function may signal if the predicate function signals. */ > static bool > treesit_traverse_match_predicate (TSTreeCursor *cursor, Lisp_Object pred, > Lisp_Object parser, bool named) > @@ -3156,24 +3230,63 @@ treesit_traverse_match_predicate (TSTreeCursor *cursor, Lisp_Object pred, > const char *type = ts_node_type (node); > return fast_c_string_match (pred, type, strlen (type)) >= 0; > } > - else > + /* We want to allow cl-labels-defined functions, so we allow > + symbols. */ > + else if (FUNCTIONP (pred) || SYMBOLP (pred)) > { > Lisp_Object lisp_node = make_treesit_node (parser, node); > return !NILP (CALLN (Ffuncall, pred, lisp_node)); > } > + else if (CONSP (pred)) > + { > + Lisp_Object car = XCAR (pred); > + Lisp_Object cdr = XCDR (pred); > + > + if (EQ (car, Qnot)) > + { > + return !treesit_traverse_match_predicate (cursor, XCAR (cdr), > + parser, named); > + } Please remove the redundant braces. > + else if (EQ (car, Qor)) > + { > + FOR_EACH_TAIL (cdr) > + { > + if (treesit_traverse_match_predicate (cursor, XCAR (cdr), > + parser, named)) > + return true; > + } > + return false; > + } > + /* We want to allow cl-labels-defined functions, so we allow > + symbols. */ > + else if (STRINGP (car) && (FUNCTIONP (cdr) || SYMBOLP (cdr))) > + { > + /* A bit of code duplication here, but should be fine. */ > + const char *type = ts_node_type (node); > + if (!(fast_c_string_match (pred, type, strlen (type)) >= 0)) > + return false; > + > + Lisp_Object lisp_node = make_treesit_node (parser, node); > + if (NILP (CALLN (Ffuncall, pred, lisp_node))) > + return false; > + > + return true; > + } > + } > + /* Returning false is better than UB. */ > + return false; > } > > -/* Traverse the parse tree starting from CURSOR. PRED can be a > - function (takes a node and returns nil/non-nil), or a string > - (treated as regexp matching the node's type, must be all single > - byte characters). If the node satisfies PRED, leave CURSOR on that > - node and return true. If no node satisfies PRED, move CURSOR back > - to starting position and return false. > +/* Traverse the parse tree starting from CURSOR. See TODO for the > + shapes PRED can have. If the node satisfies PRED, leave CURSOR on > + that node and return true. If no node satisfies PRED, move CURSOR > + back to starting position and return false. > > LIMIT is the number of levels we descend in the tree. FORWARD > controls the direction in which we traverse the tree, true means > forward, false backward. If SKIP_ROOT is true, don't match ROOT. > - */ > + > + This function may signal if the predicate function signals. */ > static bool > treesit_search_dfs (TSTreeCursor *cursor, > Lisp_Object pred, Lisp_Object parser, > @@ -3209,7 +3322,9 @@ treesit_search_dfs (TSTreeCursor *cursor, > START. PRED, PARSER, NAMED, FORWARD are the same as in > ts_search_subtree. If a match is found, leave CURSOR at that node, > and return true, if no match is found, return false, and CURSOR's > - position is undefined. */ > + position is undefined. > + > + This function may signal if the predicate function signals. */ > static bool > treesit_search_forward (TSTreeCursor *cursor, > Lisp_Object pred, Lisp_Object parser, > @@ -3272,11 +3387,13 @@ Return the first matched node, or nil if none matches. */) > Lisp_Object all, Lisp_Object depth) > { > CHECK_TS_NODE (node); > - CHECK_TYPE (STRINGP (predicate) || FUNCTIONP (predicate), > - list3 (Qor, Qstringp, Qfunctionp), predicate); > CHECK_SYMBOL (all); > CHECK_SYMBOL (backward); > > + Lisp_Object signal_data = Qnil; > + if (!treesit_traverse_validate_predicate (predicate, &signal_data)) > + xsignal1 (Qtreesit_invalid_predicate, signal_data); > + > /* We use a default limit of 1000. See bug#59426 for the > discussion. */ > ptrdiff_t the_limit = treesit_recursion_limit; > @@ -3344,11 +3461,13 @@ always traverse leaf nodes first, then upwards. */) > Lisp_Object all) > { > CHECK_TS_NODE (start); > - CHECK_TYPE (STRINGP (predicate) || FUNCTIONP (predicate), > - list3 (Qor, Qstringp, Qfunctionp), predicate); > CHECK_SYMBOL (all); > CHECK_SYMBOL (backward); > > + Lisp_Object signal_data = Qnil; > + if (!treesit_traverse_validate_predicate (predicate, &signal_data)) > + xsignal1 (Qtreesit_invalid_predicate, signal_data); > + > treesit_initialize (); > > Lisp_Object parser = XTS_NODE (start)->parser; > @@ -3376,7 +3495,9 @@ always traverse leaf nodes first, then upwards. */) > /* Recursively traverse the tree under CURSOR, and append the result > subtree to PARENT's cdr. See more in Ftreesit_induce_sparse_tree. > Note that the top-level children list is reversed, because > - reasons. */ > + reasons. > + > + This function may signal if the predicate function signals. */ > static void > treesit_build_sparse_tree (TSTreeCursor *cursor, Lisp_Object parent, > Lisp_Object pred, Lisp_Object process_fn, > @@ -3462,8 +3583,10 @@ a regexp. */) > Lisp_Object depth) > { > CHECK_TS_NODE (root); > - CHECK_TYPE (STRINGP (predicate) || FUNCTIONP (predicate), > - list3 (Qor, Qstringp, Qfunctionp), predicate); > + > + Lisp_Object signal_data = Qnil; > + if (!treesit_traverse_validate_predicate (predicate, &signal_data)) > + xsignal1 (Qtreesit_invalid_predicate, signal_data); ISTM that if `treesit_traverse_validate_predicate' returns false, `signal_data' is always initialized. The initialization of `signal_data' is thus redundant. In fact, I'm pretty sure you could have `treesit_traverse_validate_predicate' return Qnil upon success, and a signal upon failure. > if (!NILP (process_fn)) > CHECK_TYPE (FUNCTIONP (process_fn), Qfunctionp, process_fn); > @@ -3595,6 +3718,7 @@ syms_of_treesit (void) > DEFSYM (Qoutdated, "outdated"); > DEFSYM (Qhas_error, "has-error"); > DEFSYM (Qlive, "live"); > + DEFSYM (Qnot, "not"); > > DEFSYM (QCanchor, ":anchor"); > DEFSYM (QCequal, ":equal"); > @@ -3619,6 +3743,7 @@ syms_of_treesit (void) > "user-emacs-directory"); > DEFSYM (Qtreesit_parser_deleted, "treesit-parser-deleted"); > DEFSYM (Qtreesit_pattern_expand, "treesit-pattern-expand"); > + DEFSYM (Qtreesit_invalid_predicate, "treesit-invalid-predicate"); > > DEFSYM (Qor, "or"); > > @@ -3646,6 +3771,9 @@ syms_of_treesit (void) > define_error (Qtreesit_parser_deleted, > "This parser is deleted and cannot be used", > Qtreesit_error); > + define_error (Qtreesit_invalid_predicate, > + "Invalid predicate, see TODO for valid forms for a predicate", > + Qtreesit_error); Shouldn't this be in actual documentation and not etc/TODO?