From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#67262: python-ts-mode cannot identify triple-quoted-strings
Date: Sun, 26 Nov 2023 04:04:07 +0200
References: <66A741A1-38B8-40C9-BE84-AF99F74A079F@gmail.com> <838r6vm3dj.fsf@gnu.org> <9bfc5e6f-3612-115f-a59d-35ad629bdf9e@gutov.dev> <83v89qcfsb.fsf@gnu.org> <9B8C904A-3729-44AF-82F7-3BEA849F46D0@gmail.com>
In-Reply-To: <9B8C904A-3729-44AF-82F7-3BEA849F46D0@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------Q0aIrxjQwHlh0z0jWLCDsSeL"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
To: JD Smith, Eli Zaretskii, Yuan Fu
Cc: 67262@debbugs.gnu.org
Content-Language: en-US
X-GNU-PR-Message: followup 67262
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:275029

This is a multi-part message in MIME format.
--------------Q0aIrxjQwHlh0z0jWLCDsSeL
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 25/11/2023 16:42, JD Smith wrote:
> Bridging emacs syntax to treesitter in a robust way seems like it could be a subtle enterprise, so I’d prefer to leave that to one of the experts. Right now the syntax-propertize-function in python-mode does one simple thing: ensure triple quotes are properly marked as strings. Since the treesitter grammar doesn’t distinguish between different flavors of strings, something similar would still be needed, if we want to continue to treat various string flavors distinctly using syntax.
>
> Is moving all syntactification (beyond just font-lock) over to TS an explicit goal for all the *-ts-mode’s?

It would make sense - since this way we would only have one source of syntax-recognition bugs, rather than two (both the grammar and the definition in Elisp).

Attached is a patch you can try (that uses treesit for s-p-f).

Unfortunately, it's not quite perfect (nor is python-syntax-stringify, according to its FIXME inside): after certain modifications, the syntax-table property is not applied.

I've done some print-debugging in python--treesit-parser-after-change, and it looks like the problem is this: in certain cases (e.g. when electric-pair-post-self-insert-function fires) the parser notifier fires only after syntax-propertize has been called -- and it fires inside of it. Meaning it's too late to flush the syntax-propertize cache at that point.

The reason for it is, overall, the fact that we trigger the parser's after-change notifiers lazily: only after some other feature has to initialize the parser, calling treesit_ensure_parsed from treesit-parser-root-node.

I think bug#66732 might also be a variation of this problem.

As for what to do about this one -- probably something involving syntax-propertize-extend-region-functions, adding an entry which would initialize the parser, but not call syntax-ppss-flush-cache directly (or at least not just that). It would signal the earlier position to extend to through some dynamic variable.

This is getting tricky enough to move from the individual major modes into treesit.el proper, I think. Yuan and others, thoughts welcome.

JD, I do believe the attached patch is TRT (or close to it), but depending on how it works for you, and how quickly we deal with the above problem, it might make sense to enact your original suggestion first.

And finally, here's the backtrace that led me to the above conclusions:

  backtrace()
  (message "in progress, backtrace %s" (backtrace))
  (progn (message "in progress, backtrace %s" (backtrace)))
  (if (syntax-propertize--in-process-p) (progn (message "in progress, backtrace %s" (backtrace))))
  (save-current-buffer (set-buffer (treesit-parser-buffer parser)) (message "flushing %s up to %s" ranges (let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* ((temp (car r))) (setq --cl-var-- (if --cl-var-- (min --cl-var-- temp) temp))) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (syntax-ppss-flush-cache (let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* ((temp (car r))) (setq --cl-var-- (if --cl-var-- (min --cl-var-- temp) temp))) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (if (syntax-propertize--in-process-p) (progn (message "in progress, backtrace %s" (backtrace)))) (message "flushed up to %d, %s" syntax-propertize--done syntax-ppss-wide))
  (progn (save-current-buffer (set-buffer (treesit-parser-buffer parser)) (message "flushing %s up to %s" ranges (let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* ((temp ...)) (setq --cl-var-- (if --cl-var-- ... temp))) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (syntax-ppss-flush-cache (let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* ((temp ...)) (setq --cl-var-- (if --cl-var-- ... temp))) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (if (syntax-propertize--in-process-p) (progn (message "in progress, backtrace %s" (backtrace)))) (message "flushed up to %d, %s" syntax-propertize--done syntax-ppss-wide)))
  (if ranges (progn (save-current-buffer (set-buffer (treesit-parser-buffer parser)) (message "flushing %s up to %s" ranges (let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* (...) (setq --cl-var-- ...)) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (syntax-ppss-flush-cache (let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* (...) (setq --cl-var-- ...)) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (if (syntax-propertize--in-process-p) (progn (message "in progress, backtrace %s" (backtrace)))) (message "flushed up to %d, %s" syntax-propertize--done syntax-ppss-wide))))
  python--treesit-parser-after-change(((27 . 50)) #)
  treesit-buffer-root-node(python)
  treesit-node-at(42)
  (let ((node (treesit-node-at (point)))) (cond ((equal (treesit-node-type node) "string_content") (put-text-property (- (point) 3) (- (point) 2) 'syntax-table (string-to-syntax "|"))) ((and (equal (treesit-node-type node) "\"") (= (treesit-node-start node) (- (point) 3))) (put-text-property (1- (point)) (point) 'syntax-table (string-to-syntax "|")))))
  (cond (t (message "pt %s" (point)) (let ((node (treesit-node-at (point)))) (cond ((equal (treesit-node-type node) "string_content") (put-text-property (- (point) 3) (- (point) 2) 'syntax-table (string-to-syntax "|"))) ((and (equal (treesit-node-type node) "\"") (= (treesit-node-start node) (- ... 3))) (put-text-property (1- (point)) (point) 'syntax-table (string-to-syntax "|")))))))
  (while (and (< (point) end) (re-search-forward "\\(?:\"\"\"\\|'''\\)" end t)) (cond (t (message "pt %s" (point)) (let ((node (treesit-node-at (point)))) (cond ((equal (treesit-node-type node) "string_content") (put-text-property (- ... 3) (- ... 2) 'syntax-table (string-to-syntax "|"))) ((and (equal ... "\"") (= ... ...)) (put-text-property (1- ...) (point) 'syntax-table (string-to-syntax "|"))))))))
  (closure (t) (start end) (goto-char start) (while (and (< (point) end) (re-search-forward "\\(?:\"\"\"\\|'''\\)" end t)) (cond (t (message "pt %s" (point)) (let ((node ...)) (cond (... ...) (... ...)))))))(39 50)
  funcall((closure (t) (start end) (goto-char start) (while (and (< (point) end) (re-search-forward "\\(?:\"\"\"\\|'''\\)" end t)) (cond (t (message "pt %s" (point)) (let ((node ...)) (cond (... ...) (... ...))))))) 39 50)
  python--treesit-syntax-propertize-function-1(39 50)
  syntax-propertize(42)
  syntax-ppss(42)
  electric-pair-syntax-info(39)
  electric-pair-post-self-insert-function()
  self-insert-command(1 39)
  funcall-interactively(self-insert-command 1 39)
  #(self-insert-command nil nil)
  call-interactively@ido-cr+-record-current-command(# self-insert-command nil nil)
  apply(call-interactively@ido-cr+-record-current-command # (self-insert-command nil nil))
  call-interactively(self-insert-command nil nil)
  command-execute(self-insert-command)

--------------Q0aIrxjQwHlh0z0jWLCDsSeL
Content-Type: text/x-patch; charset=UTF-8; name="python--treesit-syntax-propertize-function.diff"
Content-Disposition: attachment; filename="python--treesit-syntax-propertize-function.diff"
Content-Transfer-Encoding: base64

ZGlmZiAtLWdpdCBhL2xpc3AvcHJvZ21vZGVzL3B5dGhvbi5lbCBiL2xpc3AvcHJvZ21vZGVz
L3B5dGhvbi5lbAppbmRleCBhYjNiZjFiNGVjMC4uNjU5ZGVmMTg5OTkgMTAwNjQ0Ci0tLSBh
L2xpc3AvcHJvZ21vZGVzL3B5dGhvbi5lbAorKysgYi9saXNwL3Byb2dtb2Rlcy9weXRob24u
ZWwKQEAgLTEyMzcsNiArMTIzNywyOSBAQCBweXRob24tLXRyZWVzaXQtZm9udGlmeS12YXJp YWJsZQogICAgICAodHJlZXNpdC1ub2RlLXN0YXJ0IG5vZGUpICh0cmVlc2l0LW5vZGUtZW5k IG5vZGUpCiAgICAgICdmb250LWxvY2stdmFyaWFibGUtdXNlLWZhY2Ugb3ZlcnJpZGUgc3Rh cnQgZW5kKSkpCiAKKyhkZWZjb25zdCBweXRob24tLXRyZWVzaXQtc3ludGF4LXByb3BlcnRp emUtZnVuY3Rpb24KKyAgKHN5bnRheC1wcm9wZXJ0aXplLXJ1bGVzCisgICAoKHJ4IChvciAi XCJcIlwiIiAiJycnIikpCisgICAgKDAgKGlnbm9yZQorICAgICAgICAobGV0ICgobm9kZSAo dHJlZXNpdC1ub2RlLWF0IChwb2ludCkpKSkKKyAgICAgICAgICAoY29uZAorICAgICAgICAg ICAoKGVxdWFsICh0cmVlc2l0LW5vZGUtdHlwZSBub2RlKSAic3RyaW5nX2NvbnRlbnQiKQor ICAgICAgICAgICAgKHB1dC10ZXh0LXByb3BlcnR5ICgtIChwb2ludCkgMykgKC0gKHBvaW50 KSAyKQorICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICdzeW50YXgtdGFibGUgKHN0 cmluZy10by1zeW50YXggInwiKSkpCisgICAgICAgICAgICgoYW5kIChlcXVhbCAodHJlZXNp dC1ub2RlLXR5cGUgbm9kZSkgIlwiIikKKyAgICAgICAgICAgICAgICAgKD0gKHRyZWVzaXQt bm9kZS1zdGFydCBub2RlKSAoLSAocG9pbnQpIDMpKSkKKyAgICAgICAgICAgIChwdXQtdGV4 dC1wcm9wZXJ0eSAoMS0gKHBvaW50KSkgKHBvaW50KQorICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICdzeW50YXgtdGFibGUgKHN0cmluZy10by1zeW50YXggInwiKSkpKSkpKSkp KQorCisoZGVmdW4gcHl0aG9uLS10cmVlc2l0LXBhcnNlci1hZnRlci1jaGFuZ2UgKHJhbmdl cyBwYXJzZXIpCisgIDs7IE1ha2Ugc3VyZSB3ZSByZS1zeW50YXgtcHJvcGVydGl6ZSB0aGUg ZnVsbCBub2RlIHRoYXQgaXMgYmVpbmcKKyAgOzsgZWRpdGVkLiAgRm9yIHRyaXBsZS1xdW90 ZWQgc3RyaW5ncy4KKyAgKHdoZW4gcmFuZ2VzCisgICAgKHdpdGgtY3VycmVudC1idWZmZXIg KHRyZWVzaXQtcGFyc2VyLWJ1ZmZlciBwYXJzZXIpCisgICAgICAoc3ludGF4LXBwc3MtZmx1 c2gtY2FjaGUgKGNsLWxvb3AgZm9yIHIgaW4gcmFuZ2VzCisgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgbWluaW1pemUgKGNhciByKSkpKSkpCisKIAwKIDs7OyBJ bmRlbnRhdGlvbgogCkBAIC02ODUxLDYgKzY4NzQsMTQgQEAgcHl0aG9uLXRzLW1vZGUKICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAiX2RlZmluaXRp b24iKSkKICAgICAoc2V0cS1sb2NhbCB0cmVlc2l0LWRlZnVuLW5hbWUtZnVuY3Rpb24KICAg ICAgICAgICAgICAgICAjJ3B5dGhvbi0tdHJlZXNpdC1kZWZ1bi1uYW1lKQorCisgICAgKHNl dHEtbG9jYWwgc3ludGF4LXByb3BlcnRpemUtZnVuY3Rpb24KKyAgICAgICAgICAgICAgICBw 
eXRob24tLXRyZWVzaXQtc3ludGF4LXByb3BlcnRpemUtZnVuY3Rpb24pCisKKyAgICA7OyBN YWtlIHN1cmUgaXQncyBwbGFjZWQgYmVmb3JlIGZvbnQtbG9jaydzIG5vdGlmaWVyLgorICAg ICh0cmVlc2l0LXBhcnNlci1hZGQtbm90aWZpZXIgKGNhciAodHJlZXNpdC1wYXJzZXItbGlz dCkpCisgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAjJ3B5dGhvbi0tdHJlZXNp dC1wYXJzZXItYWZ0ZXItY2hhbmdlKQorCiAgICAgKHRyZWVzaXQtbWFqb3ItbW9kZS1zZXR1 cCkKIAogICAgIChweXRob24tc2tlbGV0b24tYWRkLW1lbnUtaXRlbXMpCg== --------------Q0aIrxjQwHlh0z0jWLCDsSeL--