* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize [not found] <eke7il3w6ztw.wl-kobarity@gmail.com> @ 2024-01-21 14:47 ` kobarity 2024-01-21 18:09 ` Dmitry Gutov 0 siblings, 1 reply; 6+ messages in thread From: kobarity @ 2024-01-21 14:47 UTC (permalink / raw) To: Yuan Fu, Dmitry Gutov; +Cc: 68445 I am resending my mail, as I made a mistake in X-Debbugs-CC. I wrote: > Hi, > > I found a problem with python--treesit-syntax-propertize recently > introduced by the Bug#67977 patch. > > 1. emacs -Q > 2. Open a file in python-ts-mode with the following contents: > > #+begin_src python > """Docstring. > > test. > """ > S = """string.""" > #+end_src > > 3. Locate the point on the third line. > 4. M-q > 5. An empty line will be inserted. > 6. M-q > 7. The string literal on the last line will be split as follows: > > S = "" > > "string.""" > > This problem does not occur in python-mode. > > The direct cause of this problem is that the string-delimiter property > set in the docstring is removed. python--treesit-syntax-propertize is > called to set the property, but it fails to set it properly. Here is > the trace of python--treesit-syntax-propertize from step 4 above. > > ====================================================================== > 1 -> (python--treesit-syntax-propertize 1 45) > 1 <- python--treesit-syntax-propertize: nil > ====================================================================== > 1 -> (python--treesit-syntax-propertize 16 45) > 1 <- python--treesit-syntax-propertize: nil > > python--treesit-syntax-propertize is called with argument START 16. > This is the position inside the docstring. > > It seems to me that python--treesit-syntax-propertize assumes that the > START argument is outside the triple-quoted string. So one solution > might be to change START to the start of the string if it is within a > string, as in the attached patch. However, I'm not sure this is the > right approach. Should we use > syntax-propertize-extend-region-functions? > > -- > In GNU Emacs 30.0.50 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo > version 1.16.0, Xaw scroll bars) of 2024-01-13 built on ubuntu > Repository revision: 106cd9aafe8248ef91d7e89161adc5f912ea54eb > Repository branch: master > System Description: Ubuntu 22.04.3 LTS I appreciate your comments. ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize 2024-01-21 14:47 ` bug#68445: [PATCH] Problem with python--treesit-syntax-propertize kobarity @ 2024-01-21 18:09 ` Dmitry Gutov 2024-01-22 15:44 ` kobarity 0 siblings, 1 reply; 6+ messages in thread From: Dmitry Gutov @ 2024-01-21 18:09 UTC (permalink / raw) To: kobarity, Yuan Fu; +Cc: 68445 Hi! On 21/01/2024 16:47, kobarity wrote: > > I am resending my mail, as I made a mistake in X-Debbugs-CC. Was it supposed to appear in the bug's thread? I don't see it anywhere. > I wrote: >> Hi, >> >> I found a problem with python--treesit-syntax-propertize recently >> introduced by the Bug#67977 patch. >> >> 1. emacs -Q >> 2. Open a file in python-ts-mode with the following contents: >> >> #+begin_src python >> """Docstring. >> >> test. >> """ >> S = """string.""" >> #+end_src >> >> 3. Locate the point on the third line. >> 4. M-q >> 5. An empty line will be inserted. >> 6. M-q >> 7. The string literal on the last line will be split as follows: >> >> S = "" >> >> "string.""" >> >> This problem does not occur in python-mode. >> >> The direct cause of this problem is that the string-delimiter property >> set in the docstring is removed. python--treesit-syntax-propertize is >> called to set the property, but it fails to set it properly. Here is >> the trace of python--treesit-syntax-propertize from step 4 above. >> >> ====================================================================== >> 1 -> (python--treesit-syntax-propertize 1 45) >> 1 <- python--treesit-syntax-propertize: nil >> ====================================================================== >> 1 -> (python--treesit-syntax-propertize 16 45) >> 1 <- python--treesit-syntax-propertize: nil >> >> python--treesit-syntax-propertize is called with argument START 16. >> This is the position inside the docstring. >> >> It seems to me that python--treesit-syntax-propertize assumes that the >> START argument is outside the triple-quoted string. So one solution >> might be to change START to the start of the string if it is within a >> string, as in the attached patch. However, I'm not sure this is the >> right approach. Sounds good to me. I don't see the patch, though, or where to read it. >> Should we use >> syntax-propertize-extend-region-functions? That's another option, but it shouldn't be necessary. After all, the absence of a notification from the parser (which would extend the range) should mean that the node before position 16 is untouched, so there's no real need to clear the properties there. I think there is also another approach--handle two different types of nodes separately, instead of just string_content, so we don't have to start from the beginning of the literal. Like this: diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el index e2f614f52c2..4f8b0cb9473 100644 --- a/lisp/progmodes/python.el +++ b/lisp/progmodes/python.el @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize (while (re-search-forward (rx (or "\"\"\"" "'''")) end t) (let ((node (treesit-node-at (point)))) ;; The triple quotes surround a non-empty string. - (when (equal (treesit-node-type node) "string_content") - (let ((start (treesit-node-start node)) - (end (treesit-node-end node))) - (put-text-property (1- start) start - 'syntax-table (string-to-syntax "|")) - (put-text-property end (min (1+ end) (point-max)) - 'syntax-table (string-to-syntax "|")))))))) + (cond + ((equal (treesit-node-type node) "string_content") + (put-text-property (1- (treesit-node-start node)) + (treesit-node-start node) + 'syntax-table (string-to-syntax "|"))) + ((and (equal (treesit-node-type node) "string_end") + (= (treesit-node-start node) (- (point) 3))) + (put-text-property (- (point) 3) (- (point) 2) + 'syntax-table (string-to-syntax "|")))))))) \f ;;; Indentation ^ permalink raw reply related [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize 2024-01-21 18:09 ` Dmitry Gutov @ 2024-01-22 15:44 ` kobarity 2024-01-22 18:52 ` Dmitry Gutov 0 siblings, 1 reply; 6+ messages in thread From: kobarity @ 2024-01-22 15:44 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Yuan Fu, 68445 Hi, Dmitry Gutov wrote: > On 21/01/2024 16:47, kobarity wrote: > > I am resending my mail, as I made a mistake in X-Debbugs-CC. > Was it supposed to appear in the bug's thread? I don't see it anywhere. My first mail was registered as Bug#68445, and my patch is there. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445 It says: Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org: The extra period is my mistake and it may have caused the problem. I'm sorry for the confusion. > I think there is also another approach--handle two different types of > nodes separately, instead of just string_content, so we don't have to > start from the beginning of the literal. Like this: > > diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el > index e2f614f52c2..4f8b0cb9473 100644 > --- a/lisp/progmodes/python.el > +++ b/lisp/progmodes/python.el > @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize > (while (re-search-forward (rx (or "\"\"\"" "'''")) end t) > (let ((node (treesit-node-at (point)))) > ;; The triple quotes surround a non-empty string. > - (when (equal (treesit-node-type node) "string_content") > - (let ((start (treesit-node-start node)) > - (end (treesit-node-end node))) > - (put-text-property (1- start) start > - 'syntax-table (string-to-syntax "|")) > - (put-text-property end (min (1+ end) (point-max)) > - 'syntax-table (string-to-syntax "|")))))))) > + (cond > + ((equal (treesit-node-type node) "string_content") > + (put-text-property (1- (treesit-node-start node)) > + (treesit-node-start node) > + 'syntax-table (string-to-syntax "|"))) > + ((and (equal (treesit-node-type node) "string_end") > + (= (treesit-node-start node) (- (point) 3))) > + (put-text-property (- (point) 3) (- (point) 2) > + 'syntax-table (string-to-syntax "|")))))))) > > \f > ;;; Indentation > This approach seems better than my patch, but it does not seem to address the following special case. #+begin_src python """a""""""b""" #+end_src ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize 2024-01-22 15:44 ` kobarity @ 2024-01-22 18:52 ` Dmitry Gutov 2024-01-23 14:14 ` kobarity 0 siblings, 1 reply; 6+ messages in thread From: Dmitry Gutov @ 2024-01-22 18:52 UTC (permalink / raw) To: kobarity; +Cc: Yuan Fu, 68445 On 22/01/2024 17:44, kobarity wrote: > Hi, > > Dmitry Gutov wrote: >> On 21/01/2024 16:47, kobarity wrote: >>> I am resending my mail, as I made a mistake in X-Debbugs-CC. >> Was it supposed to appear in the bug's thread? I don't see it anywhere. > > My first mail was registered as Bug#68445, and my patch is there. > > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445 > > It says: > > Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org: > > The extra period is my mistake and it may have caused the problem. > I'm sorry for the confusion. Yeah, but even so that's odd: I'm subscribed to the bug tracker, so the email should have at least arrived in my inbox, but it did not. >> I think there is also another approach--handle two different types of >> nodes separately, instead of just string_content, so we don't have to >> start from the beginning of the literal. Like this: >> >> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el >> index e2f614f52c2..4f8b0cb9473 100644 >> --- a/lisp/progmodes/python.el >> +++ b/lisp/progmodes/python.el >> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize >> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t) >> (let ((node (treesit-node-at (point)))) >> ;; The triple quotes surround a non-empty string. >> - (when (equal (treesit-node-type node) "string_content") >> - (let ((start (treesit-node-start node)) >> - (end (treesit-node-end node))) >> - (put-text-property (1- start) start >> - 'syntax-table (string-to-syntax "|")) >> - (put-text-property end (min (1+ end) (point-max)) >> - 'syntax-table (string-to-syntax "|")))))))) >> + (cond >> + ((equal (treesit-node-type node) "string_content") >> + (put-text-property (1- (treesit-node-start node)) >> + (treesit-node-start node) >> + 'syntax-table (string-to-syntax "|"))) >> + ((and (equal (treesit-node-type node) "string_end") >> + (= (treesit-node-start node) (- (point) 3))) >> + (put-text-property (- (point) 3) (- (point) 2) >> + 'syntax-table (string-to-syntax "|")))))))) >> >> \f >> ;;; Indentation >> > > This approach seems better than my patch, but it does not seem to > address the following special case. > > #+begin_src python > """a""""""b""" > #+end_src All right, try the patch below, please. It also covers the case of the empty literal. I've tried to find a case where it would behave poorly (e.g. by misdetecting three quotes from a combination of some other string literals), but couldn't. E.g., s = '''asdasd' is not a concatenation. It's always an error, at least according to the TS grammar. diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el index e2f614f52c2..41f612c8b1c 100644 --- a/lisp/progmodes/python.el +++ b/lisp/progmodes/python.el @@ -1359,15 +1359,15 @@ python--treesit-syntax-propertize (save-excursion (goto-char start) (while (re-search-forward (rx (or "\"\"\"" "'''")) end t) - (let ((node (treesit-node-at (point)))) - ;; The triple quotes surround a non-empty string. - (when (equal (treesit-node-type node) "string_content") - (let ((start (treesit-node-start node)) - (end (treesit-node-end node))) - (put-text-property (1- start) start - 'syntax-table (string-to-syntax "|")) - (put-text-property end (min (1+ end) (point-max)) - 'syntax-table (string-to-syntax "|")))))))) + (let ((node (treesit-node-at (- (point) 3)))) + ;; Handle triple-quoted strings. + (pcase (treesit-node-type node) + ("string_start" + (put-text-property (1- (point)) (point) + 'syntax-table (string-to-syntax "|"))) + ("string_end" + (put-text-property (- (point) 3) (- (point) 2) + 'syntax-table (string-to-syntax "|")))))))) \f ;;; Indentation ^ permalink raw reply related [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize 2024-01-22 18:52 ` Dmitry Gutov @ 2024-01-23 14:14 ` kobarity 2024-01-26 1:15 ` Dmitry Gutov 0 siblings, 1 reply; 6+ messages in thread From: kobarity @ 2024-01-23 14:14 UTC (permalink / raw) To: Dmitry Gutov; +Cc: Yuan Fu, 68445 Dmitry Gutov wrote: > > On 22/01/2024 17:44, kobarity wrote: > > Hi, > > > > Dmitry Gutov wrote: > >> On 21/01/2024 16:47, kobarity wrote: > >>> I am resending my mail, as I made a mistake in X-Debbugs-CC. > >> Was it supposed to appear in the bug's thread? I don't see it anywhere. > > > > My first mail was registered as Bug#68445, and my patch is there. > > > > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445 > > > > It says: > > > > Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org: > > > > The extra period is my mistake and it may have caused the problem. > > I'm sorry for the confusion. > > Yeah, but even so that's odd: I'm subscribed to the bug tracker, so > the email should have at least arrived in my inbox, but it did not. I agree. I can't find my first mail in the bug-gnu-emacs archive. > >> I think there is also another approach--handle two different types of > >> nodes separately, instead of just string_content, so we don't have to > >> start from the beginning of the literal. Like this: > >> > >> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el > >> index e2f614f52c2..4f8b0cb9473 100644 > >> --- a/lisp/progmodes/python.el > >> +++ b/lisp/progmodes/python.el > >> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize > >> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t) > >> (let ((node (treesit-node-at (point)))) > >> ;; The triple quotes surround a non-empty string. > >> - (when (equal (treesit-node-type node) "string_content") > >> - (let ((start (treesit-node-start node)) > >> - (end (treesit-node-end node))) > >> - (put-text-property (1- start) start > >> - 'syntax-table (string-to-syntax "|")) > >> - (put-text-property end (min (1+ end) (point-max)) > >> - 'syntax-table (string-to-syntax "|")))))))) > >> + (cond > >> + ((equal (treesit-node-type node) "string_content") > >> + (put-text-property (1- (treesit-node-start node)) > >> + (treesit-node-start node) > >> + 'syntax-table (string-to-syntax "|"))) > >> + ((and (equal (treesit-node-type node) "string_end") > >> + (= (treesit-node-start node) (- (point) 3))) > >> + (put-text-property (- (point) 3) (- (point) 2) > >> + 'syntax-table (string-to-syntax "|")))))))) > >> > >> \f > >> ;;; Indentation > >> > > > > This approach seems better than my patch, but it does not seem to > > address the following special case. > > > > #+begin_src python > > """a""""""b""" > > #+end_src > > All right, try the patch below, please. It also covers the case of the > empty literal. Thanks, it looks good to me. > I've tried to find a case where it would behave poorly (e.g. by > misdetecting three quotes from a combination of some other string > literals), but couldn't. E.g., > > s = '''asdasd' > > is not a concatenation. It's always an error, at least according to > the TS grammar. I think the TS grammar is correct, because this example is also an error according to the Python interpreter. ^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize 2024-01-23 14:14 ` kobarity @ 2024-01-26 1:15 ` Dmitry Gutov 0 siblings, 0 replies; 6+ messages in thread From: Dmitry Gutov @ 2024-01-26 1:15 UTC (permalink / raw) To: kobarity; +Cc: Yuan Fu, 68445-done On 23/01/2024 16:14, kobarity wrote: > > Dmitry Gutov wrote: >> >> On 22/01/2024 17:44, kobarity wrote: >>> Hi, >>> >>> Dmitry Gutov wrote: >>>> On 21/01/2024 16:47, kobarity wrote: >>>>> I am resending my mail, as I made a mistake in X-Debbugs-CC. >>>> Was it supposed to appear in the bug's thread? I don't see it anywhere. >>> >>> My first mail was registered as Bug#68445, and my patch is there. >>> >>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445 >>> >>> It says: >>> >>> Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org: >>> >>> The extra period is my mistake and it may have caused the problem. >>> I'm sorry for the confusion. >> >> Yeah, but even so that's odd: I'm subscribed to the bug tracker, so >> the email should have at least arrived in my inbox, but it did not. > > I agree. I can't find my first mail in the bug-gnu-emacs archive. > >>>> I think there is also another approach--handle two different types of >>>> nodes separately, instead of just string_content, so we don't have to >>>> start from the beginning of the literal. Like this: >>>> >>>> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el >>>> index e2f614f52c2..4f8b0cb9473 100644 >>>> --- a/lisp/progmodes/python.el >>>> +++ b/lisp/progmodes/python.el >>>> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize >>>> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t) >>>> (let ((node (treesit-node-at (point)))) >>>> ;; The triple quotes surround a non-empty string. >>>> - (when (equal (treesit-node-type node) "string_content") >>>> - (let ((start (treesit-node-start node)) >>>> - (end (treesit-node-end node))) >>>> - (put-text-property (1- start) start >>>> - 'syntax-table (string-to-syntax "|")) >>>> - (put-text-property end (min (1+ end) (point-max)) >>>> - 'syntax-table (string-to-syntax "|")))))))) >>>> + (cond >>>> + ((equal (treesit-node-type node) "string_content") >>>> + (put-text-property (1- (treesit-node-start node)) >>>> + (treesit-node-start node) >>>> + 'syntax-table (string-to-syntax "|"))) >>>> + ((and (equal (treesit-node-type node) "string_end") >>>> + (= (treesit-node-start node) (- (point) 3))) >>>> + (put-text-property (- (point) 3) (- (point) 2) >>>> + 'syntax-table (string-to-syntax "|")))))))) >>>> >>>> \f >>>> ;;; Indentation >>>> >>> >>> This approach seems better than my patch, but it does not seem to >>> address the following special case. >>> >>> #+begin_src python >>> """a""""""b""" >>> #+end_src >> >> All right, try the patch below, please. It also covers the case of the >> empty literal. > > Thanks, it looks good to me. > >> I've tried to find a case where it would behave poorly (e.g. by >> misdetecting three quotes from a combination of some other string >> literals), but couldn't. E.g., >> >> s = '''asdasd' >> >> is not a concatenation. It's always an error, at least according to >> the TS grammar. > > I think the TS grammar is correct, because this example is also an > error according to the Python interpreter. Thanks for testing! Installed, and closing. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-26 1:15 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <eke7il3w6ztw.wl-kobarity@gmail.com> 2024-01-21 14:47 ` bug#68445: [PATCH] Problem with python--treesit-syntax-propertize kobarity 2024-01-21 18:09 ` Dmitry Gutov 2024-01-22 15:44 ` kobarity 2024-01-22 18:52 ` Dmitry Gutov 2024-01-23 14:14 ` kobarity 2024-01-26 1:15 ` Dmitry Gutov
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).