* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
[not found] <eke7il3w6ztw.wl-kobarity@gmail.com>
@ 2024-01-21 14:47 ` kobarity
2024-01-21 18:09 ` Dmitry Gutov
0 siblings, 1 reply; 6+ messages in thread
From: kobarity @ 2024-01-21 14:47 UTC (permalink / raw)
To: Yuan Fu, Dmitry Gutov; +Cc: 68445
I am resending my mail, as I made a mistake in X-Debbugs-CC.
I wrote:
> Hi,
>
> I found a problem with python--treesit-syntax-propertize recently
> introduced by the Bug#67977 patch.
>
> 1. emacs -Q
> 2. Open a file in python-ts-mode with the following contents:
>
> #+begin_src python
> """Docstring.
>
> test.
> """
> S = """string."""
> #+end_src
>
> 3. Locate the point on the third line.
> 4. M-q
> 5. An empty line will be inserted.
> 6. M-q
> 7. The string literal on the last line will be split as follows:
>
> S = ""
>
> "string."""
>
> This problem does not occur in python-mode.
>
> The direct cause of this problem is that the string-delimiter property
> set in the docstring is removed. python--treesit-syntax-propertize is
> called to set the property, but it fails to set it properly. Here is
> the trace of python--treesit-syntax-propertize from step 4 above.
>
> ======================================================================
> 1 -> (python--treesit-syntax-propertize 1 45)
> 1 <- python--treesit-syntax-propertize: nil
> ======================================================================
> 1 -> (python--treesit-syntax-propertize 16 45)
> 1 <- python--treesit-syntax-propertize: nil
>
> python--treesit-syntax-propertize is called with argument START 16.
> This is the position inside the docstring.
>
> It seems to me that python--treesit-syntax-propertize assumes that the
> START argument is outside the triple-quoted string. So one solution
> might be to change START to the start of the string if it is within a
> string, as in the attached patch. However, I'm not sure this is the
> right approach. Should we use
> syntax-propertize-extend-region-functions?
>
> --
> In GNU Emacs 30.0.50 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
> version 1.16.0, Xaw scroll bars) of 2024-01-13 built on ubuntu
> Repository revision: 106cd9aafe8248ef91d7e89161adc5f912ea54eb
> Repository branch: master
> System Description: Ubuntu 22.04.3 LTS
I appreciate your comments.
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
2024-01-21 14:47 ` bug#68445: [PATCH] Problem with python--treesit-syntax-propertize kobarity
@ 2024-01-21 18:09 ` Dmitry Gutov
2024-01-22 15:44 ` kobarity
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Gutov @ 2024-01-21 18:09 UTC (permalink / raw)
To: kobarity, Yuan Fu; +Cc: 68445
Hi!
On 21/01/2024 16:47, kobarity wrote:
>
> I am resending my mail, as I made a mistake in X-Debbugs-CC.
Was it supposed to appear in the bug's thread? I don't see it anywhere.
> I wrote:
>> Hi,
>>
>> I found a problem with python--treesit-syntax-propertize recently
>> introduced by the Bug#67977 patch.
>>
>> 1. emacs -Q
>> 2. Open a file in python-ts-mode with the following contents:
>>
>> #+begin_src python
>> """Docstring.
>>
>> test.
>> """
>> S = """string."""
>> #+end_src
>>
>> 3. Locate the point on the third line.
>> 4. M-q
>> 5. An empty line will be inserted.
>> 6. M-q
>> 7. The string literal on the last line will be split as follows:
>>
>> S = ""
>>
>> "string."""
>>
>> This problem does not occur in python-mode.
>>
>> The direct cause of this problem is that the string-delimiter property
>> set in the docstring is removed. python--treesit-syntax-propertize is
>> called to set the property, but it fails to set it properly. Here is
>> the trace of python--treesit-syntax-propertize from step 4 above.
>>
>> ======================================================================
>> 1 -> (python--treesit-syntax-propertize 1 45)
>> 1 <- python--treesit-syntax-propertize: nil
>> ======================================================================
>> 1 -> (python--treesit-syntax-propertize 16 45)
>> 1 <- python--treesit-syntax-propertize: nil
>>
>> python--treesit-syntax-propertize is called with argument START 16.
>> This is the position inside the docstring.
>>
>> It seems to me that python--treesit-syntax-propertize assumes that the
>> START argument is outside the triple-quoted string. So one solution
>> might be to change START to the start of the string if it is within a
>> string, as in the attached patch. However, I'm not sure this is the
>> right approach.
Sounds good to me. I don't see the patch, though, or where to read it.
>> Should we use
>> syntax-propertize-extend-region-functions?
That's another option, but it shouldn't be necessary. After all, the
absence of a notification from the parser (which would extend the range)
should mean that the node before position 16 is untouched, so there's no
real need to clear the properties there.
I think there is also another approach--handle two different types of
nodes separately, instead of just string_content, so we don't have to
start from the beginning of the literal. Like this:
diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index e2f614f52c2..4f8b0cb9473 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
(while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
(let ((node (treesit-node-at (point))))
;; The triple quotes surround a non-empty string.
- (when (equal (treesit-node-type node) "string_content")
- (let ((start (treesit-node-start node))
- (end (treesit-node-end node)))
- (put-text-property (1- start) start
- 'syntax-table (string-to-syntax "|"))
- (put-text-property end (min (1+ end) (point-max))
- 'syntax-table (string-to-syntax "|"))))))))
+ (cond
+ ((equal (treesit-node-type node) "string_content")
+ (put-text-property (1- (treesit-node-start node))
+ (treesit-node-start node)
+ 'syntax-table (string-to-syntax "|")))
+ ((and (equal (treesit-node-type node) "string_end")
+ (= (treesit-node-start node) (- (point) 3)))
+ (put-text-property (- (point) 3) (- (point) 2)
+ 'syntax-table (string-to-syntax "|"))))))))
\f
;;; Indentation
^ permalink raw reply related [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
2024-01-21 18:09 ` Dmitry Gutov
@ 2024-01-22 15:44 ` kobarity
2024-01-22 18:52 ` Dmitry Gutov
0 siblings, 1 reply; 6+ messages in thread
From: kobarity @ 2024-01-22 15:44 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Yuan Fu, 68445
Hi,
Dmitry Gutov wrote:
> On 21/01/2024 16:47, kobarity wrote:
> > I am resending my mail, as I made a mistake in X-Debbugs-CC.
> Was it supposed to appear in the bug's thread? I don't see it anywhere.
My first mail was registered as Bug#68445, and my patch is there.
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
It says:
Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
The extra period is my mistake and it may have caused the problem.
I'm sorry for the confusion.
> I think there is also another approach--handle two different types of
> nodes separately, instead of just string_content, so we don't have to
> start from the beginning of the literal. Like this:
>
> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
> index e2f614f52c2..4f8b0cb9473 100644
> --- a/lisp/progmodes/python.el
> +++ b/lisp/progmodes/python.el
> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
> (let ((node (treesit-node-at (point))))
> ;; The triple quotes surround a non-empty string.
> - (when (equal (treesit-node-type node) "string_content")
> - (let ((start (treesit-node-start node))
> - (end (treesit-node-end node)))
> - (put-text-property (1- start) start
> - 'syntax-table (string-to-syntax "|"))
> - (put-text-property end (min (1+ end) (point-max))
> - 'syntax-table (string-to-syntax "|"))))))))
> + (cond
> + ((equal (treesit-node-type node) "string_content")
> + (put-text-property (1- (treesit-node-start node))
> + (treesit-node-start node)
> + 'syntax-table (string-to-syntax "|")))
> + ((and (equal (treesit-node-type node) "string_end")
> + (= (treesit-node-start node) (- (point) 3)))
> + (put-text-property (- (point) 3) (- (point) 2)
> + 'syntax-table (string-to-syntax "|"))))))))
>
> \f
> ;;; Indentation
>
This approach seems better than my patch, but it does not seem to
address the following special case.
#+begin_src python
"""a""""""b"""
#+end_src
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
2024-01-22 15:44 ` kobarity
@ 2024-01-22 18:52 ` Dmitry Gutov
2024-01-23 14:14 ` kobarity
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Gutov @ 2024-01-22 18:52 UTC (permalink / raw)
To: kobarity; +Cc: Yuan Fu, 68445
On 22/01/2024 17:44, kobarity wrote:
> Hi,
>
> Dmitry Gutov wrote:
>> On 21/01/2024 16:47, kobarity wrote:
>>> I am resending my mail, as I made a mistake in X-Debbugs-CC.
>> Was it supposed to appear in the bug's thread? I don't see it anywhere.
>
> My first mail was registered as Bug#68445, and my patch is there.
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
>
> It says:
>
> Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
>
> The extra period is my mistake and it may have caused the problem.
> I'm sorry for the confusion.
Yeah, but even so that's odd: I'm subscribed to the bug tracker, so the
email should have at least arrived in my inbox, but it did not.
>> I think there is also another approach--handle two different types of
>> nodes separately, instead of just string_content, so we don't have to
>> start from the beginning of the literal. Like this:
>>
>> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
>> index e2f614f52c2..4f8b0cb9473 100644
>> --- a/lisp/progmodes/python.el
>> +++ b/lisp/progmodes/python.el
>> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
>> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
>> (let ((node (treesit-node-at (point))))
>> ;; The triple quotes surround a non-empty string.
>> - (when (equal (treesit-node-type node) "string_content")
>> - (let ((start (treesit-node-start node))
>> - (end (treesit-node-end node)))
>> - (put-text-property (1- start) start
>> - 'syntax-table (string-to-syntax "|"))
>> - (put-text-property end (min (1+ end) (point-max))
>> - 'syntax-table (string-to-syntax "|"))))))))
>> + (cond
>> + ((equal (treesit-node-type node) "string_content")
>> + (put-text-property (1- (treesit-node-start node))
>> + (treesit-node-start node)
>> + 'syntax-table (string-to-syntax "|")))
>> + ((and (equal (treesit-node-type node) "string_end")
>> + (= (treesit-node-start node) (- (point) 3)))
>> + (put-text-property (- (point) 3) (- (point) 2)
>> + 'syntax-table (string-to-syntax "|"))))))))
>>
>> \f
>> ;;; Indentation
>>
>
> This approach seems better than my patch, but it does not seem to
> address the following special case.
>
> #+begin_src python
> """a""""""b"""
> #+end_src
All right, try the patch below, please. It also covers the case of the
empty literal.
I've tried to find a case where it would behave poorly (e.g. by
misdetecting three quotes from a combination of some other string
literals), but couldn't. E.g.,
s = '''asdasd'
is not a concatenation. It's always an error, at least according to the
TS grammar.
diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index e2f614f52c2..41f612c8b1c 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -1359,15 +1359,15 @@ python--treesit-syntax-propertize
(save-excursion
(goto-char start)
(while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
- (let ((node (treesit-node-at (point))))
- ;; The triple quotes surround a non-empty string.
- (when (equal (treesit-node-type node) "string_content")
- (let ((start (treesit-node-start node))
- (end (treesit-node-end node)))
- (put-text-property (1- start) start
- 'syntax-table (string-to-syntax "|"))
- (put-text-property end (min (1+ end) (point-max))
- 'syntax-table (string-to-syntax "|"))))))))
+ (let ((node (treesit-node-at (- (point) 3))))
+ ;; Handle triple-quoted strings.
+ (pcase (treesit-node-type node)
+ ("string_start"
+ (put-text-property (1- (point)) (point)
+ 'syntax-table (string-to-syntax "|")))
+ ("string_end"
+ (put-text-property (- (point) 3) (- (point) 2)
+ 'syntax-table (string-to-syntax "|"))))))))
\f
;;; Indentation
^ permalink raw reply related [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
2024-01-22 18:52 ` Dmitry Gutov
@ 2024-01-23 14:14 ` kobarity
2024-01-26 1:15 ` Dmitry Gutov
0 siblings, 1 reply; 6+ messages in thread
From: kobarity @ 2024-01-23 14:14 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Yuan Fu, 68445
Dmitry Gutov wrote:
>
> On 22/01/2024 17:44, kobarity wrote:
> > Hi,
> >
> > Dmitry Gutov wrote:
> >> On 21/01/2024 16:47, kobarity wrote:
> >>> I am resending my mail, as I made a mistake in X-Debbugs-CC.
> >> Was it supposed to appear in the bug's thread? I don't see it anywhere.
> >
> > My first mail was registered as Bug#68445, and my patch is there.
> >
> > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
> >
> > It says:
> >
> > Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
> >
> > The extra period is my mistake and it may have caused the problem.
> > I'm sorry for the confusion.
>
> Yeah, but even so that's odd: I'm subscribed to the bug tracker, so
> the email should have at least arrived in my inbox, but it did not.
I agree. I can't find my first mail in the bug-gnu-emacs archive.
> >> I think there is also another approach--handle two different types of
> >> nodes separately, instead of just string_content, so we don't have to
> >> start from the beginning of the literal. Like this:
> >>
> >> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
> >> index e2f614f52c2..4f8b0cb9473 100644
> >> --- a/lisp/progmodes/python.el
> >> +++ b/lisp/progmodes/python.el
> >> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
> >> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
> >> (let ((node (treesit-node-at (point))))
> >> ;; The triple quotes surround a non-empty string.
> >> - (when (equal (treesit-node-type node) "string_content")
> >> - (let ((start (treesit-node-start node))
> >> - (end (treesit-node-end node)))
> >> - (put-text-property (1- start) start
> >> - 'syntax-table (string-to-syntax "|"))
> >> - (put-text-property end (min (1+ end) (point-max))
> >> - 'syntax-table (string-to-syntax "|"))))))))
> >> + (cond
> >> + ((equal (treesit-node-type node) "string_content")
> >> + (put-text-property (1- (treesit-node-start node))
> >> + (treesit-node-start node)
> >> + 'syntax-table (string-to-syntax "|")))
> >> + ((and (equal (treesit-node-type node) "string_end")
> >> + (= (treesit-node-start node) (- (point) 3)))
> >> + (put-text-property (- (point) 3) (- (point) 2)
> >> + 'syntax-table (string-to-syntax "|"))))))))
> >>
> >> \f
> >> ;;; Indentation
> >>
> >
> > This approach seems better than my patch, but it does not seem to
> > address the following special case.
> >
> > #+begin_src python
> > """a""""""b"""
> > #+end_src
>
> All right, try the patch below, please. It also covers the case of the
> empty literal.
Thanks, it looks good to me.
> I've tried to find a case where it would behave poorly (e.g. by
> misdetecting three quotes from a combination of some other string
> literals), but couldn't. E.g.,
>
> s = '''asdasd'
>
> is not a concatenation. It's always an error, at least according to
> the TS grammar.
I think the TS grammar is correct, because this example is also an
error according to the Python interpreter.
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
2024-01-23 14:14 ` kobarity
@ 2024-01-26 1:15 ` Dmitry Gutov
0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Gutov @ 2024-01-26 1:15 UTC (permalink / raw)
To: kobarity; +Cc: Yuan Fu, 68445-done
On 23/01/2024 16:14, kobarity wrote:
>
> Dmitry Gutov wrote:
>>
>> On 22/01/2024 17:44, kobarity wrote:
>>> Hi,
>>>
>>> Dmitry Gutov wrote:
>>>> On 21/01/2024 16:47, kobarity wrote:
>>>>> I am resending my mail, as I made a mistake in X-Debbugs-CC.
>>>> Was it supposed to appear in the bug's thread? I don't see it anywhere.
>>>
>>> My first mail was registered as Bug#68445, and my patch is there.
>>>
>>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
>>>
>>> It says:
>>>
>>> Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
>>>
>>> The extra period is my mistake and it may have caused the problem.
>>> I'm sorry for the confusion.
>>
>> Yeah, but even so that's odd: I'm subscribed to the bug tracker, so
>> the email should have at least arrived in my inbox, but it did not.
>
> I agree. I can't find my first mail in the bug-gnu-emacs archive.
>
>>>> I think there is also another approach--handle two different types of
>>>> nodes separately, instead of just string_content, so we don't have to
>>>> start from the beginning of the literal. Like this:
>>>>
>>>> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
>>>> index e2f614f52c2..4f8b0cb9473 100644
>>>> --- a/lisp/progmodes/python.el
>>>> +++ b/lisp/progmodes/python.el
>>>> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
>>>> (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
>>>> (let ((node (treesit-node-at (point))))
>>>> ;; The triple quotes surround a non-empty string.
>>>> - (when (equal (treesit-node-type node) "string_content")
>>>> - (let ((start (treesit-node-start node))
>>>> - (end (treesit-node-end node)))
>>>> - (put-text-property (1- start) start
>>>> - 'syntax-table (string-to-syntax "|"))
>>>> - (put-text-property end (min (1+ end) (point-max))
>>>> - 'syntax-table (string-to-syntax "|"))))))))
>>>> + (cond
>>>> + ((equal (treesit-node-type node) "string_content")
>>>> + (put-text-property (1- (treesit-node-start node))
>>>> + (treesit-node-start node)
>>>> + 'syntax-table (string-to-syntax "|")))
>>>> + ((and (equal (treesit-node-type node) "string_end")
>>>> + (= (treesit-node-start node) (- (point) 3)))
>>>> + (put-text-property (- (point) 3) (- (point) 2)
>>>> + 'syntax-table (string-to-syntax "|"))))))))
>>>>
>>>> \f
>>>> ;;; Indentation
>>>>
>>>
>>> This approach seems better than my patch, but it does not seem to
>>> address the following special case.
>>>
>>> #+begin_src python
>>> """a""""""b"""
>>> #+end_src
>>
>> All right, try the patch below, please. It also covers the case of the
>> empty literal.
>
> Thanks, it looks good to me.
>
>> I've tried to find a case where it would behave poorly (e.g. by
>> misdetecting three quotes from a combination of some other string
>> literals), but couldn't. E.g.,
>>
>> s = '''asdasd'
>>
>> is not a concatenation. It's always an error, at least according to
>> the TS grammar.
>
> I think the TS grammar is correct, because this example is also an
> error according to the Python interpreter.
Thanks for testing! Installed, and closing.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-26 1:15 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <eke7il3w6ztw.wl-kobarity@gmail.com>
2024-01-21 14:47 ` bug#68445: [PATCH] Problem with python--treesit-syntax-propertize kobarity
2024-01-21 18:09 ` Dmitry Gutov
2024-01-22 15:44 ` kobarity
2024-01-22 18:52 ` Dmitry Gutov
2024-01-23 14:14 ` kobarity
2024-01-26 1:15 ` Dmitry Gutov
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).