unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
       [not found] <eke7il3w6ztw.wl-kobarity@gmail.com>
@ 2024-01-21 14:47 ` kobarity
  2024-01-21 18:09   ` Dmitry Gutov
  0 siblings, 1 reply; 6+ messages in thread
From: kobarity @ 2024-01-21 14:47 UTC (permalink / raw)
  To: Yuan Fu, Dmitry Gutov; +Cc: 68445


I am resending my mail, as I made a mistake in X-Debbugs-CC.

I wrote:
> Hi,
> 
> I found a problem with python--treesit-syntax-propertize recently
> introduced by the Bug#67977 patch.
> 
> 1. emacs -Q
> 2. Open a file in python-ts-mode with the following contents:
> 
> #+begin_src python
> """Docstring.
> 
> test.
> """
> S = """string."""
> #+end_src
> 
> 3. Locate the point on the third line.
> 4. M-q
> 5. An empty line will be inserted.
> 6. M-q
> 7. The string literal on the last line will be split as follows:
> 
> S = ""
> 
> "string."""
> 
> This problem does not occur in python-mode.
> 
> The direct cause of this problem is that the string-delimiter property
> set in the docstring is removed.  python--treesit-syntax-propertize is
> called to set the property, but it fails to set it properly.  Here is
> the trace of python--treesit-syntax-propertize from step 4 above.
> 
> ======================================================================
> 1 -> (python--treesit-syntax-propertize 1 45)
> 1 <- python--treesit-syntax-propertize: nil
> ======================================================================
> 1 -> (python--treesit-syntax-propertize 16 45)
> 1 <- python--treesit-syntax-propertize: nil
> 
> python--treesit-syntax-propertize is called with argument START 16.
> This is the position inside the docstring.
> 
> It seems to me that python--treesit-syntax-propertize assumes that the
> START argument is outside the triple-quoted string.  So one solution
> might be to change START to the start of the string if it is within a
> string, as in the attached patch.  However, I'm not sure this is the
> right approach.  Should we use
> syntax-propertize-extend-region-functions?
> 
> --
> In GNU Emacs 30.0.50 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
>  version 1.16.0, Xaw scroll bars) of 2024-01-13 built on ubuntu
> Repository revision: 106cd9aafe8248ef91d7e89161adc5f912ea54eb
> Repository branch: master
> System Description: Ubuntu 22.04.3 LTS

I appreciate your comments.





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
  2024-01-21 14:47 ` bug#68445: [PATCH] Problem with python--treesit-syntax-propertize kobarity
@ 2024-01-21 18:09   ` Dmitry Gutov
  2024-01-22 15:44     ` kobarity
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Gutov @ 2024-01-21 18:09 UTC (permalink / raw)
  To: kobarity, Yuan Fu; +Cc: 68445

Hi!

On 21/01/2024 16:47, kobarity wrote:
> 
> I am resending my mail, as I made a mistake in X-Debbugs-CC.

Was it supposed to appear in the bug's thread? I don't see it anywhere.

> I wrote:
>> Hi,
>>
>> I found a problem with python--treesit-syntax-propertize recently
>> introduced by the Bug#67977 patch.
>>
>> 1. emacs -Q
>> 2. Open a file in python-ts-mode with the following contents:
>>
>> #+begin_src python
>> """Docstring.
>>
>> test.
>> """
>> S = """string."""
>> #+end_src
>>
>> 3. Locate the point on the third line.
>> 4. M-q
>> 5. An empty line will be inserted.
>> 6. M-q
>> 7. The string literal on the last line will be split as follows:
>>
>> S = ""
>>
>> "string."""
>>
>> This problem does not occur in python-mode.
>>
>> The direct cause of this problem is that the string-delimiter property
>> set in the docstring is removed.  python--treesit-syntax-propertize is
>> called to set the property, but it fails to set it properly.  Here is
>> the trace of python--treesit-syntax-propertize from step 4 above.
>>
>> ======================================================================
>> 1 -> (python--treesit-syntax-propertize 1 45)
>> 1 <- python--treesit-syntax-propertize: nil
>> ======================================================================
>> 1 -> (python--treesit-syntax-propertize 16 45)
>> 1 <- python--treesit-syntax-propertize: nil
>>
>> python--treesit-syntax-propertize is called with argument START 16.
>> This is the position inside the docstring.
>>
>> It seems to me that python--treesit-syntax-propertize assumes that the
>> START argument is outside the triple-quoted string.  So one solution
>> might be to change START to the start of the string if it is within a
>> string, as in the attached patch.  However, I'm not sure this is the
>> right approach.

Sounds good to me. I don't see the patch, though, or where to read it.

>> Should we use
>> syntax-propertize-extend-region-functions?

That's another option, but it shouldn't be necessary. After all, the 
absence of a notification from the parser (which would extend the range) 
should mean that the node before position 16 is untouched, so there's no 
real need to clear the properties there.

I think there is also another approach--handle two different types of 
nodes separately, instead of just string_content, so we don't have to 
start from the beginning of the literal. Like this:

diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index e2f614f52c2..4f8b0cb9473 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
      (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
        (let ((node (treesit-node-at (point))))
          ;; The triple quotes surround a non-empty string.
-        (when (equal (treesit-node-type node) "string_content")
-          (let ((start (treesit-node-start node))
-                (end (treesit-node-end node)))
-            (put-text-property (1- start) start
-                               'syntax-table (string-to-syntax "|"))
-            (put-text-property end (min (1+ end) (point-max))
-                               'syntax-table (string-to-syntax "|"))))))))
+        (cond
+         ((equal (treesit-node-type node) "string_content")
+          (put-text-property (1- (treesit-node-start node))
+                             (treesit-node-start node)
+                             'syntax-table (string-to-syntax "|")))
+         ((and (equal (treesit-node-type node) "string_end")
+               (= (treesit-node-start node) (- (point) 3)))
+          (put-text-property (- (point) 3) (- (point) 2)
+                             'syntax-table (string-to-syntax "|"))))))))

  \f
  ;;; Indentation






^ permalink raw reply related	[flat|nested] 6+ messages in thread

* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
  2024-01-21 18:09   ` Dmitry Gutov
@ 2024-01-22 15:44     ` kobarity
  2024-01-22 18:52       ` Dmitry Gutov
  0 siblings, 1 reply; 6+ messages in thread
From: kobarity @ 2024-01-22 15:44 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Yuan Fu, 68445

Hi,

Dmitry Gutov wrote:
> On 21/01/2024 16:47, kobarity wrote:
> > I am resending my mail, as I made a mistake in X-Debbugs-CC.
> Was it supposed to appear in the bug's thread? I don't see it anywhere.

My first mail was registered as Bug#68445, and my patch is there.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445

It says:

Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:

The extra period is my mistake and it may have caused the problem.
I'm sorry for the confusion.

> I think there is also another approach--handle two different types of
> nodes separately, instead of just string_content, so we don't have to
> start from the beginning of the literal. Like this:
> 
> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
> index e2f614f52c2..4f8b0cb9473 100644
> --- a/lisp/progmodes/python.el
> +++ b/lisp/progmodes/python.el
> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
>      (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
>        (let ((node (treesit-node-at (point))))
>          ;; The triple quotes surround a non-empty string.
> -        (when (equal (treesit-node-type node) "string_content")
> -          (let ((start (treesit-node-start node))
> -                (end (treesit-node-end node)))
> -            (put-text-property (1- start) start
> -                               'syntax-table (string-to-syntax "|"))
> -            (put-text-property end (min (1+ end) (point-max))
> -                               'syntax-table (string-to-syntax "|"))))))))
> +        (cond
> +         ((equal (treesit-node-type node) "string_content")
> +          (put-text-property (1- (treesit-node-start node))
> +                             (treesit-node-start node)
> +                             'syntax-table (string-to-syntax "|")))
> +         ((and (equal (treesit-node-type node) "string_end")
> +               (= (treesit-node-start node) (- (point) 3)))
> +          (put-text-property (- (point) 3) (- (point) 2)
> +                             'syntax-table (string-to-syntax "|"))))))))
> 
>  \f
>  ;;; Indentation
> 

This approach seems better than my patch, but it does not seem to
address the following special case.

#+begin_src python
"""a""""""b"""
#+end_src





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
  2024-01-22 15:44     ` kobarity
@ 2024-01-22 18:52       ` Dmitry Gutov
  2024-01-23 14:14         ` kobarity
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Gutov @ 2024-01-22 18:52 UTC (permalink / raw)
  To: kobarity; +Cc: Yuan Fu, 68445

On 22/01/2024 17:44, kobarity wrote:
> Hi,
> 
> Dmitry Gutov wrote:
>> On 21/01/2024 16:47, kobarity wrote:
>>> I am resending my mail, as I made a mistake in X-Debbugs-CC.
>> Was it supposed to appear in the bug's thread? I don't see it anywhere.
> 
> My first mail was registered as Bug#68445, and my patch is there.
> 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
> 
> It says:
> 
> Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
> 
> The extra period is my mistake and it may have caused the problem.
> I'm sorry for the confusion.

Yeah, but even so that's odd: I'm subscribed to the bug tracker, so the 
email should have at least arrived in my inbox, but it did not.

>> I think there is also another approach--handle two different types of
>> nodes separately, instead of just string_content, so we don't have to
>> start from the beginning of the literal. Like this:
>>
>> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
>> index e2f614f52c2..4f8b0cb9473 100644
>> --- a/lisp/progmodes/python.el
>> +++ b/lisp/progmodes/python.el
>> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
>>       (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
>>         (let ((node (treesit-node-at (point))))
>>           ;; The triple quotes surround a non-empty string.
>> -        (when (equal (treesit-node-type node) "string_content")
>> -          (let ((start (treesit-node-start node))
>> -                (end (treesit-node-end node)))
>> -            (put-text-property (1- start) start
>> -                               'syntax-table (string-to-syntax "|"))
>> -            (put-text-property end (min (1+ end) (point-max))
>> -                               'syntax-table (string-to-syntax "|"))))))))
>> +        (cond
>> +         ((equal (treesit-node-type node) "string_content")
>> +          (put-text-property (1- (treesit-node-start node))
>> +                             (treesit-node-start node)
>> +                             'syntax-table (string-to-syntax "|")))
>> +         ((and (equal (treesit-node-type node) "string_end")
>> +               (= (treesit-node-start node) (- (point) 3)))
>> +          (put-text-property (- (point) 3) (- (point) 2)
>> +                             'syntax-table (string-to-syntax "|"))))))))
>>
>>   \f
>>   ;;; Indentation
>>
> 
> This approach seems better than my patch, but it does not seem to
> address the following special case.
> 
> #+begin_src python
> """a""""""b"""
> #+end_src

All right, try the patch below, please. It also covers the case of the 
empty literal.

I've tried to find a case where it would behave poorly (e.g. by 
misdetecting three quotes from a combination of some other string 
literals), but couldn't. E.g.,

   s = '''asdasd'

is not a concatenation. It's always an error, at least according to the 
TS grammar.

diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index e2f614f52c2..41f612c8b1c 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -1359,15 +1359,15 @@ python--treesit-syntax-propertize
    (save-excursion
      (goto-char start)
      (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
-      (let ((node (treesit-node-at (point))))
-        ;; The triple quotes surround a non-empty string.
-        (when (equal (treesit-node-type node) "string_content")
-          (let ((start (treesit-node-start node))
-                (end (treesit-node-end node)))
-            (put-text-property (1- start) start
-                               'syntax-table (string-to-syntax "|"))
-            (put-text-property end (min (1+ end) (point-max))
-                               'syntax-table (string-to-syntax "|"))))))))
+      (let ((node (treesit-node-at (- (point) 3))))
+        ;; Handle triple-quoted strings.
+        (pcase (treesit-node-type node)
+          ("string_start"
+           (put-text-property (1- (point)) (point)
+                              'syntax-table (string-to-syntax "|")))
+          ("string_end"
+           (put-text-property (- (point) 3) (- (point) 2)
+                              'syntax-table (string-to-syntax "|"))))))))

  \f
  ;;; Indentation







^ permalink raw reply related	[flat|nested] 6+ messages in thread

* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
  2024-01-22 18:52       ` Dmitry Gutov
@ 2024-01-23 14:14         ` kobarity
  2024-01-26  1:15           ` Dmitry Gutov
  0 siblings, 1 reply; 6+ messages in thread
From: kobarity @ 2024-01-23 14:14 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Yuan Fu, 68445


Dmitry Gutov wrote:
> 
> On 22/01/2024 17:44, kobarity wrote:
> > Hi,
> > 
> > Dmitry Gutov wrote:
> >> On 21/01/2024 16:47, kobarity wrote:
> >>> I am resending my mail, as I made a mistake in X-Debbugs-CC.
> >> Was it supposed to appear in the bug's thread? I don't see it anywhere.
> > 
> > My first mail was registered as Bug#68445, and my patch is there.
> > 
> > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
> > 
> > It says:
> > 
> > Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
> > 
> > The extra period is my mistake and it may have caused the problem.
> > I'm sorry for the confusion.
> 
> Yeah, but even so that's odd: I'm subscribed to the bug tracker, so
> the email should have at least arrived in my inbox, but it did not.

I agree.  I can't find my first mail in the bug-gnu-emacs archive.

> >> I think there is also another approach--handle two different types of
> >> nodes separately, instead of just string_content, so we don't have to
> >> start from the beginning of the literal. Like this:
> >> 
> >> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
> >> index e2f614f52c2..4f8b0cb9473 100644
> >> --- a/lisp/progmodes/python.el
> >> +++ b/lisp/progmodes/python.el
> >> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
> >>       (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
> >>         (let ((node (treesit-node-at (point))))
> >>           ;; The triple quotes surround a non-empty string.
> >> -        (when (equal (treesit-node-type node) "string_content")
> >> -          (let ((start (treesit-node-start node))
> >> -                (end (treesit-node-end node)))
> >> -            (put-text-property (1- start) start
> >> -                               'syntax-table (string-to-syntax "|"))
> >> -            (put-text-property end (min (1+ end) (point-max))
> >> -                               'syntax-table (string-to-syntax "|"))))))))
> >> +        (cond
> >> +         ((equal (treesit-node-type node) "string_content")
> >> +          (put-text-property (1- (treesit-node-start node))
> >> +                             (treesit-node-start node)
> >> +                             'syntax-table (string-to-syntax "|")))
> >> +         ((and (equal (treesit-node-type node) "string_end")
> >> +               (= (treesit-node-start node) (- (point) 3)))
> >> +          (put-text-property (- (point) 3) (- (point) 2)
> >> +                             'syntax-table (string-to-syntax "|"))))))))
> >> 
> >>   \f
> >>   ;;; Indentation
> >> 
> > 
> > This approach seems better than my patch, but it does not seem to
> > address the following special case.
> > 
> > #+begin_src python
> > """a""""""b"""
> > #+end_src
> 
> All right, try the patch below, please. It also covers the case of the
> empty literal.

Thanks, it looks good to me.

> I've tried to find a case where it would behave poorly (e.g. by
> misdetecting three quotes from a combination of some other string
> literals), but couldn't. E.g.,
> 
>   s = '''asdasd'
> 
> is not a concatenation. It's always an error, at least according to
> the TS grammar.

I think the TS grammar is correct, because this example is also an
error according to the Python interpreter.





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#68445: [PATCH] Problem with python--treesit-syntax-propertize
  2024-01-23 14:14         ` kobarity
@ 2024-01-26  1:15           ` Dmitry Gutov
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Gutov @ 2024-01-26  1:15 UTC (permalink / raw)
  To: kobarity; +Cc: Yuan Fu, 68445-done

On 23/01/2024 16:14, kobarity wrote:
> 
> Dmitry Gutov wrote:
>>
>> On 22/01/2024 17:44, kobarity wrote:
>>> Hi,
>>>
>>> Dmitry Gutov wrote:
>>>> On 21/01/2024 16:47, kobarity wrote:
>>>>> I am resending my mail, as I made a mistake in X-Debbugs-CC.
>>>> Was it supposed to appear in the bug's thread? I don't see it anywhere.
>>>
>>> My first mail was registered as Bug#68445, and my patch is there.
>>>
>>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68445
>>>
>>> It says:
>>>
>>> Report forwarded to casouri <at> gmail.com, dmitry@.gutov.dev, bug-gnu-emacs <at> gnu.org:
>>>
>>> The extra period is my mistake and it may have caused the problem.
>>> I'm sorry for the confusion.
>>
>> Yeah, but even so that's odd: I'm subscribed to the bug tracker, so
>> the email should have at least arrived in my inbox, but it did not.
> 
> I agree.  I can't find my first mail in the bug-gnu-emacs archive.
> 
>>>> I think there is also another approach--handle two different types of
>>>> nodes separately, instead of just string_content, so we don't have to
>>>> start from the beginning of the literal. Like this:
>>>>
>>>> diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
>>>> index e2f614f52c2..4f8b0cb9473 100644
>>>> --- a/lisp/progmodes/python.el
>>>> +++ b/lisp/progmodes/python.el
>>>> @@ -1361,13 +1361,15 @@ python--treesit-syntax-propertize
>>>>        (while (re-search-forward (rx (or "\"\"\"" "'''")) end t)
>>>>          (let ((node (treesit-node-at (point))))
>>>>            ;; The triple quotes surround a non-empty string.
>>>> -        (when (equal (treesit-node-type node) "string_content")
>>>> -          (let ((start (treesit-node-start node))
>>>> -                (end (treesit-node-end node)))
>>>> -            (put-text-property (1- start) start
>>>> -                               'syntax-table (string-to-syntax "|"))
>>>> -            (put-text-property end (min (1+ end) (point-max))
>>>> -                               'syntax-table (string-to-syntax "|"))))))))
>>>> +        (cond
>>>> +         ((equal (treesit-node-type node) "string_content")
>>>> +          (put-text-property (1- (treesit-node-start node))
>>>> +                             (treesit-node-start node)
>>>> +                             'syntax-table (string-to-syntax "|")))
>>>> +         ((and (equal (treesit-node-type node) "string_end")
>>>> +               (= (treesit-node-start node) (- (point) 3)))
>>>> +          (put-text-property (- (point) 3) (- (point) 2)
>>>> +                             'syntax-table (string-to-syntax "|"))))))))
>>>>
>>>>    \f
>>>>    ;;; Indentation
>>>>
>>>
>>> This approach seems better than my patch, but it does not seem to
>>> address the following special case.
>>>
>>> #+begin_src python
>>> """a""""""b"""
>>> #+end_src
>>
>> All right, try the patch below, please. It also covers the case of the
>> empty literal.
> 
> Thanks, it looks good to me.
> 
>> I've tried to find a case where it would behave poorly (e.g. by
>> misdetecting three quotes from a combination of some other string
>> literals), but couldn't. E.g.,
>>
>>    s = '''asdasd'
>>
>> is not a concatenation. It's always an error, at least according to
>> the TS grammar.
> 
> I think the TS grammar is correct, because this example is also an
> error according to the Python interpreter.

Thanks for testing! Installed, and closing.





^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-26  1:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <eke7il3w6ztw.wl-kobarity@gmail.com>
2024-01-21 14:47 ` bug#68445: [PATCH] Problem with python--treesit-syntax-propertize kobarity
2024-01-21 18:09   ` Dmitry Gutov
2024-01-22 15:44     ` kobarity
2024-01-22 18:52       ` Dmitry Gutov
2024-01-23 14:14         ` kobarity
2024-01-26  1:15           ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).