unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: kobarity <kobarity@gmail.com>
To: Ruijie Yu <ruijie@netyu.xyz>
Cc: Tom Gillespie <tgbugs@gmail.com>, Eli Zaretskii <eliz@gnu.org>,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	63622@debbugs.gnu.org
Subject: bug#63622: lisp/progmodes/python.el: performance regression introduced by multiline font-lock
Date: Thu, 25 May 2023 00:05:09 +0900	[thread overview]
Message-ID: <eke7jzwxhisa.wl-kobarity@gmail.com> (raw)
In-Reply-To: <35B358E3-13AD-41F7-8060-2B1B1968DAE2@netyu.xyz>

[-- Attachment #1: Type: text/plain, Size: 2367 bytes --]


Ruijie Yu wrote:
> On May 23, 2023, at 23:46, kobarity <kobarity@gmail.com> wrote:
> > The performance problem in the example shown by Tom can be resolved by
> > modifying the above code as follows:
> > 
> >          (re (concat "[uU]?[rR]?"
> >                      (rx (or "\"\"\"" "\"" "'''" "'")))))
> 
> I didn’t read the context for this snippet, but isn’t it sufficient to match for only one single-quote and double-quote, instead of also matching for the triple (multiline) counterparts? 

You are right.  I copied the above code from the definition of
python-rx string-delimiter.  However, it was inside of the group
construct.  As group capturing is not needed in
python-info-docstring-p, the regex can be simplified to:

          (re "[uU]?[rR]?[\"']"))

The same regex was used in another place in python-info-docstring-p.
So I fixed it too.  Also I added a simple ERT to identify this fix.

I wrote:
> It breaks some ERTs, but I think we should fix the ERTs.  However,
> there seems to be another problem in python-info-docstring-p.  It
> intentionally considers contiguous strings as docstring as in the ERT
> python-info-docstring-p-1:
> 
> '''
> Module Docstring Django style.
> '''
> u'''Additional module docstring.'''
> '''Not a module docstring.'''
> 
> However, as far as I have tried with Python 3 and Python 2.7, this is
> not correct.

This was my misunderstanding.  PEP-257
(https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring)
clearly states:

#+begin_quote
String literals occurring elsewhere in Python code may also act as
documentation. They are not recognized by the Python bytecode compiler
and are not accessible as runtime object attributes (i.e. not assigned
to __doc__), but two types of extra docstrings may be extracted by
software tools:

1. String literals occurring immediately after a simple assignment at
    the top level of a module, class, or __init__ method are called
    “attribute docstrings”.
2. String literals occurring immediately after another docstring are
    called “additional docstrings”.
#+end_quote

However, there still seems to be a bug in python-info-docstring-p.
Therefore, I would like to keep failing ERTs as expected fail at this
time.  Maybe f-string can also be a docstring.

Attached is a series of patches that replace the previous patches.

[-- Attachment #2: 0001-Fix-python-info-docstring-p.patch --]
[-- Type: application/octet-stream, Size: 4001 bytes --]

From ce25adbbae75426ad568a7cd12e02cddc4e9e23e Mon Sep 17 00:00:00 2001
From: kobarity <kobarity@gmail.com>
Date: Wed, 24 May 2023 22:01:12 +0900
Subject: [PATCH 1/2] Fix python-info-docstring-p

* lisp/progmodes/python.el (python-info-docstring-p): Stop using
python-rx string-delimiter.
* test/lisp/progmodes/python-tests.el
(python-font-lock-escape-sequence-bytes-newline),
(python-font-lock-escape-sequence-hex-octal),
(python-font-lock-escape-sequence-unicode),
(python-font-lock-raw-escape-sequence): Mark as expected failures
until another bug in python-info-docstring-p is
corrected.
(python-info-docstring-p-7): New test.  (Bug#63622)
---
 lisp/progmodes/python.el            |  7 ++-----
 test/lisp/progmodes/python-tests.el | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index 6fc05b246a6..032a17c52ff 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -6018,8 +6018,7 @@ python-info-docstring-p
     (let ((counter 1)
           (indentation (current-indentation))
           (backward-sexp-point)
-          (re (concat "[uU]?[rR]?"
-                      (python-rx string-delimiter))))
+          (re "[uU]?[rR]?[\"']"))
       (when (and
              (not (python-info-assignment-statement-p))
              (looking-at-p re)
@@ -6040,9 +6039,7 @@ python-info-docstring-p
                                                     backward-sexp-point))
                                      (setq last-backward-sexp-point
                                            backward-sexp-point))
-                                   (looking-at-p
-                                    (concat "[uU]?[rR]?"
-                                            (python-rx string-delimiter))))))
+                                   (looking-at-p re))))
                   ;; Previous sexp was a string, restore point.
                   (goto-char backward-sexp-point)
                   (cl-incf counter))
diff --git a/test/lisp/progmodes/python-tests.el b/test/lisp/progmodes/python-tests.el
index 50153e66da5..cbaf5b698bd 100644
--- a/test/lisp/progmodes/python-tests.el
+++ b/test/lisp/progmodes/python-tests.el
@@ -729,6 +729,7 @@ python-font-lock-escape-sequence-multiline-string
      (845 . font-lock-string-face) (886))))
 
 (ert-deftest python-font-lock-escape-sequence-bytes-newline ()
+  :expected-result :failed
   (python-tests-assert-faces
    "b'\\n'
 b\"\\n\""
@@ -741,6 +742,7 @@ python-font-lock-escape-sequence-bytes-newline
      (11 . font-lock-doc-face))))
 
 (ert-deftest python-font-lock-escape-sequence-hex-octal ()
+  :expected-result :failed
   (python-tests-assert-faces
    "b'\\x12 \\777 \\1\\23'
 '\\x12 \\777 \\1\\23'"
@@ -761,6 +763,7 @@ python-font-lock-escape-sequence-hex-octal
      (36 . font-lock-doc-face))))
 
 (ert-deftest python-font-lock-escape-sequence-unicode ()
+  :expected-result :failed
   (python-tests-assert-faces
    "b'\\u1234 \\U00010348 \\N{Plus-Minus Sign}'
 '\\u1234 \\U00010348 \\N{Plus-Minus Sign}'"
@@ -775,6 +778,7 @@ python-font-lock-escape-sequence-unicode
      (80 . font-lock-doc-face))))
 
 (ert-deftest python-font-lock-raw-escape-sequence ()
+  :expected-result :failed
   (python-tests-assert-faces
    "rb'\\x12 \123 \\n'
 r'\\x12 \123 \\n \\u1234 \\U00010348 \\N{Plus-Minus Sign}'"
@@ -6598,6 +6602,18 @@ python-info-docstring-p-6
    (python-tests-look-at "'''Not a method docstring.'''")
    (should (not (python-info-docstring-p)))))
 
+(ert-deftest python-info-docstring-p-7 ()
+  "Test string in a dictionary."
+  (python-tests-with-temp-buffer
+   "
+{'Not a docstring': 1}
+'Also not a docstring'
+"
+   (python-tests-look-at "Not a docstring")
+   (should-not (python-info-docstring-p))
+   (python-tests-look-at "Also not a docstring")
+   (should-not (python-info-docstring-p))))
+
 (ert-deftest python-info-triple-quoted-string-p-1 ()
   "Test triple quoted string."
   (python-tests-with-temp-buffer
-- 
2.34.1


[-- Attachment #3: 0002-Use-font-lock-extend-region-functions-in-python-mode.patch --]
[-- Type: application/octet-stream, Size: 2541 bytes --]

From f166993f4874d28d2f533bfbc3df1c7ca8d3aa2e Mon Sep 17 00:00:00 2001
From: kobarity <kobarity@gmail.com>
Date: Wed, 24 May 2023 22:06:51 +0900
Subject: [PATCH 2/2] Use font-lock-extend-region-functions in python-mode

* lisp/progmodes/python.el (python-font-lock-extend-region): Change
arguments and return value for font-lock-extend-region-functions.
(python-mode): Change from
font-lock-extend-after-change-region-function to
font-lock-extend-region-functions.  (Bug#63622)
---
 lisp/progmodes/python.el | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/lisp/progmodes/python.el b/lisp/progmodes/python.el
index 032a17c52ff..adaeacc2ec1 100644
--- a/lisp/progmodes/python.el
+++ b/lisp/progmodes/python.el
@@ -869,18 +869,22 @@ python-font-lock-keywords
 Which one will be chosen depends on the value of
 `font-lock-maximum-decoration'.")
 
-(defun python-font-lock-extend-region (beg end _old-len)
-  "Extend font-lock region given by BEG and END to statement boundaries."
-  (save-excursion
-    (save-match-data
-      (goto-char beg)
-      (python-nav-beginning-of-statement)
-      (setq beg (point))
-      (goto-char end)
-      (python-nav-end-of-statement)
-      (setq end (point))
-      (cons beg end))))
-
+(defvar font-lock-beg)
+(defvar font-lock-end)
+(defun python-font-lock-extend-region ()
+  "Extend font-lock region to statement boundaries."
+  (let ((beg font-lock-beg)
+        (end font-lock-end))
+    (goto-char beg)
+    (python-nav-beginning-of-statement)
+    (beginning-of-line)
+    (when (< (point) beg)
+      (setq font-lock-beg (point)))
+    (goto-char end)
+    (python-nav-end-of-statement)
+    (when (< end (point))
+      (setq font-lock-end (point)))
+    (or (/= beg font-lock-beg) (/= end font-lock-end))))
 
 (defconst python-syntax-propertize-function
   (syntax-propertize-rules
@@ -6704,9 +6708,9 @@ python-mode
               `(,python-font-lock-keywords
                 nil nil nil nil
                 (font-lock-syntactic-face-function
-                 . python-font-lock-syntactic-face-function)
-                (font-lock-extend-after-change-region-function
-                 . python-font-lock-extend-region)))
+                 . python-font-lock-syntactic-face-function)))
+  (add-hook 'font-lock-extend-region-functions
+            #'python-font-lock-extend-region nil t)
   (setq-local syntax-propertize-function
               python-syntax-propertize-function)
   (setq-local imenu-create-index-function
-- 
2.34.1


  reply	other threads:[~2023-05-24 15:05 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-21  3:14 bug#63622: lisp/progmodes/python.el: performance regression introduced by multiline font-lock Tom Gillespie
2023-05-21  4:53 ` bug#63622: source of problem identified to be python-font-lock-extend-region Tom Gillespie
2023-05-21  6:08 ` bug#63622: lisp/progmodes/python.el: performance regression introduced by multiline font-lock Eli Zaretskii
2023-05-21  7:13   ` Tom Gillespie
2023-05-21  9:31     ` kobarity
2023-05-21 15:16       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-22 14:58         ` kobarity
2023-05-22 15:08           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-23 15:45             ` kobarity
2023-05-23 17:08               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-23 19:04               ` Tom Gillespie
2023-05-23 23:21                 ` kobarity
2023-05-24 18:53                   ` Tom Gillespie
2023-05-23 23:41               ` Ruijie Yu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-05-24 15:05                 ` kobarity [this message]
2023-05-26 10:01                   ` Eli Zaretskii
2023-05-21 15:44       ` kobarity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eke7jzwxhisa.wl-kobarity@gmail.com \
    --to=kobarity@gmail.com \
    --cc=63622@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    --cc=ruijie@netyu.xyz \
    --cc=tgbugs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).