unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#66363: gdb-control-commands-regexp issues
@ 2023-10-05 15:07 Mattias Engdegård
  2023-10-05 16:29 ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2023-10-05 15:07 UTC (permalink / raw)
  To: 66363

[-- Attachment #1: Type: text/plain, Size: 1063 bytes --]

The variable `gdb-control-commands-regexp` in gdb-mi.el cannot work as currently defined and used. Only group 3 is of  interest, but that group hasn't referred to anything useful for several years.

Group 3 probably refers to a part of the regex's tail where the command argument is matched:

   "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"

However, this seems to be broken as well, because all groups here are inside repetitions. This part of the regexp is also exponential in form if not in practice but we'd better simplify it anyway.

Attached is a suggested patch which makes explicit the command abbreviations matched, and leaves only a single submatch. It also changes the tail to assuming that the command argument doesn't contain non-newlines (or the final eol anchor wouldn't make sense) but that it can contain spaces (which seems reasonable). However, right now the argument is only checked for being non-empty or not.

I don't have a working gdb setup at the moment so if someone would be kind to test it, I would be very grateful for it.


[-- Attachment #2: gdb-control-commands-regexp.diff --]
[-- Type: application/octet-stream, Size: 2185 bytes --]

diff --git a/lisp/progmodes/gdb-mi.el b/lisp/progmodes/gdb-mi.el
index bc0070d2630..f7e96a2f19e 100644
--- a/lisp/progmodes/gdb-mi.el
+++ b/lisp/progmodes/gdb-mi.el
@@ -1960,19 +1960,28 @@ breakpoint-disabled
   :group 'gdb)
 
 \f
+(rx-define gdb-python-guile-commands
+  (or "python" "python-interactive" "pi" "guile" "guile-repl" "gr"))
+
 (defvar gdb-python-guile-commands-regexp
-  "python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr"
+  (rx gdb-python-guile-commands)
   "Regexp that matches Python and Guile commands supported by GDB.")
 
 (defvar gdb-control-commands-regexp
-  (concat
-   "^\\("
-   "comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while"
-   "\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|"
-   gdb-python-guile-commands-regexp
-   "\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions"
-   "\\|expl\\(o\\(re?\\)?\\)?"
-   "\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$")
+  (rx bol
+      (or "comm" "comma" "comman" "command" "commands"
+          "if" "while"
+          "def" "defi" "defin" "define"
+          "doc" "docu" "docum" "docume" "documen" "document"
+          "while-stepping"
+          "stepp" "steppi" "steppin" "stepping"
+          "ws" "actions"
+          "expl" "explo" "explor" "explore"
+          gdb-python-guile-commands)
+      (? (+ blank)
+         (group       ; Group 1 contains the command arguments.
+          (* nonl)))
+      eol)
   "Regexp matching GDB commands that enter a recursive reading loop.
 As long as GDB is in the recursive reading loop, it does not expect
 commands to be prefixed by \"-interpreter-exec console\".")
@@ -2033,7 +2042,7 @@ gdb-send
   ;; Python and Guile commands that have an argument don't enter the
   ;; recursive reading loop.
   (let* ((control-command-p (string-match gdb-control-commands-regexp string))
-         (command-arg (and control-command-p (match-string 3 string)))
+         (command-arg (and control-command-p (match-string 1 string)))
          (python-or-guile-p (string-match gdb-python-guile-commands-regexp
                                           string)))
     (if (and control-command-p

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-05 15:07 bug#66363: gdb-control-commands-regexp issues Mattias Engdegård
@ 2023-10-05 16:29 ` Eli Zaretskii
  2023-10-05 17:16   ` Mattias Engdegård
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2023-10-05 16:29 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 66363

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Thu, 5 Oct 2023 17:07:22 +0200
> 
> The variable `gdb-control-commands-regexp` in gdb-mi.el cannot work as currently defined and used. Only group 3 is of  interest, but that group hasn't referred to anything useful for several years.
> 
> Group 3 probably refers to a part of the regex's tail where the command argument is matched:
> 
>    "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
> 
> However, this seems to be broken as well, because all groups here are inside repetitions. This part of the regexp is also exponential in form if not in practice but we'd better simplify it anyway.
> 
> Attached is a suggested patch which makes explicit the command abbreviations matched, and leaves only a single submatch. It also changes the tail to assuming that the command argument doesn't contain non-newlines (or the final eol anchor wouldn't make sense) but that it can contain spaces (which seems reasonable). However, right now the argument is only checked for being non-empty or not.
> 
> I don't have a working gdb setup at the moment so if someone would be kind to test it, I would be very grateful for it.

I'd appreciate a few examples of using this that don't work correctly
(or not at all), as it is otherwise not easy to understand the
problem, and much less the proposed solution and how to test it.

TIA





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-05 16:29 ` Eli Zaretskii
@ 2023-10-05 17:16   ` Mattias Engdegård
  2023-10-05 17:43     ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2023-10-05 17:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66363

> I'd appreciate a few examples of using this that don't work correctly
> (or not at all), as it is otherwise not easy to understand the
> problem, and much less the proposed solution and how to test it.

I don't have any examples but the problems are clear from reading the code.
The regexp is defined by:

> (defvar gdb-python-guile-commands-regexp
>   "python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr"
>   "Regexp that matches Python and Guile commands supported by GDB.")
> 
> (defvar gdb-control-commands-regexp
>   (concat
>    "^\\("
>    "comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while"
>    "\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|"
>    gdb-python-guile-commands-regexp
>    "\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions"
>    "\\|expl\\(o\\(re?\\)?\\)?"
>    "\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$")
>   "Regexp matching GDB commands that enter a recursive reading loop.
> As long as GDB is in the recursive reading loop, it does not expect
> commands to be prefixed by \"-interpreter-exec console\".")

which results in the string regexp

> "^\\(comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions\\|expl\\(o\\(re?\\)?\\)?\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"

It is used in one place only:

>   (let* ((control-command-p (string-match gdb-control-commands-regexp string))
>          (command-arg (and control-command-p (match-string 3 string)))

As you can see it refers to group 3, which is matched by "\\(n\\(ds?\\)?\\)" -- clearly not what anybody intended.

As for the tail, even if the match group is corrected it's a rather roundabout way of checking whether there is a non-blank character after the command (and is written using nested repetitions which is how this came to my attention).

For the examples you asked about, perhaps commit 30c0f81f9f which introduced the now broken mechanism can be of help.







^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-05 17:16   ` Mattias Engdegård
@ 2023-10-05 17:43     ` Eli Zaretskii
  2023-10-05 18:11       ` Mattias Engdegård
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2023-10-05 17:43 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 66363

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Thu, 5 Oct 2023 19:16:32 +0200
> Cc: 66363@debbugs.gnu.org
> 
> > I'd appreciate a few examples of using this that don't work correctly
> > (or not at all), as it is otherwise not easy to understand the
> > problem, and much less the proposed solution and how to test it.
> 
> I don't have any examples but the problems are clear from reading the code.
> The regexp is defined by:
> 
> > (defvar gdb-python-guile-commands-regexp
> >   "python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr"
> >   "Regexp that matches Python and Guile commands supported by GDB.")
> > 
> > (defvar gdb-control-commands-regexp
> >   (concat
> >    "^\\("
> >    "comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while"
> >    "\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|"
> >    gdb-python-guile-commands-regexp
> >    "\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions"
> >    "\\|expl\\(o\\(re?\\)?\\)?"
> >    "\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$")
> >   "Regexp matching GDB commands that enter a recursive reading loop.
> > As long as GDB is in the recursive reading loop, it does not expect
> > commands to be prefixed by \"-interpreter-exec console\".")
> 
> which results in the string regexp
> 
> > "^\\(comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions\\|expl\\(o\\(re?\\)?\\)?\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
> 
> It is used in one place only:
> 
> >   (let* ((control-command-p (string-match gdb-control-commands-regexp string))
> >          (command-arg (and control-command-p (match-string 3 string)))
> 
> As you can see it refers to group 3, which is matched by "\\(n\\(ds?\\)?\\)" -- clearly not what anybody intended.

So the problem is only that 3 should be changed to the correct group
number?

> As for the tail, even if the match group is corrected it's a rather roundabout way of checking whether there is a non-blank character after the command (and is written using nested repetitions which is how this came to my attention).

Here you are talking about some optimization of the regexp, or is it
another bug?  If the latter, what is the bug here?





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-05 17:43     ` Eli Zaretskii
@ 2023-10-05 18:11       ` Mattias Engdegård
  2023-10-05 18:52         ` Eli Zaretskii
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2023-10-05 18:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66363

5 okt. 2023 kl. 19.43 skrev Eli Zaretskii <eliz@gnu.org>:

> So the problem is only that 3 should be changed to the correct group
> number?

That would perhaps work, but it wouldn't be a very robust solution. There are many groups preceding that we don't care about, and the bug arose precisely because they were added without considering that a group was being used.

Better then to get rid of all groups save the one in use. (The proposed patch does that.)
That would make it much more difficult for the same bug to arise again.

> Here you are talking about some optimization of the regexp, or is it
> another bug?  If the latter, what is the bug here?

It's related. When the reference to subgroup 3 was added (30c0f81f9f), the tail looked like this:

  "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)?$"
                  ^^^^^^^^^^^^^^^^^^^
and subgroup 3 was the second group in this substring (underlined above). Later (f71afd600a) the last `?` was changed into a `*`, but that made the contents of that group somewhat hazy because of the repetition:

  "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
                                        ^
which doesn't leave us with a useful group for the purpose of detecting command arguments at all.
So the tail of the regexp has to be rewritten anyway, and we might as well do it in a more straightforward way. (The proposed patch does that as well.)

The reason I found this is that it contains a sub-pattern on the form (A+B*)* which is super-linear in general but usually easy to fix so that the result is actually more readable, yet faster.






^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-05 18:11       ` Mattias Engdegård
@ 2023-10-05 18:52         ` Eli Zaretskii
  2023-10-06 12:09           ` Mattias Engdegård
  0 siblings, 1 reply; 8+ messages in thread
From: Eli Zaretskii @ 2023-10-05 18:52 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 66363

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Thu, 5 Oct 2023 20:11:24 +0200
> Cc: 66363@debbugs.gnu.org
> 
> 5 okt. 2023 kl. 19.43 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > So the problem is only that 3 should be changed to the correct group
> > number?
> 
> That would perhaps work, but it wouldn't be a very robust solution. There are many groups preceding that we don't care about, and the bug arose precisely because they were added without considering that a group was being used.
> 
> Better then to get rid of all groups save the one in use. (The proposed patch does that.)
> That would make it much more difficult for the same bug to arise again.
> 
> > Here you are talking about some optimization of the regexp, or is it
> > another bug?  If the latter, what is the bug here?
> 
> It's related. When the reference to subgroup 3 was added (30c0f81f9f), the tail looked like this:
> 
>   "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)?$"
>                   ^^^^^^^^^^^^^^^^^^^
> and subgroup 3 was the second group in this substring (underlined above). Later (f71afd600a) the last `?` was changed into a `*`, but that made the contents of that group somewhat hazy because of the repetition:
> 
>   "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
>                                         ^
> which doesn't leave us with a useful group for the purpose of detecting command arguments at all.
> So the tail of the regexp has to be rewritten anyway, and we might as well do it in a more straightforward way. (The proposed patch does that as well.)
> 
> The reason I found this is that it contains a sub-pattern on the form (A+B*)* which is super-linear in general but usually easy to fix so that the result is actually more readable, yet faster.

Thanks.  I'm okay with the changes in principle, but someone will have
to test them by running all of the control commands and verifying they
work after the fix, before this can be installed.  I myself won't have
the time for doing that any time soon, sorry.





^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-05 18:52         ` Eli Zaretskii
@ 2023-10-06 12:09           ` Mattias Engdegård
  2023-10-29 16:34             ` Mattias Engdegård
  0 siblings, 1 reply; 8+ messages in thread
From: Mattias Engdegård @ 2023-10-06 12:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66363

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

5 okt. 2023 kl. 20.52 skrev Eli Zaretskii <eliz@gnu.org>:

> I'm okay with the changes in principle, but someone will have
> to test them by running all of the control commands and verifying they
> work after the fix, before this can be installed.  I myself won't have
> the time for doing that any time soon, sorry.

That's fine, I can wait. The bug only affects GDB users so I'm not personally inconvenienced.

Here's an improved version of the patch that also fixes another bug in the original code: the extra match against gdb-python-guile-commands-regexp is both incorrect (not anchored) and superfluous as that information is available in the match just made.

For instance, consider what happens if the command string is "stepping ...". It will match gdb-control-command-regexp and also gdb-python-guile-commands-regexp and set python-or-guile-p, despite not being a Python or Guile command, and  prevent gdb-control-level from being incremented.


[-- Attachment #2: gdb-control-commands-regexp-2.diff --]
[-- Type: application/octet-stream, Size: 2744 bytes --]

diff --git a/lisp/progmodes/gdb-mi.el b/lisp/progmodes/gdb-mi.el
index bc0070d2630..3afdc59a67e 100644
--- a/lisp/progmodes/gdb-mi.el
+++ b/lisp/progmodes/gdb-mi.el
@@ -1960,19 +1960,23 @@ breakpoint-disabled
   :group 'gdb)
 
 \f
-(defvar gdb-python-guile-commands-regexp
-  "python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr"
-  "Regexp that matches Python and Guile commands supported by GDB.")
-
 (defvar gdb-control-commands-regexp
-  (concat
-   "^\\("
-   "comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while"
-   "\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|"
-   gdb-python-guile-commands-regexp
-   "\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions"
-   "\\|expl\\(o\\(re?\\)?\\)?"
-   "\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$")
+  (rx bol
+      (or
+       (or "comm" "comma" "comman" "command" "commands"
+           "if" "while"
+           "def" "defi" "defin" "define"
+           "doc" "docu" "docum" "docume" "documen" "document"
+           "while-stepping"
+           "stepp" "steppi" "steppin" "stepping"
+           "ws" "actions"
+           "expl" "explo" "explor" "explore")
+       (group         ; group 1: Python and Guile commands
+        (or "python" "python-interactive" "pi" "guile" "guile-repl" "gr")))
+      (? (+ blank)
+         (group       ; group 2: command arguments
+          (* nonl)))
+      eol)
   "Regexp matching GDB commands that enter a recursive reading loop.
 As long as GDB is in the recursive reading loop, it does not expect
 commands to be prefixed by \"-interpreter-exec console\".")
@@ -2032,15 +2036,13 @@ gdb-send
       (setq gdb-continuation nil)))
   ;; Python and Guile commands that have an argument don't enter the
   ;; recursive reading loop.
-  (let* ((control-command-p (string-match gdb-control-commands-regexp string))
-         (command-arg (and control-command-p (match-string 3 string)))
-         (python-or-guile-p (string-match gdb-python-guile-commands-regexp
-                                          string)))
-    (if (and control-command-p
-             (or (not python-or-guile-p)
-                 (null command-arg)
-                 (zerop (length command-arg))))
-        (setq gdb-control-level (1+ gdb-control-level)))))
+  (when (string-match gdb-control-commands-regexp string)
+    (let ((python-or-guile-p (match-beginning 1))
+          (command-arg (match-string 2 string)))
+      (when (or (not python-or-guile-p)
+                (null command-arg)
+                (zerop (length command-arg)))
+        (setq gdb-control-level (1+ gdb-control-level))))))
 
 (defun gdb-mi-quote (string)
   "Return STRING quoted properly as an MI argument.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* bug#66363: gdb-control-commands-regexp issues
  2023-10-06 12:09           ` Mattias Engdegård
@ 2023-10-29 16:34             ` Mattias Engdegård
  0 siblings, 0 replies; 8+ messages in thread
From: Mattias Engdegård @ 2023-10-29 16:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 66363-done

Pushed to master after finally finding a machine with gdb on it.






^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-29 16:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-05 15:07 bug#66363: gdb-control-commands-regexp issues Mattias Engdegård
2023-10-05 16:29 ` Eli Zaretskii
2023-10-05 17:16   ` Mattias Engdegård
2023-10-05 17:43     ` Eli Zaretskii
2023-10-05 18:11       ` Mattias Engdegård
2023-10-05 18:52         ` Eli Zaretskii
2023-10-06 12:09           ` Mattias Engdegård
2023-10-29 16:34             ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).