* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags @ 2023-07-29 20:08 Wilhelm Kirschbaum 2023-07-29 21:38 ` Wilhelm Kirschbaum ` (3 more replies) 0 siblings, 4 replies; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-07-29 20:08 UTC (permalink / raw) To: 64939 A file without an extension will load ruby-mode if the first line is: #!/usr/bin/env ruby but not when the first line is: #!/usr/bin/env -S ruby -e 'puts 123' Is there any reason why the latter should not be matched by the default `auto-mode-interpreter-regexp' value in 'files.el'? A more useful example I stumbled on today while working on a language server after adding: `(add-to-list 'interpreter-mode-alist '("elixir" . elixir-ts-mode))' #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding latin1" Node.start(:"next-ls-#{System.system_time()}", :shortnames) .... Wilhelm ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum @ 2023-07-29 21:38 ` Wilhelm Kirschbaum 2023-07-30 5:04 ` Eli Zaretskii 2023-07-30 4:53 ` Eli Zaretskii ` (2 subsequent siblings) 3 siblings, 1 reply; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-07-29 21:38 UTC (permalink / raw) To: 64939 > A file without an extension will load ruby-mode if the first > line is: > > #!/usr/bin/env ruby > > but not when the first line is: > > #!/usr/bin/env -S ruby -e 'puts 123' > > Is there any reason why the latter should not be matched by the > default > `auto-mode-interpreter-regexp' value in 'files.el'? > > > A more useful example I stumbled on today while working on a > language > server after adding: > > `(add-to-list 'interpreter-mode-alist '("elixir" > . elixir-ts-mode))' > > > #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding > latin1" > > Node.start(:"next-ls-#{System.system_time()}", :shortnames) > .... > > > > Wilhelm This is a very naive solution to the above, but I am probably missing some knowledge here and will break for anyone setting the var to something custom. modified lisp/files.el @@ -3243,7 +3243,7 @@ inhibit-local-variables-p (defvar auto-mode-interpreter-regexp (purecopy "#![ \t]?\\([^ \t\n]*\ -/bin/env[ \t]\\)?\\([^ \t\n]+\\)") +/bin/env[ \t]\\)?\\(-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)?\\([^ \t\n]+\\)") "Regexp matching interpreters, for file mode determination. This regular expression is matched against the first line of a file to determine the file's mode in `set-auto-mode'. If it matches, the file @@ -3445,7 +3445,7 @@ set-auto-mode (setq mode (save-excursion (goto-char (point-min)) (if (looking-at auto-mode-interpreter-regexp) - (match-string 2)))) + (match-string 3)))) ;; Map interpreter name to a mode, signaling we're done at the ;; same time. (setq done (assoc-default modified lisp/progmodes/sh-script.el @@ -1481,7 +1481,7 @@ sh--guess-shell (cond ((save-excursion (goto-char (point-min)) (looking-at auto-mode-interpreter-regexp)) - (match-string 2)) + (match-string 3)) ((not buffer-file-name) sh-shell-file) ;; Checks that use `buffer-file-name' follow. ((string-match "\\.m?spec\\'" buffer-file-name) "rpm") ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-29 21:38 ` Wilhelm Kirschbaum @ 2023-07-30 5:04 ` Eli Zaretskii 2023-07-30 9:38 ` Wilhelm Kirschbaum 0 siblings, 1 reply; 20+ messages in thread From: Eli Zaretskii @ 2023-07-30 5:04 UTC (permalink / raw) To: Wilhelm Kirschbaum; +Cc: 64939 > From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> > Date: Sat, 29 Jul 2023 23:38:07 +0200 > > This is a very naive solution to the above, but I am probably > missing some knowledge here and will break for anyone setting the > var to something custom. Feel free to make this change locally, but I don't see how this can be general enough for us to install it as the default value. For starters, 'env' can be invoked with several options, not just with one. Also, some 'env' options accept arguments, and how do we know if the word that follows "env -OPTION" is the command to check against interpreter-mode-alist or an argument of an option? IOW, I don't think this is a problem for a regexp-based solution. If we want to support such complex shebang lines (btw, does the Posix or GNU/Linux shell support them?), we should analyze the text after "env" to find the candidate interpreter. Not sure whether even that will provide a robust solution. Btw, can't you satisfy your needs via file-local variables? ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 5:04 ` Eli Zaretskii @ 2023-07-30 9:38 ` Wilhelm Kirschbaum 2023-07-30 10:04 ` Eli Zaretskii 2023-07-31 17:38 ` Juri Linkov 0 siblings, 2 replies; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-07-30 9:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64939 Eli Zaretskii <eliz@gnu.org> writes: >> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> >> Date: Sat, 29 Jul 2023 23:38:07 +0200 >> >> This is a very naive solution to the above, but I am probably >> missing some knowledge here and will break for anyone setting >> the >> var to something custom. > > Feel free to make this change locally, but I don't see how this > can be > general enough for us to install it as the default value. > The problem is that even with a local change, the match group 2 is hard coded. > For starters, 'env' can be invoked with several options, not > just with > one. Also, some 'env' options accept arguments, and how do we > know if > the word that follows "env -OPTION" is the command to check > against > interpreter-mode-alist or an argument of an option? > Understand, it is a bit complex perhaps. > IOW, I don't think this is a problem for a regexp-based > solution. If > we want to support such complex shebang lines (btw, does the > Posix or > GNU/Linux shell support them?), we should analyze the text after > "env" > to find the candidate interpreter. Not sure whether even that > will > provide a robust solution. > I can perhaps have a look if there is something concrete about how this can be interpreted. > Btw, can't you satisfy your needs via file-local variables? Not without convincing the project maintainers to add Emacs specific lines to the code, which I don't really think is appropriate in some cases. It is not a common occurrence to run into this issue, but thought is strange to not work as expected. If I am the only one, then happy to close this and just keep a local patch. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 9:38 ` Wilhelm Kirschbaum @ 2023-07-30 10:04 ` Eli Zaretskii 2023-07-31 7:11 ` Wilhelm Kirschbaum 2023-07-31 17:38 ` Juri Linkov 1 sibling, 1 reply; 20+ messages in thread From: Eli Zaretskii @ 2023-07-30 10:04 UTC (permalink / raw) To: Wilhelm Kirschbaum; +Cc: 64939 > From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> > Cc: 64939@debbugs.gnu.org > Date: Sun, 30 Jul 2023 11:38:06 +0200 > > > Btw, can't you satisfy your needs via file-local variables? > > Not without convincing the project maintainers to add Emacs specific > lines to the code, which I don't really think is appropriate in some > cases. Why not? Many projects add such variables, both for Emacs and for Vim. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 10:04 ` Eli Zaretskii @ 2023-07-31 7:11 ` Wilhelm Kirschbaum 0 siblings, 0 replies; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-07-31 7:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64939 Eli Zaretskii <eliz@gnu.org> writes: >> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> >> Cc: 64939@debbugs.gnu.org >> Date: Sun, 30 Jul 2023 11:38:06 +0200 >> >> > Btw, can't you satisfy your needs via file-local variables? >> >> Not without convincing the project maintainers to add Emacs >> specific >> lines to the code, which I don't really think is appropriate in >> some >> cases. > > Why not? Many projects add such variables, both for Emacs and > for > Vim. Sure, but its certainly not common on projects I work on where the vast majority of the contributors don't care about Emacs. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 9:38 ` Wilhelm Kirschbaum 2023-07-30 10:04 ` Eli Zaretskii @ 2023-07-31 17:38 ` Juri Linkov 2023-08-01 6:20 ` Wilhelm Kirschbaum 1 sibling, 1 reply; 20+ messages in thread From: Juri Linkov @ 2023-07-31 17:38 UTC (permalink / raw) To: Wilhelm Kirschbaum; +Cc: 64939, Eli Zaretskii > The problem is that even with a local change, the match group 2 is hard > coded. You could use a shy group \(?: ... \) in your customized value of 'auto-mode-interpreter-regexp' that doesn't change the match index from 2 to 3. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-31 17:38 ` Juri Linkov @ 2023-08-01 6:20 ` Wilhelm Kirschbaum 0 siblings, 0 replies; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-08-01 6:20 UTC (permalink / raw) To: Juri Linkov; +Cc: 64939, Eli Zaretskii Juri Linkov <juri@linkov.net> writes: >> The problem is that even with a local change, the match group 2 >> is hard >> coded. > > You could use a shy group \(?: ... \) > in your customized value of 'auto-mode-interpreter-regexp' > that doesn't change the match index from 2 to 3. Fantastic, thanks. I was not aware that existed. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum 2023-07-29 21:38 ` Wilhelm Kirschbaum @ 2023-07-30 4:53 ` Eli Zaretskii 2023-07-30 8:28 ` Wilhelm Kirschbaum 2024-01-31 19:52 ` bug#64939: Malcolm Cook 2024-02-01 18:52 ` bug#64939: Malcolm Cook 3 siblings, 1 reply; 20+ messages in thread From: Eli Zaretskii @ 2023-07-30 4:53 UTC (permalink / raw) To: Wilhelm Kirschbaum; +Cc: 64939 > From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> > Date: Sat, 29 Jul 2023 22:08:19 +0200 > > > A file without an extension will load ruby-mode if the first line > is: > > #!/usr/bin/env ruby > > but not when the first line is: > > #!/usr/bin/env -S ruby -e 'puts 123' > > Is there any reason why the latter should not be matched by the > default > `auto-mode-interpreter-regexp' value in 'files.el'? That line _is_ matched by auto-mode-interpreter-regexp: (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env -S ruby -e 'puts 123'") => 0 The problem is how to find the name of the interpreter if the text after "/usr/bin/env" includes more than one word? Once we start using command-line switches and their arguments, and take into consideration that many GNU/Linux programs can freely intersperse options and non-option arguments on the command line in any order, where does this end? > A more useful example I stumbled on today while working on a > language > server after adding: > > `(add-to-list 'interpreter-mode-alist '("elixir" > . elixir-ts-mode))' > > > #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding > latin1" How about making a script that invokes elixir with those arguments, and then augment interpreter-mode-alist to name that script instead? ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 4:53 ` Eli Zaretskii @ 2023-07-30 8:28 ` Wilhelm Kirschbaum 2023-07-30 10:03 ` Eli Zaretskii 0 siblings, 1 reply; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-07-30 8:28 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64939 Eli Zaretskii <eliz@gnu.org> writes: >> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> >> Date: Sat, 29 Jul 2023 22:08:19 +0200 >> >> >> A file without an extension will load ruby-mode if the first >> line >> is: >> >> #!/usr/bin/env ruby >> >> but not when the first line is: >> >> #!/usr/bin/env -S ruby -e 'puts 123' >> >> Is there any reason why the latter should not be matched by the >> default >> `auto-mode-interpreter-regexp' value in 'files.el'? > > That line _is_ matched by auto-mode-interpreter-regexp: > > (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env > -S ruby -e 'puts 123'") => 0 > > The problem is how to find the name of the interpreter if the > text > after "/usr/bin/env" includes more than one word? Once we start > using > command-line switches and their arguments, and take into > consideration > that many GNU/Linux programs can freely intersperse options and > non-option arguments on the command line in any order, where > does this > end? > I am hoping there is some way of effectively matching command-line switches for '/usr/bin/env', but sounds like it is perhaps too complex? This probably ends with a match to some common command-line variations for the '/usr/bin/env' program. It is not complete now, so make it guess slightly better is perhaps appropriate? >> A more useful example I stumbled on today while working on a >> language >> server after adding: >> >> `(add-to-list 'interpreter-mode-alist '("elixir" >> . elixir-ts-mode))' >> >> >> #!/usr/bin/env -S elixir --erl "-kernel >> standard_io_encoding >> latin1" > > How about making a script that invokes elixir with those > arguments, > and then augment interpreter-mode-alist to name that script > instead? That is an option, but then I have to convince the maintainers of all the packages I work on to make this change, vs my editor handling this. Even locally changing the regexp means that the match index has changed, so can't see this working without a patch. If this is perhaps too much work and/or uncertainty for negligible benefit to users, then it won't cause distress on my side :). Wilhelm ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 8:28 ` Wilhelm Kirschbaum @ 2023-07-30 10:03 ` Eli Zaretskii 2023-07-30 10:27 ` Wilhelm Kirschbaum 0 siblings, 1 reply; 20+ messages in thread From: Eli Zaretskii @ 2023-07-30 10:03 UTC (permalink / raw) To: Wilhelm Kirschbaum; +Cc: 64939 > From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> > Cc: 64939@debbugs.gnu.org > Date: Sun, 30 Jul 2023 10:28:14 +0200 > > > Eli Zaretskii <eliz@gnu.org> writes: > > > That line _is_ matched by auto-mode-interpreter-regexp: > > > > (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env > > -S ruby -e 'puts 123'") => 0 > > > > The problem is how to find the name of the interpreter if the text > > after "/usr/bin/env" includes more than one word? Once we start > > using command-line switches and their arguments, and take into > > consideration that many GNU/Linux programs can freely intersperse > > options and non-option arguments on the command line in any order, > > where does this end? > > I am hoping there is some way of effectively matching command-line > switches for '/usr/bin/env', but sounds like it is perhaps too > complex? How can we do that without incurring non-trivial maintenance costs? 'env' is being actively developed; e.g., the old version I have where I'm typing this doesn't even have the -S option. Are we supposed to track every new command-line option added to 'env', and update our code accordingly? That is even impractical, because someone could use 2-year old Emacs with Coreutils released just yesterday. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2023-07-30 10:03 ` Eli Zaretskii @ 2023-07-30 10:27 ` Wilhelm Kirschbaum 0 siblings, 0 replies; 20+ messages in thread From: Wilhelm Kirschbaum @ 2023-07-30 10:27 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 64939 Eli Zaretskii <eliz@gnu.org> writes: >> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com> >> Cc: 64939@debbugs.gnu.org >> Date: Sun, 30 Jul 2023 10:28:14 +0200 >> >> >> Eli Zaretskii <eliz@gnu.org> writes: >> >> > That line _is_ matched by auto-mode-interpreter-regexp: >> > >> > (string-match-p auto-mode-interpreter-regexp >> > "#!/usr/bin/env >> > -S ruby -e 'puts 123'") => 0 >> > >> > The problem is how to find the name of the interpreter if the >> > text >> > after "/usr/bin/env" includes more than one word? Once we >> > start >> > using command-line switches and their arguments, and take >> > into >> > consideration that many GNU/Linux programs can freely >> > intersperse >> > options and non-option arguments on the command line in any >> > order, >> > where does this end? >> >> I am hoping there is some way of effectively matching >> command-line >> switches for '/usr/bin/env', but sounds like it is perhaps too >> complex? > > How can we do that without incurring non-trivial maintenance > costs? > 'env' is being actively developed; e.g., the old version I have > where > I'm typing this doesn't even have the -S option. Are we > supposed to > track every new command-line option added to 'env', and update > our > code accordingly? That is even impractical, because someone > could use > 2-year old Emacs with Coreutils released just yesterday. No, I don't think we should track every command-line option added, but just allow to match command-line options for env ( maybe this is the non-trivial part). I don't understand why it will have an impact for someone who is using a version 5+ year old version of Coreutils. If it was possible for a user to add configuration to set-auto-mode script files like the ones mentioned above I think this will be less of an issue. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum 2023-07-29 21:38 ` Wilhelm Kirschbaum 2023-07-30 4:53 ` Eli Zaretskii @ 2024-01-31 19:52 ` Malcolm Cook 2024-02-01 18:52 ` bug#64939: Malcolm Cook 3 siblings, 0 replies; 20+ messages in thread From: Malcolm Cook @ 2024-01-31 19:52 UTC (permalink / raw) To: 64939 [-- Attachment #1: Type: text/plain, Size: 539 bytes --] I find allowing the shy regexp to match zero or more times (using a '*' instead of '?') solves not only the use case of including -S as an option, but also can support other options to env. I have this now in my init.el: (setq auto-mode-interpreter-regexp ;; support -S and other options to /bin/env. See ;; https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 (purecopy "#![ \t]?\\([^ \t\n]*\ /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ \t\n]+\\)")) Hooray? Why not patch lisp/files.el accordingly? ~ Malcolm Cook [-- Attachment #2: Type: text/html, Size: 706 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum ` (2 preceding siblings ...) 2024-01-31 19:52 ` bug#64939: Malcolm Cook @ 2024-02-01 18:52 ` Malcolm Cook 2024-02-10 8:27 ` bug#64939: Eli Zaretskii 3 siblings, 1 reply; 20+ messages in thread From: Malcolm Cook @ 2024-02-01 18:52 UTC (permalink / raw) To: 64939 Regarding [1] allowing emacs to recognize shebang lines containing calls to /bin/env with options (such as -S as allowed in new core utils [2])... I prefer allowing the proposed "shy" regexp to match zero or more times (using a '*' instead of '?'). To wit, I have this now in my init.el: (setq auto-mode-interpreter-regexp ;; Support shbang line calling `/bin/env` with `-S` (and/or other options). ;; c.f. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 (purecopy "#![ \t]?\\([^ \t\n]*\ /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ \t\n]+\\)")) [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation YMMV? ~ Malcolm Cook ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 2024-02-01 18:52 ` bug#64939: Malcolm Cook @ 2024-02-10 8:27 ` Eli Zaretskii 2024-02-10 10:23 ` bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Kévin Le Gouguec 0 siblings, 1 reply; 20+ messages in thread From: Eli Zaretskii @ 2024-02-10 8:27 UTC (permalink / raw) To: Malcolm Cook, Kévin Le Gouguec; +Cc: 64939 > From: Malcolm Cook <malcolm.cook@gmail.com> > Date: Thu, 1 Feb 2024 12:52:39 -0600 > > Regarding [1] allowing emacs to recognize shebang lines containing > calls to /bin/env with options (such as -S as allowed in new core > utils [2])... > > I prefer allowing the proposed "shy" regexp to match zero or more > times (using a '*' instead of '?'). > > To wit, I have this now in my init.el: > > (setq auto-mode-interpreter-regexp > ;; Support shbang line calling `/bin/env` with `-S` (and/or > other options). > ;; c.f. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 > (purecopy "#![ \t]?\\([^ \t\n]*\ > /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ > \t\n]+\\)")) > > [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 > [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation > > YMMV? Kevin, any comments about the proposals in this bug report? ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2024-02-10 8:27 ` bug#64939: Eli Zaretskii @ 2024-02-10 10:23 ` Kévin Le Gouguec 2024-02-10 17:08 ` Kévin Le Gouguec 0 siblings, 1 reply; 20+ messages in thread From: Kévin Le Gouguec @ 2024-02-10 10:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Wilhelm Kirschbaum, Malcolm Cook, 64939 Thanks for the CC, this report had completely slipped past my notice when I worked on bug#66902, and so did Malcolm's follow-ups. Boldly adding Wilhelm as well, since I am not 100% sure Debbugs sends a copy of every message in a report to their OP. Comments below. Eli Zaretskii <eliz@gnu.org> writes: >> From: Malcolm Cook <malcolm.cook@gmail.com> >> Date: Thu, 1 Feb 2024 12:52:39 -0600 >> >> Regarding [1] allowing emacs to recognize shebang lines containing >> calls to /bin/env with options (such as -S as allowed in new core >> utils [2])... >> >> I prefer allowing the proposed "shy" regexp to match zero or more >> times (using a '*' instead of '?'). >> >> To wit, I have this now in my init.el: >> >> (setq auto-mode-interpreter-regexp >> ;; Support shbang line calling `/bin/env` with `-S` (and/or >> other options). >> ;; c.f. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 >> (purecopy "#![ \t]?\\([^ \t\n]*\ >> /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ >> \t\n]+\\)")) IIUC this would be a more lax variant of what we installed for bug#66902, can you confirm Malcolm? This is what the current regexp looks like on the master branch: (purecopy (concat "#![ \t]*" ;; Optional group 1: env(1) invocation. "\\(" "[^ \t\n]*/bin/env[ \t]*" "\\(?:-S[ \t]*\\|--split-string\\(?:=\\|[ \t]*\\)\\)?" "\\)?" ;; Group 2: interpreter. "\\([^ \t\n]+\\)")) And the corresponding test cases: (ert-deftest files-tests-auto-mode-interpreter () "Test that `set-auto-mode' deduces correct modes from shebangs." (files-tests--check-shebang "#!/bin/bash" 'sh-mode) (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode) (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode) (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode) (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode) (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode) (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode)) Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we sophisticate the regexp further? FWIW, in no particular order: (a) env(1) does seem to support mixing up arbitrary options with -S¹, so in principle it would make sense to support that; (b) Eli did not seem too found of the regexp hammer², so I don't know which direction we'd want to go between maximally correct (accept all arguments, _as long as_ -S|--split-string is in there) or good enough (just skip over --everything --that --looks --like -a --switch). (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since "-[v]S[OPTION]" must be the *first* token after env; in other words no need to support --some-option --split-string --more-options. >> [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939 >> [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation >> >> YMMV? > > Kevin, any comments about the proposals in this bug report? Comments above; footnotes below. Again, thanks for the heads up. ¹ $ cat demo.sh #!/usr/bin/env -vS -uFOOBAR bash -eux echo hi echo $FOOBAR echo bye $ FOOBAR=totally-set ./demo.sh split -S: ‘ -uFOOBAR bash -eux’ into: ‘-uFOOBAR’ & ‘bash’ & ‘-eux’ unset: FOOBAR executing: bash arg[0]= ‘bash’ arg[1]= ‘-eux’ arg[2]= ‘./foo.sh’ + echo hi hi ./foo.sh: line 4: FOOBAR: unbound variable ² https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939#14 ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2024-02-10 10:23 ` bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Kévin Le Gouguec @ 2024-02-10 17:08 ` Kévin Le Gouguec 2024-02-10 17:23 ` Malcolm Cook 0 siblings, 1 reply; 20+ messages in thread From: Kévin Le Gouguec @ 2024-02-10 17:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Wilhelm Kirschbaum, Malcolm Cook, 64939 [-- Attachment #1: Type: text/plain, Size: 1061 bytes --] Kévin Le Gouguec <kevin.legouguec@gmail.com> writes: > Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we > sophisticate the regexp further? FWIW, in no particular order: > > (a) env(1) does seem to support mixing up arbitrary options with -S¹, so > in principle it would make sense to support that; > > (b) Eli did not seem too found of the regexp hammer², so I don't know > which direction we'd want to go between maximally correct (accept > all arguments, _as long as_ -S|--split-string is in there) or good > enough (just skip over --everything --that --looks --like -a > --switch). > > (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since > "-[v]S[OPTION]" must be the *first* token after env; in other words > no need to support --some-option --split-string --more-options. Well, sorry, couldn't resist. How do the attached patches look? The new testcases should tell the whole story. ('make && make -C test files-tests' seems none the worse for wear) [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Refine-shebang-tests-bug-64939.patch --] [-- Type: text/x-patch, Size: 3494 bytes --] From 0a3dcfa9d5859c8a7ecd4679b748298b0f5d3597 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?K=C3=A9vin=20Le=20Gouguec?= <kevin.legouguec@gmail.com> Date: Sat, 10 Feb 2024 16:14:08 +0100 Subject: [PATCH 1/3] Refine shebang tests (bug#64939) * test/lisp/files-tests.el (files-tests--check-shebang): For shell-script modes, verify that the correct shell is set. (files-tests-auto-mode-interpreter): Prefer sh-base-mode to sh-mode to stay tree-sitter-agnostic; re-organize test cases to make future ones easier to add. --- test/lisp/files-tests.el | 45 ++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el index 718ecd51f8b..23516ff0d7d 100644 --- a/test/lisp/files-tests.el +++ b/test/lisp/files-tests.el @@ -1656,30 +1656,39 @@ files-tests-file-name-base (should (equal (file-name-base "foo") "foo")) (should (equal (file-name-base "foo/bar") "bar"))) -(defun files-tests--check-shebang (shebang expected-mode) - "Assert that mode for SHEBANG derives from EXPECTED-MODE." - (let ((actual-mode - (ert-with-temp-file script-file - :text shebang - (find-file script-file) - (if (derived-mode-p expected-mode) - expected-mode - major-mode)))) - ;; Tuck all the information we need in the `should' form: input - ;; shebang, expected mode vs actual. - (should - (equal (list shebang actual-mode) - (list shebang expected-mode))))) +(defvar sh-shell) + +(defun files-tests--check-shebang (shebang expected-mode &optional expected-dialect) + "Assert that mode for SHEBANG derives from EXPECTED-MODE. + +If EXPECTED-MODE is sh-base-mode, DIALECT says what `sh-shell' should be +set to." + (ert-with-temp-file script-file + :text shebang + (find-file script-file) + (let ((actual-mode (if (derived-mode-p expected-mode) + expected-mode + major-mode))) + ;; Tuck all the information we need in the `should' form: input + ;; shebang, expected mode vs actual. + (should + (equal (list shebang actual-mode) + (list shebang expected-mode))) + (when (eq expected-mode 'sh-base-mode) + (should (eq sh-shell expected-dialect)))))) (ert-deftest files-tests-auto-mode-interpreter () "Test that `set-auto-mode' deduces correct modes from shebangs." - (files-tests--check-shebang "#!/bin/bash" 'sh-mode) - (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode) + ;; Straightforward interpreter invocation. + (files-tests--check-shebang "#!/bin/bash" 'sh-base-mode 'bash) + (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode) + ;; Invocation through env. + (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-base-mode 'bash) (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode) (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode) + ;; Invocation through env, with supplementary arguments. (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode) - (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode) - (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode)) + (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)) (ert-deftest files-test-dir-locals-auto-mode-alist () "Test an `auto-mode-alist' entry in `.dir-locals.el'" -- 2.43.0 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-Support-more-complex-env-invocations-in-shebang-line.patch --] [-- Type: text/x-patch, Size: 3003 bytes --] From ec011e258fa3a7697dad631d82bbd53e5bc93f50 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?K=C3=A9vin=20Le=20Gouguec?= <kevin.legouguec@gmail.com> Date: Sat, 10 Feb 2024 17:37:35 +0100 Subject: [PATCH 2/3] Support more complex env invocations in shebang lines This is not an exact re-implementation of what env accepts, but hopefully it should be "good enough". Example of known limitation: we assume that arguments for --long-options will be set with '=', but that is not necessarily the case. '--unset' (mandatory argument) can be passed as '--unset=VAR' or '--unset VAR', but '--default-signal' (optional argument) requires an '=' sign. For bug#64939. * lisp/files.el (auto-mode-interpreter-regexp): Account for supplementary arguments passed beside -S/--split-string. * test/lisp/files-tests.el (files-tests-auto-mode-interpreter): Test some of these combinations. --- lisp/files.el | 8 +++++++- test/lisp/files-tests.el | 8 +++++++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/lisp/files.el b/lisp/files.el index f67b650cb92..5098d49048e 100644 --- a/lisp/files.el +++ b/lisp/files.el @@ -3274,7 +3274,13 @@ auto-mode-interpreter-regexp ;; Optional group 1: env(1) invocation. "\\(" "[^ \t\n]*/bin/env[ \t]*" - "\\(?:-S[ \t]*\\|--split-string\\(?:=\\|[ \t]*\\)\\)?" + ;; Within group 1: possible -S/--split-string. + "\\(?:" + ;; -S/--split-string + "\\(?:-[0a-z]*S[ \t]*\\|--split-string=\\)" + ;; More env arguments. + "\\(?:-[^ \t\n]+[ \t]+\\)*" + "\\)?" "\\)?" ;; Group 2: interpreter. "\\([^ \t\n]+\\)")) diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el index 23516ff0d7d..0a5c3b897e4 100644 --- a/test/lisp/files-tests.el +++ b/test/lisp/files-tests.el @@ -1687,8 +1687,14 @@ files-tests-auto-mode-interpreter (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode) (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode) ;; Invocation through env, with supplementary arguments. + (files-tests--check-shebang "#!/usr/bin/env --split-string=bash -eux" 'sh-base-mode 'bash) + (files-tests--check-shebang "#!/usr/bin/env --split-string=-iv --default-signal bash -eux" 'sh-base-mode 'bash) (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode) - (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)) + (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode) + (files-tests--check-shebang "#!/usr/bin/env -S-vi bash -eux" 'sh-base-mode 'bash) + (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal=INT bash -eux" 'sh-base-mode 'bash) + (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal bash -eux" 'sh-base-mode 'bash) + (files-tests--check-shebang "#!/usr/bin/env -vS -uFOOBAR bash -eux" 'sh-base-mode 'bash)) (ert-deftest files-test-dir-locals-auto-mode-alist () "Test an `auto-mode-alist' entry in `.dir-locals.el'" -- 2.43.0 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #4: 0003-Support-shebang-lines-with-amended-environment.patch --] [-- Type: text/x-patch, Size: 2275 bytes --] From 9e24e7f000011105a89e5ce81cd9f16eb9ef15b5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?K=C3=A9vin=20Le=20Gouguec?= <kevin.legouguec@gmail.com> Date: Sat, 10 Feb 2024 17:56:57 +0100 Subject: [PATCH 3/3] Support shebang lines with amended environment For bug#64939. * lisp/files.el (auto-mode-interpreter-regexp): Account for possible VARIABLE=[VALUE] operands. * test/lisp/files-tests.el (files-tests-auto-mode-interpreter): Add an example from the coreutils manual. --- lisp/files.el | 5 ++++- test/lisp/files-tests.el | 4 +++- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/lisp/files.el b/lisp/files.el index 5098d49048e..524385edc84 100644 --- a/lisp/files.el +++ b/lisp/files.el @@ -3274,12 +3274,15 @@ auto-mode-interpreter-regexp ;; Optional group 1: env(1) invocation. "\\(" "[^ \t\n]*/bin/env[ \t]*" - ;; Within group 1: possible -S/--split-string. + ;; Within group 1: possible -S/--split-string and environment + ;; adjustments. "\\(?:" ;; -S/--split-string "\\(?:-[0a-z]*S[ \t]*\\|--split-string=\\)" ;; More env arguments. "\\(?:-[^ \t\n]+[ \t]+\\)*" + ;; Interpreter environment modifications. + "\\(?:[^ \t\n]+=[^ \t\n]*[ \t]+\\)*" "\\)?" "\\)?" ;; Group 2: interpreter. diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el index 0a5c3b897e4..d4c1ef3ba67 100644 --- a/test/lisp/files-tests.el +++ b/test/lisp/files-tests.el @@ -1694,7 +1694,9 @@ files-tests-auto-mode-interpreter (files-tests--check-shebang "#!/usr/bin/env -S-vi bash -eux" 'sh-base-mode 'bash) (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal=INT bash -eux" 'sh-base-mode 'bash) (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal bash -eux" 'sh-base-mode 'bash) - (files-tests--check-shebang "#!/usr/bin/env -vS -uFOOBAR bash -eux" 'sh-base-mode 'bash)) + (files-tests--check-shebang "#!/usr/bin/env -vS -uFOOBAR bash -eux" 'sh-base-mode 'bash) + ;; Invocation through env, with modified environment. + (files-tests--check-shebang "#!/usr/bin/env -S PYTHONPATH=/...:${PYTHONPATH} python" 'python-base-mode)) (ert-deftest files-test-dir-locals-auto-mode-alist () "Test an `auto-mode-alist' entry in `.dir-locals.el'" -- 2.43.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2024-02-10 17:08 ` Kévin Le Gouguec @ 2024-02-10 17:23 ` Malcolm Cook 2024-02-17 8:33 ` Eli Zaretskii 0 siblings, 1 reply; 20+ messages in thread From: Malcolm Cook @ 2024-02-10 17:23 UTC (permalink / raw) To: Kévin Le Gouguec; +Cc: Wilhelm Kirschbaum, 64939, Eli Zaretskii Hooray - thanks - all seems perfect to me On Sat, Feb 10, 2024 at 11:08 AM Kévin Le Gouguec <kevin.legouguec@gmail.com> wrote: > > Kévin Le Gouguec <kevin.legouguec@gmail.com> writes: > > > Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we > > sophisticate the regexp further? FWIW, in no particular order: > > > > (a) env(1) does seem to support mixing up arbitrary options with -S¹, so > > in principle it would make sense to support that; > > > > (b) Eli did not seem too found of the regexp hammer², so I don't know > > which direction we'd want to go between maximally correct (accept > > all arguments, _as long as_ -S|--split-string is in there) or good > > enough (just skip over --everything --that --looks --like -a > > --switch). > > > > (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since > > "-[v]S[OPTION]" must be the *first* token after env; in other words > > no need to support --some-option --split-string --more-options. > > Well, sorry, couldn't resist. How do the attached patches look? The > new testcases should tell the whole story. > > ('make && make -C test files-tests' seems none the worse for wear) > -- ~ Malcolm Cook ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2024-02-10 17:23 ` Malcolm Cook @ 2024-02-17 8:33 ` Eli Zaretskii 2024-02-28 17:57 ` Wilhelm Kirschbaum 0 siblings, 1 reply; 20+ messages in thread From: Eli Zaretskii @ 2024-02-17 8:33 UTC (permalink / raw) To: Malcolm Cook; +Cc: wkirschbaum, 64939-done, kevin.legouguec > From: Malcolm Cook <malcolm.cook@gmail.com> > Date: Sat, 10 Feb 2024 11:23:11 -0600 > Cc: Eli Zaretskii <eliz@gnu.org>, Wilhelm Kirschbaum <wkirschbaum@gmail.com>, 64939@debbugs.gnu.org > > Hooray - thanks - all seems perfect to me Thanks, so I've now installed these changes on the master branch, and I'm therefore closing this bug. ^ permalink raw reply [flat|nested] 20+ messages in thread
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags 2024-02-17 8:33 ` Eli Zaretskii @ 2024-02-28 17:57 ` Wilhelm Kirschbaum 0 siblings, 0 replies; 20+ messages in thread From: Wilhelm Kirschbaum @ 2024-02-28 17:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Malcolm Cook, 64939-done, kevin.legouguec [-- Attachment #1: Type: text/plain, Size: 240 bytes --] > Thanks, so I've now installed these changes on the master branch, and > I'm therefore closing this bug. > Thank you, this works! ( I am not sure if it's fine for me to reply to a closed bug, but just wanted to express my gratitude :) ). [-- Attachment #2: Type: text/html, Size: 500 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-02-28 17:57 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum 2023-07-29 21:38 ` Wilhelm Kirschbaum 2023-07-30 5:04 ` Eli Zaretskii 2023-07-30 9:38 ` Wilhelm Kirschbaum 2023-07-30 10:04 ` Eli Zaretskii 2023-07-31 7:11 ` Wilhelm Kirschbaum 2023-07-31 17:38 ` Juri Linkov 2023-08-01 6:20 ` Wilhelm Kirschbaum 2023-07-30 4:53 ` Eli Zaretskii 2023-07-30 8:28 ` Wilhelm Kirschbaum 2023-07-30 10:03 ` Eli Zaretskii 2023-07-30 10:27 ` Wilhelm Kirschbaum 2024-01-31 19:52 ` bug#64939: Malcolm Cook 2024-02-01 18:52 ` bug#64939: Malcolm Cook 2024-02-10 8:27 ` bug#64939: Eli Zaretskii 2024-02-10 10:23 ` bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Kévin Le Gouguec 2024-02-10 17:08 ` Kévin Le Gouguec 2024-02-10 17:23 ` Malcolm Cook 2024-02-17 8:33 ` Eli Zaretskii 2024-02-28 17:57 ` Wilhelm Kirschbaum
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).