unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
@ 2023-07-29 20:08 Wilhelm Kirschbaum
  2023-07-29 21:38 ` Wilhelm Kirschbaum
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-07-29 20:08 UTC (permalink / raw)
  To: 64939


A file without an extension will load ruby-mode if the first line 
is:

    #!/usr/bin/env ruby

but not when the first line is:

    #!/usr/bin/env -S ruby -e 'puts 123'

Is there any reason why the latter should not be matched by the 
default
`auto-mode-interpreter-regexp' value in 'files.el'?


A more useful example I stumbled on today while working on a 
language
server after adding:

`(add-to-list 'interpreter-mode-alist '("elixir" 
. elixir-ts-mode))'


    #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding 
    latin1"

    Node.start(:"next-ls-#{System.system_time()}", :shortnames)
    ....



Wilhelm





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum
@ 2023-07-29 21:38 ` Wilhelm Kirschbaum
  2023-07-30  5:04   ` Eli Zaretskii
  2023-07-30  4:53 ` Eli Zaretskii
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-07-29 21:38 UTC (permalink / raw)
  To: 64939

> A file without an extension will load ruby-mode if the first 
> line is:
>
>    #!/usr/bin/env ruby
>
> but not when the first line is:
>
>    #!/usr/bin/env -S ruby -e 'puts 123'
>
> Is there any reason why the latter should not be matched by the
> default
> `auto-mode-interpreter-regexp' value in 'files.el'?
>
>
> A more useful example I stumbled on today while working on a 
> language
> server after adding:
>
> `(add-to-list 'interpreter-mode-alist '("elixir" 
> . elixir-ts-mode))'
>
>
>    #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding
>      latin1"
>
>    Node.start(:"next-ls-#{System.system_time()}", :shortnames)
>    ....
>
>
>
> Wilhelm

This is a very naive solution to the above, but I am probably 
missing some
knowledge here and will break for anyone setting the var to 
something
custom.

modified   lisp/files.el
@@ -3243,7 +3243,7 @@ inhibit-local-variables-p
 
 (defvar auto-mode-interpreter-regexp
   (purecopy "#![ \t]?\\([^ \t\n]*\
-/bin/env[ \t]\\)?\\([^ \t\n]+\\)")
+/bin/env[ \t]\\)?\\(-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)?\\([^ 
\t\n]+\\)")
   "Regexp matching interpreters, for file mode determination.
 This regular expression is matched against the first line of a 
 file
 to determine the file's mode in `set-auto-mode'.  If it matches, 
 the file
@@ -3445,7 +3445,7 @@ set-auto-mode
 	 (setq mode (save-excursion
 		      (goto-char (point-min))
 		      (if (looking-at 
 auto-mode-interpreter-regexp)
-			  (match-string 2))))
+			  (match-string 3))))
 	 ;; Map interpreter name to a mode, signaling we're done 
 at the
 	 ;; same time.
 	 (setq done (assoc-default

modified   lisp/progmodes/sh-script.el
@@ -1481,7 +1481,7 @@ sh--guess-shell
   (cond ((save-excursion
            (goto-char (point-min))
            (looking-at auto-mode-interpreter-regexp))
-         (match-string 2))
+         (match-string 3))
         ((not buffer-file-name) sh-shell-file)
         ;; Checks that use `buffer-file-name' follow.
         ((string-match "\\.m?spec\\'" buffer-file-name) "rpm")





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum
  2023-07-29 21:38 ` Wilhelm Kirschbaum
@ 2023-07-30  4:53 ` Eli Zaretskii
  2023-07-30  8:28   ` Wilhelm Kirschbaum
  2024-01-31 19:52 ` bug#64939: Malcolm Cook
  2024-02-01 18:52 ` bug#64939: Malcolm Cook
  3 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2023-07-30  4:53 UTC (permalink / raw)
  To: Wilhelm Kirschbaum; +Cc: 64939

> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
> Date: Sat, 29 Jul 2023 22:08:19 +0200
> 
> 
> A file without an extension will load ruby-mode if the first line 
> is:
> 
>     #!/usr/bin/env ruby
> 
> but not when the first line is:
> 
>     #!/usr/bin/env -S ruby -e 'puts 123'
> 
> Is there any reason why the latter should not be matched by the 
> default
> `auto-mode-interpreter-regexp' value in 'files.el'?

That line _is_ matched by auto-mode-interpreter-regexp:

  (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env -S ruby -e 'puts 123'") => 0

The problem is how to find the name of the interpreter if the text
after "/usr/bin/env" includes more than one word?  Once we start using
command-line switches and their arguments, and take into consideration
that many GNU/Linux programs can freely intersperse options and
non-option arguments on the command line in any order, where does this
end?

> A more useful example I stumbled on today while working on a 
> language
> server after adding:
> 
> `(add-to-list 'interpreter-mode-alist '("elixir" 
> . elixir-ts-mode))'
> 
> 
>     #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding 
>     latin1"

How about making a script that invokes elixir with those arguments,
and then augment interpreter-mode-alist to name that script instead?





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-29 21:38 ` Wilhelm Kirschbaum
@ 2023-07-30  5:04   ` Eli Zaretskii
  2023-07-30  9:38     ` Wilhelm Kirschbaum
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2023-07-30  5:04 UTC (permalink / raw)
  To: Wilhelm Kirschbaum; +Cc: 64939

> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
> Date: Sat, 29 Jul 2023 23:38:07 +0200
> 
> This is a very naive solution to the above, but I am probably
> missing some knowledge here and will break for anyone setting the
> var to something custom.

Feel free to make this change locally, but I don't see how this can be
general enough for us to install it as the default value.

For starters, 'env' can be invoked with several options, not just with
one.  Also, some 'env' options accept arguments, and how do we know if
the word that follows "env -OPTION" is the command to check against
interpreter-mode-alist or an argument of an option?

IOW, I don't think this is a problem for a regexp-based solution.  If
we want to support such complex shebang lines (btw, does the Posix or
GNU/Linux shell support them?), we should analyze the text after "env"
to find the candidate interpreter.  Not sure whether even that will
provide a robust solution.

Btw, can't you satisfy your needs via file-local variables?





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30  4:53 ` Eli Zaretskii
@ 2023-07-30  8:28   ` Wilhelm Kirschbaum
  2023-07-30 10:03     ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-07-30  8:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64939


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
>> Date: Sat, 29 Jul 2023 22:08:19 +0200
>> 
>> 
>> A file without an extension will load ruby-mode if the first 
>> line 
>> is:
>> 
>>     #!/usr/bin/env ruby
>> 
>> but not when the first line is:
>> 
>>     #!/usr/bin/env -S ruby -e 'puts 123'
>> 
>> Is there any reason why the latter should not be matched by the 
>> default
>> `auto-mode-interpreter-regexp' value in 'files.el'?
>
> That line _is_ matched by auto-mode-interpreter-regexp:
>
>   (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env 
>   -S ruby -e 'puts 123'") => 0
>
> The problem is how to find the name of the interpreter if the 
> text
> after "/usr/bin/env" includes more than one word?  Once we start 
> using
> command-line switches and their arguments, and take into 
> consideration
> that many GNU/Linux programs can freely intersperse options and
> non-option arguments on the command line in any order, where 
> does this
> end?
>

I am hoping there is some way of effectively matching command-line
switches for '/usr/bin/env', but sounds like it is perhaps too 
complex?

This probably ends with a match to some common command-line 
variations
for the '/usr/bin/env' program.  It is not complete now, so make 
it
guess slightly better is perhaps appropriate?

>> A more useful example I stumbled on today while working on a 
>> language
>> server after adding:
>> 
>> `(add-to-list 'interpreter-mode-alist '("elixir" 
>> . elixir-ts-mode))'
>> 
>> 
>>     #!/usr/bin/env -S elixir --erl "-kernel 
>>     standard_io_encoding 
>>     latin1"
>
> How about making a script that invokes elixir with those 
> arguments,
> and then augment interpreter-mode-alist to name that script 
> instead?

That is an option, but then I have to convince the maintainers of
all the packages I work on to make this change, vs my editor 
handling this.
Even locally changing the regexp means that the match index has 
changed, so
can't see this working without a patch.

If this is perhaps too much work and/or uncertainty for negligible
benefit to users, then it won't cause distress on my side :).

Wilhelm





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30  5:04   ` Eli Zaretskii
@ 2023-07-30  9:38     ` Wilhelm Kirschbaum
  2023-07-30 10:04       ` Eli Zaretskii
  2023-07-31 17:38       ` Juri Linkov
  0 siblings, 2 replies; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-07-30  9:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64939


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
>> Date: Sat, 29 Jul 2023 23:38:07 +0200
>> 
>> This is a very naive solution to the above, but I am probably
>> missing some knowledge here and will break for anyone setting 
>> the
>> var to something custom.
>
> Feel free to make this change locally, but I don't see how this 
> can be
> general enough for us to install it as the default value.
>

The problem is that even with a local change, the match group 2 is 
hard
coded. 

> For starters, 'env' can be invoked with several options, not 
> just with
> one.  Also, some 'env' options accept arguments, and how do we 
> know if
> the word that follows "env -OPTION" is the command to check 
> against
> interpreter-mode-alist or an argument of an option?
>

Understand, it is a bit complex perhaps. 

> IOW, I don't think this is a problem for a regexp-based 
> solution.  If
> we want to support such complex shebang lines (btw, does the 
> Posix or
> GNU/Linux shell support them?), we should analyze the text after 
> "env"
> to find the candidate interpreter.  Not sure whether even that 
> will
> provide a robust solution.
>

I can perhaps have a look if there is something concrete about how 
this
can be interpreted. 

> Btw, can't you satisfy your needs via file-local variables?

Not without convincing the project maintainers to add Emacs 
specific
lines to the code, which I don't really think is appropriate in 
some
cases.

It is not a common occurrence to run into this issue, but thought 
is
strange to not work as expected.  If I am the only one, then happy 
to
close this and just keep a local patch.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30  8:28   ` Wilhelm Kirschbaum
@ 2023-07-30 10:03     ` Eli Zaretskii
  2023-07-30 10:27       ` Wilhelm Kirschbaum
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2023-07-30 10:03 UTC (permalink / raw)
  To: Wilhelm Kirschbaum; +Cc: 64939

> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
> Cc: 64939@debbugs.gnu.org
> Date: Sun, 30 Jul 2023 10:28:14 +0200
> 
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > That line _is_ matched by auto-mode-interpreter-regexp:
> >
> >   (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env 
> >   -S ruby -e 'puts 123'") => 0
> >
> > The problem is how to find the name of the interpreter if the text
> > after "/usr/bin/env" includes more than one word?  Once we start
> > using command-line switches and their arguments, and take into
> > consideration that many GNU/Linux programs can freely intersperse
> > options and non-option arguments on the command line in any order,
> > where does this end?
> 
> I am hoping there is some way of effectively matching command-line
> switches for '/usr/bin/env', but sounds like it is perhaps too 
> complex?

How can we do that without incurring non-trivial maintenance costs?
'env' is being actively developed; e.g., the old version I have where
I'm typing this doesn't even have the -S option.  Are we supposed to
track every new command-line option added to 'env', and update our
code accordingly?  That is even impractical, because someone could use
2-year old Emacs with Coreutils released just yesterday.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30  9:38     ` Wilhelm Kirschbaum
@ 2023-07-30 10:04       ` Eli Zaretskii
  2023-07-31  7:11         ` Wilhelm Kirschbaum
  2023-07-31 17:38       ` Juri Linkov
  1 sibling, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2023-07-30 10:04 UTC (permalink / raw)
  To: Wilhelm Kirschbaum; +Cc: 64939

> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
> Cc: 64939@debbugs.gnu.org
> Date: Sun, 30 Jul 2023 11:38:06 +0200
> 
> > Btw, can't you satisfy your needs via file-local variables?
> 
> Not without convincing the project maintainers to add Emacs specific
> lines to the code, which I don't really think is appropriate in some
> cases.

Why not?  Many projects add such variables, both for Emacs and for
Vim.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30 10:03     ` Eli Zaretskii
@ 2023-07-30 10:27       ` Wilhelm Kirschbaum
  0 siblings, 0 replies; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-07-30 10:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64939


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
>> Cc: 64939@debbugs.gnu.org
>> Date: Sun, 30 Jul 2023 10:28:14 +0200
>> 
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > That line _is_ matched by auto-mode-interpreter-regexp:
>> >
>> >   (string-match-p auto-mode-interpreter-regexp 
>> >   "#!/usr/bin/env 
>> >   -S ruby -e 'puts 123'") => 0
>> >
>> > The problem is how to find the name of the interpreter if the 
>> > text
>> > after "/usr/bin/env" includes more than one word?  Once we 
>> > start
>> > using command-line switches and their arguments, and take 
>> > into
>> > consideration that many GNU/Linux programs can freely 
>> > intersperse
>> > options and non-option arguments on the command line in any 
>> > order,
>> > where does this end?
>> 
>> I am hoping there is some way of effectively matching 
>> command-line
>> switches for '/usr/bin/env', but sounds like it is perhaps too 
>> complex?
>
> How can we do that without incurring non-trivial maintenance 
> costs?
> 'env' is being actively developed; e.g., the old version I have 
> where
> I'm typing this doesn't even have the -S option.  Are we 
> supposed to
> track every new command-line option added to 'env', and update 
> our
> code accordingly?  That is even impractical, because someone 
> could use
> 2-year old Emacs with Coreutils released just yesterday.

No, I don't think we should track every command-line option added, 
but
just allow to match command-line options for env ( maybe this is 
the
non-trivial part).  I don't understand why it will have an impact 
for
someone who is using a version 5+ year old version of Coreutils.

If it was possible for a user to add configuration to 
set-auto-mode
script files like the ones mentioned above I think this will be 
less of
an issue.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30 10:04       ` Eli Zaretskii
@ 2023-07-31  7:11         ` Wilhelm Kirschbaum
  0 siblings, 0 replies; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-07-31  7:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 64939


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum@gmail.com>
>> Cc: 64939@debbugs.gnu.org
>> Date: Sun, 30 Jul 2023 11:38:06 +0200
>> 
>> > Btw, can't you satisfy your needs via file-local variables?
>> 
>> Not without convincing the project maintainers to add Emacs 
>> specific
>> lines to the code, which I don't really think is appropriate in 
>> some
>> cases.
>
> Why not?  Many projects add such variables, both for Emacs and 
> for
> Vim.

Sure, but its certainly not common on projects I work on where the 
vast
majority of the contributors don't care about Emacs.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-30  9:38     ` Wilhelm Kirschbaum
  2023-07-30 10:04       ` Eli Zaretskii
@ 2023-07-31 17:38       ` Juri Linkov
  2023-08-01  6:20         ` Wilhelm Kirschbaum
  1 sibling, 1 reply; 20+ messages in thread
From: Juri Linkov @ 2023-07-31 17:38 UTC (permalink / raw)
  To: Wilhelm Kirschbaum; +Cc: 64939, Eli Zaretskii

> The problem is that even with a local change, the match group 2 is hard
> coded.

You could use a shy group \(?: ... \)
in your customized value of 'auto-mode-interpreter-regexp'
that doesn't change the match index from 2 to 3.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2023-07-31 17:38       ` Juri Linkov
@ 2023-08-01  6:20         ` Wilhelm Kirschbaum
  0 siblings, 0 replies; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2023-08-01  6:20 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 64939, Eli Zaretskii


Juri Linkov <juri@linkov.net> writes:

>> The problem is that even with a local change, the match group 2 
>> is hard
>> coded.
>
> You could use a shy group \(?: ... \)
> in your customized value of 'auto-mode-interpreter-regexp'
> that doesn't change the match index from 2 to 3.


Fantastic, thanks.  I was not aware that existed. 





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939:
  2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum
  2023-07-29 21:38 ` Wilhelm Kirschbaum
  2023-07-30  4:53 ` Eli Zaretskii
@ 2024-01-31 19:52 ` Malcolm Cook
  2024-02-01 18:52 ` bug#64939: Malcolm Cook
  3 siblings, 0 replies; 20+ messages in thread
From: Malcolm Cook @ 2024-01-31 19:52 UTC (permalink / raw)
  To: 64939

[-- Attachment #1: Type: text/plain, Size: 539 bytes --]

I find allowing the shy regexp to match zero or more times
(using a '*' instead of '?') solves not only the use case of
including -S as an option, but also can support other options to env.

I have this now in my init.el:

(setq auto-mode-interpreter-regexp
;; support -S and other options to /bin/env.  See
;; https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
  (purecopy "#![ \t]?\\([^ \t\n]*\
/bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^
\t\n]+\\)"))

Hooray?

Why not patch lisp/files.el accordingly?

~ Malcolm Cook

[-- Attachment #2: Type: text/html, Size: 706 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939:
  2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum
                   ` (2 preceding siblings ...)
  2024-01-31 19:52 ` bug#64939: Malcolm Cook
@ 2024-02-01 18:52 ` Malcolm Cook
  2024-02-10  8:27   ` bug#64939: Eli Zaretskii
  3 siblings, 1 reply; 20+ messages in thread
From: Malcolm Cook @ 2024-02-01 18:52 UTC (permalink / raw)
  To: 64939

Regarding [1] allowing emacs to recognize shebang lines containing
calls to /bin/env with options (such as -S as allowed in new core
utils [2])...

I prefer allowing the proposed "shy" regexp to match zero or more
times (using a '*' instead of '?').

To wit, I have this now in my init.el:

(setq auto-mode-interpreter-regexp
      ;; Support shbang line calling `/bin/env` with `-S` (and/or
other options).
      ;; c.f. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
      (purecopy "#![ \t]?\\([^ \t\n]*\
/bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^
\t\n]+\\)"))

[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
[2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation

YMMV?

~ Malcolm Cook





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939:
  2024-02-01 18:52 ` bug#64939: Malcolm Cook
@ 2024-02-10  8:27   ` Eli Zaretskii
  2024-02-10 10:23     ` bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Kévin Le Gouguec
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2024-02-10  8:27 UTC (permalink / raw)
  To: Malcolm Cook, Kévin Le Gouguec; +Cc: 64939

> From: Malcolm Cook <malcolm.cook@gmail.com>
> Date: Thu, 1 Feb 2024 12:52:39 -0600
> 
> Regarding [1] allowing emacs to recognize shebang lines containing
> calls to /bin/env with options (such as -S as allowed in new core
> utils [2])...
> 
> I prefer allowing the proposed "shy" regexp to match zero or more
> times (using a '*' instead of '?').
> 
> To wit, I have this now in my init.el:
> 
> (setq auto-mode-interpreter-regexp
>       ;; Support shbang line calling `/bin/env` with `-S` (and/or
> other options).
>       ;; c.f. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
>       (purecopy "#![ \t]?\\([^ \t\n]*\
> /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^
> \t\n]+\\)"))
> 
> [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
> [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation
> 
> YMMV?

Kevin, any comments about the proposals in this bug report?





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2024-02-10  8:27   ` bug#64939: Eli Zaretskii
@ 2024-02-10 10:23     ` Kévin Le Gouguec
  2024-02-10 17:08       ` Kévin Le Gouguec
  0 siblings, 1 reply; 20+ messages in thread
From: Kévin Le Gouguec @ 2024-02-10 10:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Wilhelm Kirschbaum, Malcolm Cook, 64939

Thanks for the CC, this report had completely slipped past my notice
when I worked on bug#66902, and so did Malcolm's follow-ups.

Boldly adding Wilhelm as well, since I am not 100% sure Debbugs sends a
copy of every message in a report to their OP.

Comments below.

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Malcolm Cook <malcolm.cook@gmail.com>
>> Date: Thu, 1 Feb 2024 12:52:39 -0600
>> 
>> Regarding [1] allowing emacs to recognize shebang lines containing
>> calls to /bin/env with options (such as -S as allowed in new core
>> utils [2])...
>> 
>> I prefer allowing the proposed "shy" regexp to match zero or more
>> times (using a '*' instead of '?').
>> 
>> To wit, I have this now in my init.el:
>> 
>> (setq auto-mode-interpreter-regexp
>>       ;; Support shbang line calling `/bin/env` with `-S` (and/or
>> other options).
>>       ;; c.f. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
>>       (purecopy "#![ \t]?\\([^ \t\n]*\
>> /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^
>> \t\n]+\\)"))

IIUC this would be a more lax variant of what we installed for
bug#66902, can you confirm Malcolm?  This is what the current regexp
looks like on the master branch:

  (purecopy
   (concat
    "#![ \t]*"
    ;; Optional group 1: env(1) invocation.
    "\\("
    "[^ \t\n]*/bin/env[ \t]*"
    "\\(?:-S[ \t]*\\|--split-string\\(?:=\\|[ \t]*\\)\\)?"
    "\\)?"
    ;; Group 2: interpreter.
    "\\([^ \t\n]+\\)"))

And the corresponding test cases:

(ert-deftest files-tests-auto-mode-interpreter ()
  "Test that `set-auto-mode' deduces correct modes from shebangs."
  (files-tests--check-shebang "#!/bin/bash" 'sh-mode)
  (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode)
  (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode)
  (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode)
  (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode)
  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)
  (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode))

Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we
sophisticate the regexp further?  FWIW, in no particular order:

(a) env(1) does seem to support mixing up arbitrary options with -S¹, so
    in principle it would make sense to support that;

(b) Eli did not seem too found of the regexp hammer², so I don't know
    which direction we'd want to go between maximally correct (accept
    all arguments, _as long as_ -S|--split-string is in there) or good
    enough (just skip over --everything --that --looks --like -a
    --switch).

(c) FWIW the "maximally correct" regexp might not be _that_ ugly, since
    "-[v]S[OPTION]" must be the *first* token after env; in other words
    no need to support --some-option --split-string --more-options.

>> [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
>> [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation
>> 
>> YMMV?
>
> Kevin, any comments about the proposals in this bug report?

Comments above; footnotes below.  Again, thanks for the heads up.

¹ $ cat demo.sh
  #!/usr/bin/env -vS -uFOOBAR bash -eux

  echo hi
  echo $FOOBAR
  echo bye
  $ FOOBAR=totally-set ./demo.sh
  split -S:  ‘ -uFOOBAR bash -eux’
   into:    ‘-uFOOBAR’
       &    ‘bash’
       &    ‘-eux’
  unset:    FOOBAR
  executing: bash
     arg[0]= ‘bash’
     arg[1]= ‘-eux’
     arg[2]= ‘./foo.sh’
  + echo hi
  hi
  ./foo.sh: line 4: FOOBAR: unbound variable

² https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939#14





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2024-02-10 10:23     ` bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Kévin Le Gouguec
@ 2024-02-10 17:08       ` Kévin Le Gouguec
  2024-02-10 17:23         ` Malcolm Cook
  0 siblings, 1 reply; 20+ messages in thread
From: Kévin Le Gouguec @ 2024-02-10 17:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Wilhelm Kirschbaum, Malcolm Cook, 64939

[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]

Kévin Le Gouguec <kevin.legouguec@gmail.com> writes:

> Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we
> sophisticate the regexp further?  FWIW, in no particular order:
>
> (a) env(1) does seem to support mixing up arbitrary options with -S¹, so
>     in principle it would make sense to support that;
>
> (b) Eli did not seem too found of the regexp hammer², so I don't know
>     which direction we'd want to go between maximally correct (accept
>     all arguments, _as long as_ -S|--split-string is in there) or good
>     enough (just skip over --everything --that --looks --like -a
>     --switch).
>
> (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since
>     "-[v]S[OPTION]" must be the *first* token after env; in other words
>     no need to support --some-option --split-string --more-options.

Well, sorry, couldn't resist.  How do the attached patches look?  The
new testcases should tell the whole story.

('make && make -C test files-tests' seems none the worse for wear)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Refine-shebang-tests-bug-64939.patch --]
[-- Type: text/x-patch, Size: 3494 bytes --]

From 0a3dcfa9d5859c8a7ecd4679b748298b0f5d3597 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?K=C3=A9vin=20Le=20Gouguec?= <kevin.legouguec@gmail.com>
Date: Sat, 10 Feb 2024 16:14:08 +0100
Subject: [PATCH 1/3] Refine shebang tests (bug#64939)

* test/lisp/files-tests.el (files-tests--check-shebang): For
shell-script modes, verify that the correct shell is set.
(files-tests-auto-mode-interpreter): Prefer sh-base-mode to
sh-mode to stay tree-sitter-agnostic; re-organize test cases to
make future ones easier to add.
---
 test/lisp/files-tests.el | 45 ++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el
index 718ecd51f8b..23516ff0d7d 100644
--- a/test/lisp/files-tests.el
+++ b/test/lisp/files-tests.el
@@ -1656,30 +1656,39 @@ files-tests-file-name-base
   (should (equal (file-name-base "foo") "foo"))
   (should (equal (file-name-base "foo/bar") "bar")))
 
-(defun files-tests--check-shebang (shebang expected-mode)
-  "Assert that mode for SHEBANG derives from EXPECTED-MODE."
-  (let ((actual-mode
-         (ert-with-temp-file script-file
-           :text shebang
-           (find-file script-file)
-           (if (derived-mode-p expected-mode)
-               expected-mode
-             major-mode))))
-    ;; Tuck all the information we need in the `should' form: input
-    ;; shebang, expected mode vs actual.
-    (should
-     (equal (list shebang actual-mode)
-            (list shebang expected-mode)))))
+(defvar sh-shell)
+
+(defun files-tests--check-shebang (shebang expected-mode &optional expected-dialect)
+  "Assert that mode for SHEBANG derives from EXPECTED-MODE.
+
+If EXPECTED-MODE is sh-base-mode, DIALECT says what `sh-shell' should be
+set to."
+  (ert-with-temp-file script-file
+    :text shebang
+    (find-file script-file)
+    (let ((actual-mode (if (derived-mode-p expected-mode)
+                           expected-mode
+                         major-mode)))
+      ;; Tuck all the information we need in the `should' form: input
+      ;; shebang, expected mode vs actual.
+      (should
+       (equal (list shebang actual-mode)
+              (list shebang expected-mode)))
+      (when (eq expected-mode 'sh-base-mode)
+        (should (eq sh-shell expected-dialect))))))
 
 (ert-deftest files-tests-auto-mode-interpreter ()
   "Test that `set-auto-mode' deduces correct modes from shebangs."
-  (files-tests--check-shebang "#!/bin/bash" 'sh-mode)
-  (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode)
+  ;; Straightforward interpreter invocation.
+  (files-tests--check-shebang "#!/bin/bash" 'sh-base-mode 'bash)
+  (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode)
+  ;; Invocation through env.
+  (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-base-mode 'bash)
   (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode)
   (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode)
+  ;; Invocation through env, with supplementary arguments.
   (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode)
-  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)
-  (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode))
+  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode))
 
 (ert-deftest files-test-dir-locals-auto-mode-alist ()
   "Test an `auto-mode-alist' entry in `.dir-locals.el'"
-- 
2.43.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-Support-more-complex-env-invocations-in-shebang-line.patch --]
[-- Type: text/x-patch, Size: 3003 bytes --]

From ec011e258fa3a7697dad631d82bbd53e5bc93f50 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?K=C3=A9vin=20Le=20Gouguec?= <kevin.legouguec@gmail.com>
Date: Sat, 10 Feb 2024 17:37:35 +0100
Subject: [PATCH 2/3] Support more complex env invocations in shebang lines

This is not an exact re-implementation of what env accepts, but
hopefully it should be "good enough".

Example of known limitation: we assume that arguments for
--long-options will be set with '=', but that is not
necessarily the case.  '--unset' (mandatory argument) can be
passed as '--unset=VAR' or '--unset VAR', but
'--default-signal' (optional argument) requires an '=' sign.

For bug#64939.

* lisp/files.el (auto-mode-interpreter-regexp): Account for
supplementary arguments passed beside -S/--split-string.
* test/lisp/files-tests.el (files-tests-auto-mode-interpreter):
Test some of these combinations.
---
 lisp/files.el            | 8 +++++++-
 test/lisp/files-tests.el | 8 +++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/lisp/files.el b/lisp/files.el
index f67b650cb92..5098d49048e 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -3274,7 +3274,13 @@ auto-mode-interpreter-regexp
     ;; Optional group 1: env(1) invocation.
     "\\("
     "[^ \t\n]*/bin/env[ \t]*"
-    "\\(?:-S[ \t]*\\|--split-string\\(?:=\\|[ \t]*\\)\\)?"
+    ;; Within group 1: possible -S/--split-string.
+    "\\(?:"
+    ;; -S/--split-string
+    "\\(?:-[0a-z]*S[ \t]*\\|--split-string=\\)"
+    ;; More env arguments.
+    "\\(?:-[^ \t\n]+[ \t]+\\)*"
+    "\\)?"
     "\\)?"
     ;; Group 2: interpreter.
     "\\([^ \t\n]+\\)"))
diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el
index 23516ff0d7d..0a5c3b897e4 100644
--- a/test/lisp/files-tests.el
+++ b/test/lisp/files-tests.el
@@ -1687,8 +1687,14 @@ files-tests-auto-mode-interpreter
   (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode)
   (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode)
   ;; Invocation through env, with supplementary arguments.
+  (files-tests--check-shebang "#!/usr/bin/env --split-string=bash -eux" 'sh-base-mode 'bash)
+  (files-tests--check-shebang "#!/usr/bin/env --split-string=-iv --default-signal bash -eux" 'sh-base-mode 'bash)
   (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode)
-  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode))
+  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)
+  (files-tests--check-shebang "#!/usr/bin/env -S-vi bash -eux" 'sh-base-mode 'bash)
+  (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal=INT bash -eux" 'sh-base-mode 'bash)
+  (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal bash -eux" 'sh-base-mode 'bash)
+  (files-tests--check-shebang "#!/usr/bin/env -vS -uFOOBAR bash -eux" 'sh-base-mode 'bash))
 
 (ert-deftest files-test-dir-locals-auto-mode-alist ()
   "Test an `auto-mode-alist' entry in `.dir-locals.el'"
-- 
2.43.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: 0003-Support-shebang-lines-with-amended-environment.patch --]
[-- Type: text/x-patch, Size: 2275 bytes --]

From 9e24e7f000011105a89e5ce81cd9f16eb9ef15b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?K=C3=A9vin=20Le=20Gouguec?= <kevin.legouguec@gmail.com>
Date: Sat, 10 Feb 2024 17:56:57 +0100
Subject: [PATCH 3/3] Support shebang lines with amended environment

For bug#64939.

* lisp/files.el (auto-mode-interpreter-regexp): Account for possible
VARIABLE=[VALUE] operands.
* test/lisp/files-tests.el (files-tests-auto-mode-interpreter):
Add an example from the coreutils manual.
---
 lisp/files.el            | 5 ++++-
 test/lisp/files-tests.el | 4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/lisp/files.el b/lisp/files.el
index 5098d49048e..524385edc84 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -3274,12 +3274,15 @@ auto-mode-interpreter-regexp
     ;; Optional group 1: env(1) invocation.
     "\\("
     "[^ \t\n]*/bin/env[ \t]*"
-    ;; Within group 1: possible -S/--split-string.
+    ;; Within group 1: possible -S/--split-string and environment
+    ;; adjustments.
     "\\(?:"
     ;; -S/--split-string
     "\\(?:-[0a-z]*S[ \t]*\\|--split-string=\\)"
     ;; More env arguments.
     "\\(?:-[^ \t\n]+[ \t]+\\)*"
+    ;; Interpreter environment modifications.
+    "\\(?:[^ \t\n]+=[^ \t\n]*[ \t]+\\)*"
     "\\)?"
     "\\)?"
     ;; Group 2: interpreter.
diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el
index 0a5c3b897e4..d4c1ef3ba67 100644
--- a/test/lisp/files-tests.el
+++ b/test/lisp/files-tests.el
@@ -1694,7 +1694,9 @@ files-tests-auto-mode-interpreter
   (files-tests--check-shebang "#!/usr/bin/env -S-vi bash -eux" 'sh-base-mode 'bash)
   (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal=INT bash -eux" 'sh-base-mode 'bash)
   (files-tests--check-shebang "#!/usr/bin/env -ivS --default-signal bash -eux" 'sh-base-mode 'bash)
-  (files-tests--check-shebang "#!/usr/bin/env -vS -uFOOBAR bash -eux" 'sh-base-mode 'bash))
+  (files-tests--check-shebang "#!/usr/bin/env -vS -uFOOBAR bash -eux" 'sh-base-mode 'bash)
+  ;; Invocation through env, with modified environment.
+  (files-tests--check-shebang "#!/usr/bin/env -S PYTHONPATH=/...:${PYTHONPATH} python" 'python-base-mode))
 
 (ert-deftest files-test-dir-locals-auto-mode-alist ()
   "Test an `auto-mode-alist' entry in `.dir-locals.el'"
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2024-02-10 17:08       ` Kévin Le Gouguec
@ 2024-02-10 17:23         ` Malcolm Cook
  2024-02-17  8:33           ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Malcolm Cook @ 2024-02-10 17:23 UTC (permalink / raw)
  To: Kévin Le Gouguec; +Cc: Wilhelm Kirschbaum, 64939, Eli Zaretskii

Hooray - thanks - all seems perfect to me

On Sat, Feb 10, 2024 at 11:08 AM Kévin Le Gouguec
<kevin.legouguec@gmail.com> wrote:
>
> Kévin Le Gouguec <kevin.legouguec@gmail.com> writes:
>
> > Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we
> > sophisticate the regexp further?  FWIW, in no particular order:
> >
> > (a) env(1) does seem to support mixing up arbitrary options with -S¹, so
> >     in principle it would make sense to support that;
> >
> > (b) Eli did not seem too found of the regexp hammer², so I don't know
> >     which direction we'd want to go between maximally correct (accept
> >     all arguments, _as long as_ -S|--split-string is in there) or good
> >     enough (just skip over --everything --that --looks --like -a
> >     --switch).
> >
> > (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since
> >     "-[v]S[OPTION]" must be the *first* token after env; in other words
> >     no need to support --some-option --split-string --more-options.
>
> Well, sorry, couldn't resist.  How do the attached patches look?  The
> new testcases should tell the whole story.
>
> ('make && make -C test files-tests' seems none the worse for wear)
>


-- 
~ Malcolm Cook





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2024-02-10 17:23         ` Malcolm Cook
@ 2024-02-17  8:33           ` Eli Zaretskii
  2024-02-28 17:57             ` Wilhelm Kirschbaum
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2024-02-17  8:33 UTC (permalink / raw)
  To: Malcolm Cook; +Cc: wkirschbaum, 64939-done, kevin.legouguec

> From: Malcolm Cook <malcolm.cook@gmail.com>
> Date: Sat, 10 Feb 2024 11:23:11 -0600
> Cc: Eli Zaretskii <eliz@gnu.org>, Wilhelm Kirschbaum <wkirschbaum@gmail.com>, 64939@debbugs.gnu.org
> 
> Hooray - thanks - all seems perfect to me

Thanks, so I've now installed these changes on the master branch, and
I'm therefore closing this bug.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags
  2024-02-17  8:33           ` Eli Zaretskii
@ 2024-02-28 17:57             ` Wilhelm Kirschbaum
  0 siblings, 0 replies; 20+ messages in thread
From: Wilhelm Kirschbaum @ 2024-02-28 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Malcolm Cook, 64939-done, kevin.legouguec

[-- Attachment #1: Type: text/plain, Size: 240 bytes --]

> Thanks, so I've now installed these changes on the master branch, and
> I'm therefore closing this bug.
>

Thank you, this works! ( I am not sure if it's fine for me to reply to a
closed bug, but just wanted to express my gratitude :) ).

[-- Attachment #2: Type: text/html, Size: 500 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-02-28 17:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-29 20:08 bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Wilhelm Kirschbaum
2023-07-29 21:38 ` Wilhelm Kirschbaum
2023-07-30  5:04   ` Eli Zaretskii
2023-07-30  9:38     ` Wilhelm Kirschbaum
2023-07-30 10:04       ` Eli Zaretskii
2023-07-31  7:11         ` Wilhelm Kirschbaum
2023-07-31 17:38       ` Juri Linkov
2023-08-01  6:20         ` Wilhelm Kirschbaum
2023-07-30  4:53 ` Eli Zaretskii
2023-07-30  8:28   ` Wilhelm Kirschbaum
2023-07-30 10:03     ` Eli Zaretskii
2023-07-30 10:27       ` Wilhelm Kirschbaum
2024-01-31 19:52 ` bug#64939: Malcolm Cook
2024-02-01 18:52 ` bug#64939: Malcolm Cook
2024-02-10  8:27   ` bug#64939: Eli Zaretskii
2024-02-10 10:23     ` bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does not match env with flags Kévin Le Gouguec
2024-02-10 17:08       ` Kévin Le Gouguec
2024-02-10 17:23         ` Malcolm Cook
2024-02-17  8:33           ` Eli Zaretskii
2024-02-28 17:57             ` Wilhelm Kirschbaum

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).