From: Kévin Le Gouguec
Newsgroups: gmane.emacs.bugs
Subject: bug#66902: 30.0.50; Recognize env -S/--split-string in shebangs
Date: Sun, 12 Nov 2023 18:53:40 +0100
Message-ID: <871qcuuacb.fsf@gmail.com>
References: <87ttq3lvpm.fsf@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: 66902@debbugs.gnu.org
In-Reply-To: <87ttq3lvpm.fsf@gmail.com> ("Kévin Le Gouguec"'s message of "Thu, 02 Nov 2023 21:57:25 +0100")

--=-=-=
Content-Type: text/plain; charset=utf-8

Kévin Le Gouguec writes:

> Questions before proceeding to ChangeLog entries & regression tests:

For better or worse, I ended up proceeding to both these things, and
then some.
Let me know if the attached patches make sense; tested with

    make -j8 bootstrap && make -C test files-tests

Tentative answers to my questions:

> 1. Is this something we would like Emacs to recognize out of the box, or
>    is it too niche?

Assuming yes.

> 2. What about the more general forms shown in (info "(coreutils) env
>    invocation")?
>
>        #!/usr/bin/env -[v]S[OPTION]... [NAME=VALUE]... COMMAND [ARGS]...

Didn't go as far as handling -v or NAME=VALUE pairs, but that could be
added later if we ever feel like it.

> 3. Assuming we do want to amend that regexp, would it be possible to use
>    rx here?  OT1H guessing "no" because files.el is preloaded, whereas
>    rx.el is not; OTOH I see that files.el requires easy-mmode at
>    compile-time, and that package does not show up in loadup.el, so…
>    settling for "maybe?"

Figured rx was similar to pcase in that regard:

* They need to be required explicitly despite their macros being
  "autoloaded", because files.el is loaded during bootstrap before
  autoloading is set up.

* Somehow that does not cause them to be preloaded?  At least going by
  emacs -Q,
  * featurep returns nil,
  * preloaded-file-list does not include them.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0001-Add-basic-tests-for-interpreter-mode-alist.patch

From 8ee71e0c70fa5c16cb802722e8de15af0932773d Mon Sep 17 00:00:00 2001
From: Kévin Le Gouguec
Date: Sun, 12 Nov 2023 10:55:24 +0100
Subject: [PATCH 1/3] Add basic tests for interpreter-mode-alist

* test/lisp/files-tests.el (files-tests--check-shebang): New helper to
generate a temporary file with a given interpreter line, and assert
that the mode picked by 'set-auto-mode' is derived from an expected
mode.
Write the 'should' form so that failure reports include useful
context; for example:

  (ert-test-failed
   ((should (equal (list shebang actual-mode)
                   (list shebang expected-mode)))
    :form (equal ("#!/usr/bin/env -S make -f" fundamental-mode)
                 ("#!/usr/bin/env -S make -f" makefile-mode))
    :value nil
    :explanation (list-elt 1 (different-atoms fundamental-mode makefile-mode))))

(files-tests-auto-mode-interpreter): New test; exercise some aspects
of interpreter-mode-alist.
---
 test/lisp/files-tests.el | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el
index 3492bd701b2..233efded945 100644
--- a/test/lisp/files-tests.el
+++ b/test/lisp/files-tests.el
@@ -1656,6 +1656,29 @@ files-tests-file-name-base
   (should (equal (file-name-base "foo") "foo"))
   (should (equal (file-name-base "foo/bar") "bar")))
 
+(defun files-tests--check-shebang (shebang expected-mode)
+  "Assert that mode for SHEBANG derives from EXPECTED-MODE."
+  (let ((actual-mode
+         (ert-with-temp-file script-file
+           :text shebang
+           (find-file script-file)
+           (if (derived-mode-p expected-mode)
+               expected-mode
+             major-mode))))
+    ;; Tuck all the information we need in the `should' form: input
+    ;; shebang, expected mode vs actual.
+    (should
+     (equal (list shebang actual-mode)
+            (list shebang expected-mode)))))
+
+(ert-deftest files-tests-auto-mode-interpreter ()
+  "Test that `set-auto-mode' deduces correct modes from shebangs."
+  (files-tests--check-shebang "#!/bin/bash" 'sh-mode)
+  (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode)
+  (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode)
+  (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode)
+  (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode))
+
 (ert-deftest files-test-dir-locals-auto-mode-alist ()
   "Test an `auto-mode-alist' entry in `.dir-locals.el'"
   (find-file (ert-resource-file "whatever.quux"))
-- 
2.42.1


--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0002-Convert-auto-mode-interpreter-regexp-to-an-rx-form.patch

From d730ee2108e3bd4d641bce2cb50f61e8fbdfcd09 Mon Sep 17 00:00:00 2001
From: Kévin Le Gouguec
Date: Sun, 12 Nov 2023 16:51:04 +0100
Subject: [PATCH 2/3] Convert auto-mode-interpreter-regexp to an rx form

* lisp/files.el: Explicitly require rx even though the macros are
autoloaded, since files.el is loaded during bootstrap.
(auto-mode-interpreter-regexp): Rewrite using rx.

A subsequent patch will add support for env's -S/--split-string
argument, which will complicate the pattern past my personal threshold
for bare regexps.

Allow multiple spaces between #!, interpreter and first argument:
empirically, Linux's execve allows it.
---
 lisp/files.el | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lisp/files.el b/lisp/files.el
index 3d838cd3b8c..dc301bea3c5 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -30,6 +30,7 @@
 
 (eval-when-compile
   (require 'pcase)
+  (require 'rx)
   (require 'easy-mmode))                ; For `define-minor-mode'.
 
 (defvar font-lock-keywords)
@@ -3245,8 +3246,14 @@ inhibit-local-variables-p
     temp))
 
 (defvar auto-mode-interpreter-regexp
-  (purecopy "#![ \t]?\\([^ \t\n]*\
-/bin/env[ \t]\\)?\\([^ \t\n]+\\)")
+  (purecopy
+   (rx-let ((ascii-blank (any " \t"))
+            (non-blank (not (any " \t\n"))))
+     (rx "#!"
+         (* ascii-blank)
+         (? (group (* non-blank) "/bin/env"
+                   (* ascii-blank)))
+         (group (+ non-blank)))))
   "Regexp matching interpreters, for file mode determination.
 This regular expression is matched against the first line of a file
 to determine the file's mode in `set-auto-mode'.  If it matches, the file
-- 
2.42.1


--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0003-Recognize-shebang-lines-that-pass-S-split-string-to-.patch

From 0287f84a3ab6b767cc99b91356a96f2162c6a099 Mon Sep 17 00:00:00 2001
From: Kévin Le Gouguec
Date: Sun, 12 Nov 2023 17:46:34 +0100
Subject: [PATCH 3/3] Recognize shebang lines that pass -S/--split-string to env

* lisp/files.el (auto-mode-interpreter-regexp): Add optional
-S/--split-string switch to the ignored group capturing the env
invocation.
* test/lisp/files-tests.el (files-tests-auto-mode-interpreter): Add a
couple of testcases; one from (info "(coreutils) env invocation"), the
other from a personal project.
---
 lisp/files.el            | 4 +++-
 test/lisp/files-tests.el | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/lisp/files.el b/lisp/files.el
index dc301bea3c5..56bdcf9d08b 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -3252,7 +3252,9 @@ auto-mode-interpreter-regexp
      (rx "#!"
          (* ascii-blank)
          (? (group (* non-blank) "/bin/env"
-                   (* ascii-blank)))
+                   (* ascii-blank)
+                   (? (or (: "-S" (* ascii-blank))
+                          (: "--split-string" (or ?= (* ascii-blank)))))))
          (group (+ non-blank)))))
   "Regexp matching interpreters, for file mode determination.
 This regular expression is matched against the first line of a file
diff --git a/test/lisp/files-tests.el b/test/lisp/files-tests.el
index 233efded945..3e499fff468 100644
--- a/test/lisp/files-tests.el
+++ b/test/lisp/files-tests.el
@@ -1677,6 +1677,8 @@ files-tests-auto-mode-interpreter
   (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode)
   (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode)
   (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode)
+  (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode)
+  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)
   (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode))
 
 (ert-deftest files-test-dir-locals-auto-mode-alist ()
-- 
2.42.1


--=-=-=--
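To compare the pattern before and after this series without rebuilding Emacs, the two regexps can be copied into a scratch file and matched against a few shebang lines. This is a minimal sketch, assuming Emacs 27 or later (for `rx-let`); the `my-` names are hypothetical and not part of files.el, and `my-shebang-interpreter` only approximates what `set-auto-mode` does with the second match group (taking its `file-name-nondirectory`).

```elisp
;;; -*- lexical-binding: t -*-
(require 'rx)

;; Hypothetical copies of the pattern before patch 2/3 and after
;; patch 3/3, kept outside files.el so this runs on any Emacs >= 27.
(defconst my-old-interpreter-regexp
  "#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)"
  "Interpreter pattern before this patch series.")

(defconst my-new-interpreter-regexp
  (rx-let ((ascii-blank (any " \t"))
           (non-blank (not (any " \t\n"))))
    (rx "#!"
        (* ascii-blank)
        ;; Group 1: optional env invocation, plus -S/--split-string.
        (? (group (* non-blank) "/bin/env"
                  (* ascii-blank)
                  (? (or (: "-S" (* ascii-blank))
                         (: "--split-string" (or ?= (* ascii-blank)))))))
        ;; Group 2: the interpreter itself.
        (group (+ non-blank))))
  "Interpreter pattern after patch 3/3.")

(defun my-shebang-interpreter (regexp line)
  "Return the base name of the interpreter that REGEXP finds in LINE.
Return nil if REGEXP does not match.  Uses the second group, roughly
as `set-auto-mode' does."
  (when (string-match regexp line)
    (file-name-nondirectory (match-string 2 line))))

;; Print what each pattern extracts from a few shebang lines.
(dolist (line '("#!  /usr/bin/env  python3"
                "#!/usr/bin/env -S make -f"
                "#!/usr/bin/env --split-string=python3"))
  (message "%-40s old: %-25s new: %s"
           line
           (my-shebang-interpreter my-old-interpreter-regexp line)
           (my-shebang-interpreter my-new-interpreter-regexp line)))
```

Loaded with `emacs -Q --batch -l FILE`, it prints one line per shebang. By my reading of the two patterns, the old one yields nil, `-S` and `--split-string=python3` for these three lines, while the new one yields `python3`, `make` and `python3`.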