unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: haj@posteo.de (Harald Jörg)
To: 23461@debbugs.gnu.org
Cc: Stefan Monnier <monnier@iro.umontreal.ca>
Subject: bug#23461: perl-mode: Displaying HERE-docs as strings instead of comments [PATCH]
Date: Wed, 23 Dec 2020 03:19:15 +0100	[thread overview]
Message-ID: <87sg7xxo0s.fsf@hajtower> (raw)
In-Reply-To: <87a8k4tdxp.fsf@jidanni.org>

[-- Attachment #1: Type: text/plain, Size: 2277 bytes --]

This is a detour from my work on CPerl mode bugs ... while trying to
steal some syntax concepts from perl-mode.pl, I stumbled over this old
report.

I guess I can explain what's going on.  Short story: Perl mode marks
HERE-docs syntactically as c-style comments, hence font-lock-mode
selects the comments face.

Investigating how to fix this leads to the longer story.  There are two
possible approaches:

  1) use a string-style syntax (generic string) instead of c-style
     comments to flag HERE-documents.  That way, font-lock picks up the
     correct face automagically.

  2) Keep HERE_docs as c-style comments, but change the face mapping by
     injecting a function into font-lock-defaults which applies the
     string face to c-style comments.

Both approaches work, but both are a bit whacky.  For 1), changing the
syntax code is easy but opens a can of worms: Indentation after the
HERE-doc doesn't work any more.  The reason is that Perl mode needs to
go "back" to find out whether a statement is a continuation line.
"Back" includes skipping back over comments, but that HERE-doc is no
longer a comment, so it blocks the way to find whether the line before
the HERE-doc ends a statement.  To fix that, all calls to
perl-backward-to-noncomment must be checked whether they need to skip
backward over HERE-docs, too.  I added a function
perl-backward-to-noncomment-nonhere, and eventually it turned out that
the simple perl-backward-to-noncomment seems to be superfluous.

For 2), it feels wrong to have strings marked as comments, and it is a
bit of a hack to insert a function into font-lock-keywords which doesn't
even search for keywords.  CPerl mode uses a similar trick, but CPerl
mode is renowned for being whacky.  Also, __DATA__ sections in Perl mode
are marked generic strings, so there ought to be some disambiguation.

The patch uses the first approach, and also adds tests which are
independent of the chosen solution.


As a bycatch, it also fixes the case where the line starting a HERE-doc
ends in a comment, which was messed up by perl-mode.  I could not find a
bug report for that but test cases are included.

Perhaps Stefan has an opinion on this, and chances are good that he can
point to a better solution...
-- 
Happy winter solstice,
haj

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: perl-mode: Treat HERE-docs as strings --]
[-- Type: text/x-diff, Size: 11766 bytes --]

From a17f2323d9018fa312b6721fa7ea5744edc79039 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Harald=20J=C3=B6rg?= <haj@posteo.de>
Date: Wed, 23 Dec 2020 02:34:33 +0100
Subject: [PATCH] ; perl-mode: Display here-docs as strings instead of
 comments.

* lisp/progmodes/perl-mode.el (perl-syntax-propertize-function):
Make HERE-doc start a generic string instead of a c-style comment.
Handle the case where the line starting a HERE-doc ends with a
comment.
(perl--beginning-of-here-doc): New function.
(perl-backward-to-noncomment-nonhere): New function.
(perl-syntax-propertize-special-constructs): Make HERE-terminators
end generic strings instead of c-style comments, using the new
functions.

* test/lisp/progmodes/cperl-mode-tests.el (cperl-test-heredocs):
New test (30 should-forms) for various aspects of HERE-documents.
Works for CPerl mode, and with the patch also for Perl mode.

* test/lisp/progmodes/cperl-mode-resources/here-docs.pl: New file
with test cases.
---
 lisp/progmodes/perl-mode.el                   |  56 +++++++--
 .../cperl-mode-resources/here-docs.pl         | 111 ++++++++++++++++++
 test/lisp/progmodes/cperl-mode-tests.el       |  29 +++++
 3 files changed, 187 insertions(+), 9 deletions(-)
 create mode 100644 test/lisp/progmodes/cperl-mode-resources/here-docs.pl

diff --git a/lisp/progmodes/perl-mode.el b/lisp/progmodes/perl-mode.el
index fd8a51b5a5..a961f723d5 100644
--- a/lisp/progmodes/perl-mode.el
+++ b/lisp/progmodes/perl-mode.el
@@ -324,14 +324,29 @@ perl-syntax-propertize-function
         ;; disambiguate with the left-bitshift operator.
         "\\|" perl--syntax-exp-intro-regexp "<<\\(?2:\\sw+\\)\\)"
         ".*\\(\n\\)")
-       (4 (let* ((st (get-text-property (match-beginning 4) 'syntax-table))
+       (4 (let* ((eol (match-beginning 4))
+                 (st (get-text-property eol 'syntax-table))
                  (name (match-string 2))
                  (indented (match-beginning 1)))
             (goto-char (match-end 2))
             (if (save-excursion (nth 8 (syntax-ppss (match-beginning 0))))
+                ;; '>>' occurred in a string, or in a comment.
                 ;; Leave the property of the newline unchanged.
                 st
-              (cons (car (string-to-syntax "< c"))
+              ;; Before changing the syntax to generic string, let's
+              ;; check whether we are in an end-of-line comment, and
+              ;; if so, cheat by shifting the comment markers one char
+              ;; to the left.
+              (when (nth 4 (save-excursion (syntax-ppss eol)))
+                (when (equal (car (syntax-after (1- eol)))
+                             (car (string-to-syntax "<")))
+                  ;; yet another edge case: "#" is the last character
+                  ;; in that line, so there's actually no comment.
+                  (put-text-property (- eol 2) (1- eol)
+                                     'syntax-table (string-to-syntax "<")))
+                (put-text-property (1- eol) eol
+                                   'syntax-table (string-to-syntax ">")))
+              (cons (car (string-to-syntax "|"))
                     ;; Remember the names of heredocs found on this line.
                     (cons (cons (pcase (aref name 0)
                                   (?\\ (substring name 1))
@@ -342,7 +357,7 @@ perl-syntax-propertize-function
       ;; We don't call perl-syntax-propertize-special-constructs directly
       ;; from the << rule, because there might be other elements (between
       ;; the << and the \n) that need to be propertized.
-      ("\\(?:$\\)\\s<"
+      ("\\(?:$\\)\\s|"
        (0 (ignore (perl-syntax-propertize-special-constructs end))))
       )
      (point) end)))
@@ -364,12 +379,24 @@ perl-quote-syntax-table
       (modify-syntax-entry close ")" st))
     st))
 
+(defun perl--beginning-of-here-doc (state)
+  "If STATE describes a here-document, return its start, else return nil."
+  ;; We need to distinguish here-docs from normal strings, and from
+  ;; quote-like constructs like q//.
+  (let ((in-string-p (nth 3 state))
+        (string-start (nth 8 state)))
+  (and in-string-p
+       (= (syntax-class (syntax-after string-start)) 15) ; generic string
+       ;; here-doc strings have a syntax table cdr for the terminator(s)
+       (cdr-safe (get-text-property string-start 'syntax-table))
+       string-start))) ; return the start position if all other tests are t
+
 (defun perl-syntax-propertize-special-constructs (limit)
   "Propertize special constructs like regexps and formats."
   (let ((state (syntax-ppss))
         char)
     (cond
-     ((eq 2 (nth 7 state))
+     ((perl--beginning-of-here-doc state)
       ;; A Here document.
       (let ((names (cdr (get-text-property (nth 8 state) 'syntax-table))))
         (when (cdr names)
@@ -386,7 +413,7 @@ perl-syntax-propertize-special-constructs
                      limit 'move))
           (unless names
             (put-text-property (1- (point)) (point) 'syntax-table
-                               (string-to-syntax "> c"))))))
+                               (string-to-syntax "|"))))))
      ((or (null (setq char (nth 3 state)))
           (and (characterp char)
                (null (get-text-property (nth 8 state) 'syntax-table))))
@@ -910,14 +937,14 @@ perl-continuation-line-p
   "Move to end of previous line and return non-nil if continued."
   ;; Statement level.  Is it a continuation or a new statement?
   ;; Find previous non-comment character.
-  (perl-backward-to-noncomment)
+  (perl-backward-to-noncomment-nonhere)
   ;; Back up over label lines, since they don't
   ;; affect whether our line is a continuation.
   (while (and (eq (preceding-char) ?:)
               (memq (char-syntax (char-after (- (point) 2)))
                     '(?w ?_)))
     (beginning-of-line)
-    (perl-backward-to-noncomment))
+    (perl-backward-to-noncomment-nonhere))
   ;; Now we get the answer.
   (unless (memq (preceding-char) '(?\; ?\} ?\{))
     (preceding-char)))
@@ -959,7 +986,7 @@ perl-calculate-indent
 	   (state (syntax-ppss))
 	   (containing-sexp (nth 1 state))
 	   ;; Don't auto-indent in a quoted string or a here-document.
-	   (unindentable (or (nth 3 state) (eq 2 (nth 7 state)))))
+	   (unindentable (or (nth 3 state) (perl--beginning-of-here-doc state))))
       (when (and (eq t (nth 3 state))
                  (save-excursion
                    (goto-char (nth 8 state))
@@ -976,7 +1003,7 @@ perl-calculate-indent
                   (if perl-indent-parens-as-block '(?\{ ?\( ?\[) '(?\{)))
             0          ; move to beginning of line if it starts a function body
           ;; indent a little if this is a continuation line
-          (perl-backward-to-noncomment)
+          (perl-backward-to-noncomment-nonhere)
           (if (or (bobp)
                   (memq (preceding-char) '(?\; ?\})))
               0 perl-continued-statement-offset)))
@@ -1076,6 +1103,17 @@ perl-backward-to-noncomment
   "Move point backward to after the first non-white-space, skipping comments."
   (forward-comment (- (point-max))))
 
+(defun perl-backward-to-noncomment-nonhere ()
+  "Move point backward, skipping comments and here-docs."
+  ;; Comments can appear after a here-doc, but also at the end of the
+  ;; line containing the here-doc delimiter(s).
+  (forward-comment (- (point-max)))
+  (unless (equal (point) (point-min))
+    (let ((here-start (perl--beginning-of-here-doc
+                       (save-excursion (syntax-ppss (1- (point)))))))
+      (when here-start (goto-char here-start)))
+    (forward-comment (- (point-max)))))
+
 (defun perl-backward-to-start-of-continued-exp ()
   (while
       (let ((c (preceding-char)))
diff --git a/test/lisp/progmodes/cperl-mode-resources/here-docs.pl b/test/lisp/progmodes/cperl-mode-resources/here-docs.pl
new file mode 100644
index 0000000000..39e4fe06ba
--- /dev/null
+++ b/test/lisp/progmodes/cperl-mode-resources/here-docs.pl
@@ -0,0 +1,111 @@
+use 5.020;
+
+=head1 NAME
+
+here-docs.pl - resource file for cperl-test-here-docs
+
+=head1 DESCRIPTION
+
+This file holds a couple of HERE documents, with a variety of normal
+and edge cases.  For a formatted view of this description, run:
+
+   (cperl-perldoc "here-docs.pl")
+
+For each of the HERE documents, the following checks will done:
+
+=over 4
+
+=item *
+
+All occurrences of the string "look-here" are fontified as
+'font-lock-string-face.  Note that we deliberately test the face, not
+the syntax property: Users won't care for the syntax property, but
+they see the face.  Different implementations with different syntax
+properties have been seen in the past.
+
+=item *
+
+Indentation of the line(s) containing "look-here" is 0, i.e. there are no
+leading spaces.
+
+=item *
+
+Indentation of the following perl statement containing "indent" should
+be 0 if the statement contains "noindent", and according to the mode's
+continued-statement-offset otherwise.
+
+=back
+
+=cut
+
+# Prologue to make the test file valid without warnings
+
+my $text;
+my $any;
+my $indentation;
+my $anywhere = 'back again';
+
+=head1 The Tests
+
+=head2 Test Case 1
+
+We have two HERE documents in one line with different quoting styles.
+
+=cut
+
+## test case
+
+$text = <<"HERE" . <<'THERE' . $any;
+#look-here and
+HERE
+$tlook-here and
+THERE
+
+my $noindent = "This should be left-justified";
+
+=head2 Test case 2
+
+A HERE document followed by a continuation line
+
+=cut
+
+## test case
+
+$text = <<HERE
+look-here
+HERE
+
+. 'indent-level'; # Continuation, should be indented
+
+=head2 Test case 3
+
+A here document with a line-end comment in the starter line,
+after a complete statement
+
+=cut
+
+## test case
+
+$text = <<HERE; # start here
+look-here
+HERE
+
+my $noindent = "New statement in this line";
+
+=head2 Test case 4
+
+A HERE document with a to-be-continued statement and a comment in the
+starter line.
+
+=cut
+
+## test case
+
+$text = <<HERE # start here
+look-here
+HERE
+
+. 'indent-level'; # Continuation, should be indented
+
+
+__END__
diff --git a/test/lisp/progmodes/cperl-mode-tests.el b/test/lisp/progmodes/cperl-mode-tests.el
index cb2d067a61..d9b090896d 100644
--- a/test/lisp/progmodes/cperl-mode-tests.el
+++ b/test/lisp/progmodes/cperl-mode-tests.el
@@ -135,6 +135,35 @@ cperl-test-fontify-punct-vars
         (should (equal (nth 3 (syntax-ppss)) nil))
         (should (equal (nth 4 (syntax-ppss)) t))))))
 
+(ert-deftest cperl-test-heredocs ()
+  "Test that HERE-docs are fontified as strings."
+  (let ((file (ert-resource-file "here-docs.pl"))
+        (cperl-continued-statement-offset perl-continued-statement-offset)
+        (case-fold-search nil))
+    (with-temp-buffer
+      (insert-file-contents file)
+      (goto-char (point-min))
+      (funcall cperl-test-mode)
+      (indent-region (point-min) (point-max))
+      (font-lock-ensure (point-min) (point-max))
+      (while (search-forward "## test case" nil t)
+        (save-excursion
+          (while (search-forward "look-here" nil t)
+            (should (equal
+                     (get-text-property (match-beginning 0) 'face)
+                     'font-lock-string-face))
+            (beginning-of-line)
+            (should (null (looking-at "[ \t]")))
+            (forward-line 1)))
+        (should (re-search-forward
+                 (concat "^\\([ \t]*\\)" ; the actual indentation amount
+                         "\\([^ \t\n].*?\\)\\(no\\)?indent")
+                 nil t))
+        (should (equal (- (match-end 1) (match-beginning 1))
+                       (if (match-beginning 3) 0
+                         perl-indent-level))))
+      )))
+
 ;;; Tests for issues reported in the Bug Tracker
 
 (defun cperl-test--run-bug-10483 ()
-- 
2.20.1


  parent reply	other threads:[~2020-12-23  2:19 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-05 21:03 bug#23461: perl mode uses same color for comments and here documents 積丹尼 Dan Jacobson
2019-10-09  5:52 ` Lars Ingebrigtsen
2020-12-23  2:19 ` Harald Jörg [this message]
2020-12-23  4:00   ` bug#23461: perl-mode: Displaying HERE-docs as strings instead of comments [PATCH] Stefan Monnier
2020-12-23 14:37     ` Harald Jörg
2020-12-23 16:34       ` Stefan Monnier
2020-12-23 18:46         ` Harald Jörg
2020-12-23 19:04           ` Stefan Monnier
2020-12-23 19:06             ` Stefan Monnier
2020-12-24 15:29             ` Harald Jörg
2020-12-24 17:32               ` Stefan Monnier
2021-01-04 23:43                 ` Harald Jörg
2021-01-05  9:14                   ` Lars Ingebrigtsen
2021-01-05 12:30                     ` Harald Jörg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sg7xxo0s.fsf@hajtower \
    --to=haj@posteo.de \
    --cc=23461@debbugs.gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).