From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: haj@posteo.de (Harald =?UTF-8?Q?J=C3=B6rg?=) Newsgroups: gmane.emacs.bugs Subject: bug#23461: perl-mode: Displaying HERE-docs as strings instead of comments [PATCH] Date: Wed, 23 Dec 2020 19:46:19 +0100 Message-ID: <87r1ng5pj8.fsf@hajtower> References: <87sg7xxo0s.fsf@hajtower> <87zh24611c.fsf@hajtower> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="27064"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) Cc: 23461@debbugs.gnu.org To: Stefan Monnier Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed Dec 23 19:47:15 2020 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1ks9AI-0006vk-7V for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 23 Dec 2020 19:47:14 +0100 Original-Received: from localhost ([::1]:34788 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ks9AH-0005a7-A1 for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 23 Dec 2020 13:47:13 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:37250) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ks9A9-0005Yp-AH for bug-gnu-emacs@gnu.org; Wed, 23 Dec 2020 13:47:05 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]:42617) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1ks9A6-0007G9-GJ for bug-gnu-emacs@gnu.org; Wed, 23 Dec 2020 13:47:05 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1ks9A6-0006P2-DK for bug-gnu-emacs@gnu.org; Wed, 23 Dec 2020 13:47:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: haj@posteo.de (Harald =?UTF-8?Q?J=C3=B6rg?=) Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 23 Dec 2020 18:47:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 23461 X-GNU-PR-Package: emacs Original-Received: via spool by 23461-submit@debbugs.gnu.org id=B23461.160874918924563 (code B ref 23461); Wed, 23 Dec 2020 18:47:02 +0000 Original-Received: (at 23461) by debbugs.gnu.org; 23 Dec 2020 18:46:29 +0000 Original-Received: from localhost ([127.0.0.1]:54163 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ks99Z-0006O7-7I for submit@debbugs.gnu.org; Wed, 23 Dec 2020 13:46:29 -0500 Original-Received: from mout02.posteo.de ([185.67.36.66]:41019) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ks99W-0006Nv-Fa for 23461@debbugs.gnu.org; Wed, 23 Dec 2020 13:46:27 -0500 Original-Received: from submission (posteo.de [89.146.220.130]) by mout02.posteo.de (Postfix) with ESMTPS id BFD1B2400FD for <23461@debbugs.gnu.org>; Wed, 23 Dec 2020 19:46:20 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.de; s=2017; t=1608749180; bh=7fmv/0G0zpESJibzQu3K8gYXAoePqFJb2iF7vZ4ias4=; h=From:To:Cc:Subject:Date:From; b=ncC7VsyGZASY8JF2qRJz+Sgw8hLe1BA939byxJ/HBu2khXw8bNbRbO1OPq8zducWT 8uZqrO3aoje7F4i0lDBlnnqivCVGUsrk0do0hJhMdasaGUg4cZKeok73V8ch2WNv2w oU8ryzfq4BnkT1lHeuU3VXaVp3bz6YxXnmGrg3QbrRu7nzeqcjmfTXYbVuVRKuoyrp hZRcGgdHfRSI9JJxF5Hr67ZS7nSmf/Ss3viFg5uTmwP0ZlyI8o8nXnpAWAPd7ed5aJ 1eTxcYhvzQ30UqeDWaVxGKL/gokStgXv9xc7LtvgaeipMkUW1Oq36m47jBQdweq36H TlyuueB7zu5ug== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4D1McC71fDz6tmB; Wed, 23 Dec 2020 19:46:19 +0100 (CET) In-Reply-To: (Stefan Monnier's message of "Wed, 23 Dec 2020 11:34:12 -0500") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:196631 Archived-At: Stefan Monnier writes: >> This looks like to be an improved variation of 2): HERE-docs remain >> marked as c-style comments, and `font-lock-syntactic-face-function` is >> used to display them as strings. >> >> A patch for this variation is attached. > > Looks good, thanks. > See nitpicks below, Many thanks for your review! >> (if (save-excursion (nth 8 (syntax-ppss (match-beginning 0)))) >> + ;; '>>' occurred in a string, or in a comment. >> ;; Leave the property of the newline unchanged. > > Is think this `>>` is a type for `<<`, or am I missing something? Ouch. That is a typo, of course. *I* have missed something. >> + ;; Before changing the syntax to c-style comment, let's >> + ;; check whether we are in an end-of-line comment, and >> + ;; if so, cheat by shifting the comment markers one char >> + ;; to the left. > > I jump straight to reading the code before reading your email's text and > it took me a bit of time to understand what this was about. > I think part of the reason is the "we are in an end-of-line comment" > since this is actually not about the case where the "<<" (which is > where "we" are at this moment, in my mind) is inside a comment. > So, I think the comment would be better if it just gave a straight > example, like > > ;; Beware of `foo <<'BAR' #baz` because > ;; the newline needs to close the comment > ;; and can't be used to start the here-doc. That's better, thank you! > Also rather than "shifting the comment markers one char to the left" > I'd just say that "we terminate the comment *just before* the newline". >> + (when (nth 4 (save-excursion (syntax-ppss eol))) >> + (when (equal (car (syntax-after (1- eol))) >> + (car (string-to-syntax "<"))) > > This is a pessimistic test, because it will misfire when you have > > foo <<'BAR' #baz# > > I think we should compare (1- eol) with (nth 8 (syntax-ppss eol)) > instead. You are right. I'm still not very experienced with this syntax stuff, as it turnes out. In hindsight it is clear that a "#" will still be a comment starter, even if it is already in a comment. >> + ;; yet another edge case: "#" is the last character >> + ;; in that line, so there's actually no comment. >> + (put-text-property (- eol 2) (1- eol) >> + 'syntax-table (string-to-syntax "<"))) > > Indeed, terminating the comment just before the newline is a problem if > "just before the newline" is the comment starter. I see that in that > case, you mark the char before the # but that can also be a problem with > things like: > > foo <<'BAR' "baz"# ...or, as I've discovered in the meantime, also bad, foo << BAR# foo << BAR;# (The latter breaks indentation after the HERE-doc because the ";" became a comment) So, back to the drawing board. If that edge case of an "empty comment" is just ignored, then the single # loses its comment-face (that is, by the way, the treatment of HERE-docs in sh-script.el), which is ugly but as far as I can tell harmless. Maybe this can be covered by assigning a font-lock-face property to these single comment starters... I'll check that. > An alternative is to leave the comment alone and start the heredoc just > after the newline instead (that approach suffers from the fact that we > need to be careful we don't accidentally throw away that `syntax-table` > text property when the next line is edited. We can do that using the > `syntax-multiline` text property). This is similar to how CPerl mode does it, and the fact that text properties can get lost when text is edited brought me here. The bugs Bug#14343 and Bug#28962 in CPerl mode come from lost text properties, and Bug#24101 is a consequence of starting a syntax property with the first character. So, I wanted to check how Perl mode covers that. I'll check whether `syntax-multiline` can add some more robustness. This points to the deeper problem: In Perl, we have some occasions where one character has two different roles. A line end ends a comment *and* starts a here-doc, and in s/foo/bar/ge the middle slash ends the search string *and* starts the replacement string. The programming modes seem to treat this by distributing the roles on two neighbouring characters, which comes with some ... inaccuracies if the characters nearby have roles of their own. >> (defun perl-font-lock-syntactic-face-function (state) >> (cond >> + ((and (eq 2 (nth 7 state)) ; c-style comment >> + (cdr-safe (get-text-property (nth 8 state) 'syntax-table))) ; HERE doc >> + 'font-lock-string-face) > > I think some people won't like the string-face property for it. > How 'bout we (require 'sh-script) and use the `sh-heredoc` face? I'd hesitate to do that. I think that in shell-script mode it is well justified to use a different face for HERE-docs, but in Perl it isn't. In Perl, a HERE-doc is just a string and can be used wherever a string can be used. So, string-face seems quite appropriate. In Shell, a HERE-doc is sort of an I/O redirection. You can't, for example, assign a HERE-doc to a shell variable. So, a different face seems appropriate. BTW: CPerl also mode uses string-face for HERE-docs. -- Cheers, haj