From: Alan Mackenzie
Newsgroups: gmane.emacs.bugs
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Thu, 24 Sep 2020 18:50:31 +0000
Message-ID: <20200924185031.GB4714@ACM>
References: <20200923144824.GD6178@ACM> <20200924102022.GA4714@ACM>
To: Stefan Monnier
Cc: 43558@debbugs.gnu.org, Mattias Engdegård

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.
> > With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

Normally when there's a mismatch, it's because a character is
syntactically ambiguous.  There's nothing syntax.c can do about this.
In the current situation, this isn't the case: syntax.c is unable to
handle a comment scenario where there is no ambiguity.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

Well, the rule would actually be for escaped newlines, but this would be
quite expensive (compared with a syntax.c solution) since every comment
near a change region would need scanning at each change.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
It is surely better, where possible, to fix bugs at their point of
causation rather than by workarounds elsewhere.  As you note, the CC Mode
modes will be the only known users at the moment.

Just as an aside, the project where I was working about four years ago
banned a proprietary editor after a mammoth search for a bug caused by an
unintentional escaped NL in a line comment.  The banned editor didn't
fontify the continuation line in comment face.  I was able to
demonstrate to the project manager that Emacs fontified that comment
correctly.
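
To make the escaped-newline case concrete, here is a minimal sketch in Python. It is neither the syntax.c change nor CC Mode's Lisp. It assumes C-like rules, where a backslash immediately before a newline continues a `//` comment onto the next line. The function name `line_comment_end` is hypothetical and exists only for this illustration.

```python
def line_comment_end(text: str, start: int) -> int:
    """Return the index just past the line comment beginning at `start`.

    Assumption for this sketch: the comment opens with "//" and a
    backslash immediately before a newline continues the comment.
    """
    if not text.startswith("//", start):
        raise ValueError("no line comment starts at this position")
    i = start + 2
    while i < len(text):
        if text[i] == "\n":
            if text[i - 1] == "\\":
                i += 1        # escaped newline: the comment continues
                continue
            return i + 1      # unescaped newline ends the comment
        i += 1
    return len(text)          # comment runs to the end of the text


if __name__ == "__main__":
    src = "x = 1; // note \\\ny = 2;\nz = 3;\n"
    end = line_comment_end(src, src.index("//"))
    # The text after the comment; "y = 2;" was swallowed by the comment.
    print(repr(src[end:]))
```

A scanner that ignores the backslash would stop at the first newline and treat `y = 2;` as live code, which is the kind of mismatch behind the bug hunt described above.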

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

That sounds almost more like a rewrite than an overhaul.  You mean, I
think, that the syntax of language expressions would be defined using
something a bit like (but more powerful than) regular expressions.  And
with that, the need for syntactic analysis in Lisp would be much
reduced.  We would need to make sure that this wouldn't run more slowly
than the current syntax.c/Lisp combination.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).