From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Monnier
Newsgroups: gmane.emacs.bugs
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Thu, 24 Sep 2020 18:43:04 -0400
References: <20200923144824.GD6178@ACM> <20200924102022.GA4714@ACM> <20200924185031.GB4714@ACM>
In-Reply-To: <20200924185031.GB4714@ACM> (Alan Mackenzie's message of "Thu, 24 Sep 2020 18:50:31 +0000")
To: Alan Mackenzie
Cc: 43558@debbugs.gnu.org, Mattias Engdegård

>> > As already said, this is a(n ugly) workaround. syntax.c should handle
>> > comments in all their generality. With a bit of consideration, the
>> > method to do this is clear:
>> In my world, it's quite normal for a specific language's lexical rules
>> not to line up 100% with syntax tables (whether for strings, comments,
>> younameit). I don't see anything very special here.
> Normally when there's a mismatch, it's because a character is
> syntactically ambiguous. There's nothing syntax.c can do about this.

Oh, no, there are many more situations than just "a character is
syntactically ambiguous" (or alternatively you could argue that all
cases are "a character is syntactically ambiguous", including your cases
of escaped newline and escaped */).

>> A `syntax-propertize` rule for "\*/" should be very easy to implement
>> and fairly cheap since the regexp is simple and will almost never match.
> Well, the rule would actually be for escaped newlines,

It doesn't have to be if you set `comment-end-can-be-escaped` to
non-nil, in which case you only need to tweak the \*/ case, AFAICT.

> but this would be quite expensive (compared with a syntax.c solution)
> since every comment near a change region would need scanning at
> each change.

I don't know what you mean by scanning, but yes you'd need to search for
all "\\\\\n" or "\\\\\\*/" (depending on how you set
`comment-end-can-be-escaped`) and mark the second char accordingly.
Seems pretty cheap in either case.

> I've hacked up a working, but as yet unsatisfactory, change to syntax.c.
> It is surely better, where possible, to fix bugs at their point of
> causation rather than by workarounds elsewhere.

I don't think it's a bug in `syntax.c`. `syntax.c` is not defined to
support the syntax of C, it's only defined to handle a particular set of
comment and string styles, which correspond to a common subset of what
is in use in most languages, but IME most languages need some extra
tweaks handled via the `syntax-table` text property. It's only
a question of time until we add a `syntax-propertize-function` for Elisp
mode to properly handle some corner cases, for example.

>> I do think it would be good to handle this without `syntax-table`
>> text-property hacks, but I think that should come with an overhaul of
>> syntax.c based on a major-mode provided DFA (or something like that) so
>> it can accommodate all the various oddball cases without even the need
>> to introduce the notion of escaping comment markers.
> That sounds almost more like a rewrite than an overhaul.

Tomato tomahto.

> We would need to make sure that this wouldn't run more slowly than the
> current syntax.c/Lisp combination.

I don't think that would be required, as long as it runs fast enough.
In any case, the resulting performance is probably not the main worry
(I suspect it will/would be easy to make it fast enough).


        Stefan
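
For illustration, the search-and-mark idea described above (look for
"\\\\\n" and mark the second char) might be sketched roughly as follows.
This is only a sketch under assumptions, not code from the patch or from
any existing mode: the names `my-mode-syntax-propertize` and
`my-mode-setup-syntax` are hypothetical, `comment-end-can-be-escaped` is
assumed to be nil, and giving the newline punctuation syntax is just one
plausible reading of "accordingly".

```elisp
;; Sketch only: `my-mode-syntax-propertize' and `my-mode-setup-syntax'
;; are hypothetical names, not taken from Emacs or CC Mode.

(defun my-mode-syntax-propertize (start end)
  "Mark backslash-escaped newlines between START and END.
Assumes `comment-end-can-be-escaped' is nil in this buffer."
  (goto-char start)
  ;; The regexp is the \"\\\\\\\\\\n\" from the message: a literal
  ;; backslash immediately followed by a newline.
  (while (re-search-forward "\\\\\n" end t)
    ;; Mark the second character (the newline) so that it no longer
    ;; acts as a comment ender.  Punctuation syntax is an assumed choice.
    (put-text-property (1- (point)) (point)
                       'syntax-table (string-to-syntax "."))))

(defun my-mode-setup-syntax ()
  "Install the sketch above in the current buffer."
  (setq-local comment-end-can-be-escaped nil)
  ;; Make the syntax.c scanners honor `syntax-table' text properties.
  (setq-local parse-sexp-lookup-properties t)
  (setq-local syntax-propertize-function #'my-mode-syntax-propertize))
```

The function only runs a single simple regexp search over the region it
is handed, which is the sense in which the message calls this cheap.
A hypothetical major mode would call `my-mode-setup-syntax` from its
mode function; the "\\\\\\*/" variant would use the same loop with the
other regexp.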