From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#73771: 30.0.91; etags generates broken TAGS file for multi-line regex match
Date: Sat, 12 Oct 2024 19:22:16 +0300
Message-ID: <86wmiduw93.fsf@gnu.org>
References: <87zfn9tmf4.fsf@ice9.digital>
In-Reply-To: <87zfn9tmf4.fsf@ice9.digital> (message from Morgan Willcock on Sat, 12 Oct 2024 15:39:59 +0100)
Cc: 73771@debbugs.gnu.org
To: Morgan Willcock, Francesco Potortì

> From: Morgan Willcock
> Date: Sat, 12 Oct 2024 15:39:59 +0100
>
> It appears as though using the multi-line regex matching feature of
> etags generates a TAGS file which accidentally contains additional
> newline characters.
>
> To test, create a source file for a test language which has some
> variable definitions to match, where variables are defined using a
> keyword "define" followed by a variable name that begins with "$":
>
>   ## Create a test source file.
>   >/tmp/source printf 'Top of file\n\n'
>   ## Write a multi-line variable definition that ends at a newline
>   >>/tmp/source printf 'define\n$a\n'
>   ## Write a multi-line variable definition that doesn't end at a newline.
>   >>/tmp/source printf 'define\n$b;\n'
>
> Now create a TAGS file based on multi-line matching the variable
> definitions:
>
>   etags --lang=none --regex='/define[ \t\n]+\(\$[a-z]+\)/\1/m' -o /tmp/TAGS /tmp/source
>
> Note that the capture group does not include any newline characters, but
> the contents of the TAGS file seems have inserted an additional newline
> character in the line which locates $a:
>
>   cat /tmp/TAGS
>
>   /tmp/source,24
>   $a
>   $a4,20
>   $b;$b6,30

This is because etags always records one extra character with the
regexp match, which is harmless, unless that extra character is a
newline.  The patch below fixes it:

diff --git a/lib-src/etags.c b/lib-src/etags.c
index a822a82..848d8ea 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -7420,7 +7420,7 @@ regex_tag_multiline (void)
 	  /* Force explicit tag name, if a name is there. */
 	  pfnote (name, true, buffer + linecharno,
-		  charno - linecharno + 1, lineno, linecharno);
+		  charno - linecharno, lineno, linecharno);
 	  if (debug)
 	    fprintf (stderr, "%s on %s:%"PRIdMAX": %s\n",

Francesco, why does the code add one more character there?  It looks
to me like an off-by-one error, because "charno - linecharno + 1" is
interpreted by pfnote as the length of the portion of the line to
record for the regexp match.  This code has been there since you first
introduced multi-line regexps back in 2002.  Am I missing something
here?

(Removing the "+1" part will require updating the expected results in
the test suite, as they currently include that extra character, but
that is okay.)
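
To make the arithmetic concrete, here is a minimal sketch, not the
actual etags code.  It assumes that charno is the offset just past the
end of the match and that linecharno is the offset of the start of the
line holding the end of the match; the 20 and 30 used in the note below
are the character offsets shown in the reported TAGS file.  The helpers
record_span and span_length are hypothetical stand-ins for the way
pfnote uses its length argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Contents of /tmp/source as created by the printf commands in the
   report.  Offsets: "$a" starts at 20, "$b;" starts at 30.  */
const char source[] = "Top of file\n\ndefine\n$a\ndefine\n$b;\n";

/* Hypothetical stand-in for the part of pfnote that stores the line
   text: copy LEN bytes of BUF starting at START into OUT and
   NUL-terminate the result.  */
void
record_span (const char *buf, size_t start, size_t len,
             char *out, size_t outsize)
{
  assert (len < outsize);
  memcpy (out, buf + start, len);
  out[len] = '\0';
}

/* Number of bytes to record for a match ending at CHARNO on a line
   starting at LINECHARNO.  A nonzero EXTRA mimics the "+ 1" of the
   current code; zero mimics the patched code.  */
size_t
span_length (size_t linecharno, size_t charno, int extra)
{
  return charno - linecharno + (extra ? 1 : 0);
}
```

Under those assumptions, the match for $a ends at offset 22, so the
current code records 3 bytes from offset 20 ("$a" plus the newline),
while the patched code records 2 bytes ("$a").  For $b the match ends
at offset 32, and the extra byte is the ";", which is why that entry
looks harmless in the TAGS file.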