From: Dmitry Gutov
To: Eli Zaretskii
Cc: abela@chalmers.se, 31796@debbugs.gnu.org
Subject: bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
Date: Mon, 30 Nov 2020 04:25:31 +0200

On 24.11.2020 22:16, Eli Zaretskii wrote:
>> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
>> From: Dmitry Gutov
>> Date: Tue, 24 Nov 2020 21:43:22 +0200
>>
>> How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?
>
> The idea sounds fine to me.
>
>> Someone more familiar with existing ports of Grep on different systems
>> should weigh in on it.
>
> I don't think it's necessary.  We just need to probe Grep for support
> of these switches, and then use it.  The result cannot be worse than
> it is now.

Now that I've dug in a little, the situation seems difficult.

-Pz does work, but it forces Grep to consider the file as one long
string. As a consequence, if we ask it to output the line number, the
number will always be 1. That's not a helpful mode of operation.

Even if it worked differently, -P imposes a significant performance
penalty from what I see, even when the extra syntax is not actually
used. So we couldn't enable it by default.

There is a similar program called pcregrep which outputs in the
expected format:

$ pcregrep -MHn "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el:772:  :type '(choice (const :tag "Read with completion from relative names"
                 project--read-file-cpd-relative)
lisp/progmodes/project.el:774:                 (const :tag "Read with completion from absolute names"
                 project--read-file-absolute)

...but it doesn't seem to have a way to reliably detect where a match
result ends. When we're talking multiline, perhaps the searched file
includes a string like "file-name/etc:number"? Some of our tests
probably do. Grep has a flag -Z (or --null) which adds a null byte
after file names, but pcregrep doesn't.

And anyway, pcregrep isn't usually installed by default.

ripgrep, OTOH, seems to combine both good features here:

$ rg -Hn --multiline --null "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el772:  :type '(choice (const :tag "Read with completion from relative names"
773:                 project--read-file-cpd-relative)
774:                 (const :tag "Read with completion from absolute names"
775:                 project--read-file-absolute)

And it also disables the multiline mode automatically if the regexp
can't match a newline (the multiline mode is significantly slower).

To sum up, there are options, but I don't see a working solution that
is based on GNU Grep.
And that's the most portable search program we have, I think.

The other recommendations I see (here:
https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines)
include bespoke scripts in sed or perl in command mode. These seem less
portable, but if someone would like to try their hand at one that would
also output file names and line numbers in the expected format, I'd be
happy to benchmark it.
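
For illustration only, here is a minimal Python sketch of the output such a script would need to produce. It is not taken from the thread and does not reflect how Grep, pcregrep or ripgrep work internally. It searches the whole file contents as one string (as -z forces Grep to do), recovers the real line number by counting the newlines before each match, and puts a null byte after the file name so that a matched text containing something like "file-name/etc:number" cannot be confused with a record prefix. The names multiline_matches and format_record are hypothetical, and Python's re syntax differs from both Grep's and Emacs's regexp syntax.

```python
import re
from typing import Iterator, Tuple


def multiline_matches(path: str, text: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, matched_text) for each match of pattern.

    The text is searched as a single string, so the pattern may span
    newlines. The line number of the match start is recovered by
    counting the newlines that precede it.
    """
    for m in re.finditer(pattern, text):
        line = text.count("\n", 0, m.start()) + 1
        yield path, line, m.group(0)


def format_record(path: str, line: int, matched: str) -> str:
    # Hypothetical record format: a NUL byte after the file name (in the
    # spirit of grep -Z / rg --null) marks unambiguously where the name ends.
    return f"{path}\0{line}:{matched}"


if __name__ == "__main__":
    sample = 'a\n"names"\n   foo\nb names"\n x\n'
    for rec in multiline_matches("sample.el", sample, r'names"\n *'):
        print(repr(format_record(*rec)))
```

A real replacement for the Grep call would also have to read files from disk, handle encodings, and be benchmarked against the tools above; none of that is attempted here.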