bug#31796: 26.1; dired-do-find-regexp-and-replace fails to find multiline regexps

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#31796: 26.1; dired-do-find-regexp-and-replace fails to find multiline regexps
@ 2018-06-11 18:58 Žygimantas Bruzgys
  2018-06-12 10:17 ` Noam Postavsky
  2020-11-23  9:09 ` bug#31796: 27.1; " Andreas Abel
  0 siblings, 2 replies; 61+ messages in thread
From: Žygimantas Bruzgys @ 2018-06-11 18:58 UTC (permalink / raw)
  To: 31796


1) Create ~/test containing a file with the following contents:
multi
line

2) Visit the directory using dired: C-x C-f ~/test
3) Initiate regexp-replace by hitting Q
4) multi[[:space:]]line RET singeline RET
5) See that the dired regexp replace failed, reporting that no results were
found
6) Visit the file you have just created.
7) Initiate query-replace-regexp with C-M-%
8) Accept the suggested (previous) query-replace by hitting RET
9) See that the query is actually correct and finds the result.


In GNU Emacs 26.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
 of 2018-05-29 built on juergen
Windowing system distributor 'The X.Org Foundation', version 11.0.12000000
Recent messages:
user-error: No matches for: multi[[:space:]]line
Mark set
Replaced 1 occurrence

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --with-x-toolkit=gtk3 --with-xft --with-modules
 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong
 -fno-plt' CPPFLAGS=-D_FORTIFY_SOURCE=2
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY
ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS
GTK3 X11 MODULES THREADS LIBSYSTEMD LCMS2

Important settings:
  value of $LC_COLLATE: de_CH.UTF-8
  value of $LC_MONETARY: de_CH.UTF-8
  value of $LC_NUMERIC: de_CH.UTF-8
  value of $LC_TIME: lt_LT.UTF-8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Dired by name

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq format-spec rfc822
mml mml-sec password-cache epa derived epg epg-config gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils cl-extra help-mode easymenu find-dired
semantic/fw mode-local find-func xref cl-seq project eieio byte-opt
bytecomp byte-compile cconv eieio-core cl-macs gv eieio-loaddefs grep
compile comint ansi-color ring thingatpt dired-aux cl-loaddefs cl-lib
dired dired-loaddefs elec-pair leuven-theme time-date mule-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 123229 14374)
 (symbols 48 22491 4)
 (miscs 40 103 176)
 (strings 32 34463 1497)
 (string-bytes 1 948663)
 (vectors 16 16681)
 (vector-slots 8 526030 13780)
 (floats 8 78 135)
 (intervals 56 300 0)
 (buffers 992 14))


^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 26.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2018-06-11 18:58 bug#31796: 26.1; " Žygimantas Bruzgys
@ 2018-06-12 10:17 ` Noam Postavsky
  2020-11-23 21:25   ` Dmitry Gutov
  2020-11-23  9:09 ` bug#31796: 27.1; " Andreas Abel
  1 sibling, 1 reply; 61+ messages in thread
From: Noam Postavsky @ 2018-06-12 10:17 UTC (permalink / raw)
  To: Žygimantas Bruzgys; +Cc: 31796

Žygimantas Bruzgys <me@zygi.xyz> writes:

> 1) Create ~/test containing a file with the following contents:
> multi
> line
>
> 2) Visit the directory using dired: C-x C-f ~/test
> 3) Initiate regexp-replace by hitting Q
> 4) multi[[:space:]]line RET singeline RET
> 5) See that the dired regexp replace failed, reporting that no results were
> found
> 6) Visit the file you have just created.
> 7) Initiate query-replace-regexp with C-M-%
> 8) Accept the suggested (previous) query-replace by hitting RET
> 9) See that the query is actually correct and finds the result.

As the docstring of dired-do-find-regexp-and-replace says:

    REGEXP should use constructs supported by your local ‘grep’ command.

grep matches single lines, so multiline matching won't work.
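
The single-line limitation can be reproduced outside Emacs. The following is
a minimal shell sketch, assuming a POSIX grep on PATH; the scratch directory
and the file name sample.txt are placeholders for illustration, not anything
Emacs itself creates.

```shell
# Recreate the two-line file from the report in a scratch directory.
tmpdir=$(mktemp -d)
printf 'multi\nline\n' > "$tmpdir/sample.txt"

# grep tests the pattern against one line at a time, and the newline that
# ends each line is not part of the text being matched, so a pattern that
# has to span two lines finds nothing. "|| true" keeps a zero-match exit
# status from aborting scripts that run under "set -e".
spanning_count=$(grep -c 'multi[[:space:]]line' "$tmpdir/sample.txt" || true)

# The same file does match when the pattern fits inside a single line.
single_count=$(grep -c 'multi' "$tmpdir/sample.txt" || true)

echo "spanning pattern: $spanning_count matching line(s)"
echo "single-line pattern: $single_count matching line(s)"

rm -rf "$tmpdir"
```

The spanning pattern reports 0 matching lines and the single-line pattern
reports 1, which mirrors the "No matches" error shown in the report.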







* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2018-06-11 18:58 bug#31796: 26.1; " Žygimantas Bruzgys
  2018-06-12 10:17 ` Noam Postavsky
@ 2020-11-23  9:09 ` Andreas Abel
  2020-11-23 15:23   ` Eli Zaretskii
                     ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Andreas Abel @ 2020-11-23  9:09 UTC (permalink / raw)
  To: 31796

The dired-do-find-regexp-and-replace command does not seem to parse the 
regex entered by the user correctly.  If the regex string contains a 
newline character (^Q^J), it seems that the parsing stops there.  At 
least I have seen errors like "unmatched bracket" and the like.

Anyhow, I did not get it to replace multiline text.  I found an answer 
here:

https://emacs.stackexchange.com/questions/30437/dired-search-and-replace-is-throwing-no-results

The solution is to manually invoke dired-do-query-replace-regexp 
(instead of pressing just Q).

However, this solution is hard to discover, because it is unexpected 
that the official regex-replace feature (key Q) contains such a blunder.

- Why isn't the more robust

   dired-do-query-replace-regexp

bound to Q?

- Why not fix the bug in dired-do-find-regexp-and-replace?  It has been 
reported for version 26 already, and it is not a minor issue.  Replacing 
interactively in several files is an **extremely** useful feature, and I 
would not want to do something like that outside of emacs.

Thanks for all the good work going into emacs.

Best,
Andreas

P.S.: Your approach to issue tracking (by email) must be considered 
stone-age by now.  How about switching to GitHub / GitLab or the like?

(Unless you want to keep the bar up, of course.  But this is hardly in 
the spirit of open source.)

-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel@gu.se
http://www.cse.chalmers.se/~abela/


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23  9:09 ` bug#31796: 27.1; " Andreas Abel
@ 2020-11-23 15:23   ` Eli Zaretskii
  2020-11-23 16:16   ` Drew Adams
  2020-11-23 21:28   ` Dmitry Gutov
  2 siblings, 0 replies; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-23 15:23 UTC (permalink / raw)
  To: Andreas Abel; +Cc: 31796

> From: Andreas Abel <abela@chalmers.se>
> Date: Mon, 23 Nov 2020 10:09:38 +0100
> 
> - Why not fix the bug in dired-do-find-regexp-and-replace?  It has been 
> reported for version 26 already, and it is not a minor issue.

I think we'd love to fix this, but we don't know how.  Patches are
welcome.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23  9:09 ` bug#31796: 27.1; " Andreas Abel
  2020-11-23 15:23   ` Eli Zaretskii
@ 2020-11-23 16:16   ` Drew Adams
  2020-11-23 21:22     ` Dmitry Gutov
  2020-11-24 19:28     ` Juri Linkov
  2020-11-23 21:28   ` Dmitry Gutov
  2 siblings, 2 replies; 61+ messages in thread
From: Drew Adams @ 2020-11-23 16:16 UTC (permalink / raw)
  To: Andreas Abel, 31796

> The dired-do-find-regexp-and-replace command does not seem
> to parse the regex entered by the user correctly.  If the
> regex string contains a newline character (^Q^J), it seems
> that the parsing stops there.  At least I have seen errors
> like "unmatched bracket" and the like.
> 
> Anyhow, I did not get it to replace multiline text.
> I found an answer here:
> 
> https://emacs.stackexchange.com/questions/30437/dired-search-and-replace-is-throwing-no-results
> 
> The solution is to manually invoke
> dired-do-query-replace-regexp (instead of pressing just Q).
>
> However, this solution is hard to discover, because it is
> unexpected that the official regex-replace feature (key Q)
> contains such a blunder.
> 
> - Why isn't the more robust dired-do-query-replace-regexp
>   bound to Q?

It _was_ bound to `Q' - for decades.  But the inventor
of `dired-do-find-regexp-and-replace' decided to give
that binding to his command.  (I argued in vain in
favor of giving the new command a different binding,
keeping `Q' as it was.  Similarly for `A'.)

> - Why not fix the bug in dired-do-find-regexp-and-replace?
> It has been reported for version 26 already, and it is not
> a minor issue. Replacing interactively in several files is
> an **extremely** useful feature, and I would not want to
> do something like that outside of emacs.

+1.

___

FWIW, Dired+ binds `dired-do-query-replace-regexp'
to `M-q' (respecting the new binding of `Q' to
`dired-do-find-regexp-and-replace', though I
disagree with it).  And Dired+ has both commands
on the menus:

 Multiple > Search >
   Query Replace Using TAGS Table...   M-q
   Query Replace Using `find'...       Q

https://www.emacswiki.org/emacs/DiredPlus






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23 16:16   ` Drew Adams
@ 2020-11-23 21:22     ` Dmitry Gutov
  2020-11-24 19:28     ` Juri Linkov
  1 sibling, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-23 21:22 UTC (permalink / raw)
  To: Drew Adams, Andreas Abel, 31796

On 23.11.2020 18:16, Drew Adams wrote:
> But the inventor
> of `dired-do-find-regexp-and-replace' decided to give
> that binding to his command.

It wasn't me who made this decision.

> (I argued in vain in
> favor of giving the new command a different binding,
> keeping `Q' as it was.  Similarly for `A'.)

...but there would be no reason for me to write it, if that was the 
change proposed.






* bug#31796: 26.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2018-06-12 10:17 ` Noam Postavsky
@ 2020-11-23 21:25   ` Dmitry Gutov
  0 siblings, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-23 21:25 UTC (permalink / raw)
  To: Noam Postavsky, Žygimantas Bruzgys; +Cc: 31796

On 12.06.2018 13:17, Noam Postavsky wrote:
> As the docstring of dired-do-find-regexp-and-replace says:
> 
>      REGEXP should use constructs supported by your local ‘grep’ command.
> 
> grep matches single lines, so multiline matching won't work.

*Apparently* 'grep -P -z' can do multiline matches. But I don't know how 
portable that is, and the grep manual calls this combination "experimental".

But if we can, and if we change grep-regexp-alist somehow to support 
\0-delimited results (-P without -z doesn't do multiline), 
xref-matches-in-files could use these flags and get multiline results.

[[:space:]] still wouldn't work, though: it's an Emacs-only extension.
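
For reference, the 'grep -P -z' combination can be tried from a shell. This
is only a sketch: it assumes GNU grep built with PCRE support (other greps may
reject -P), the scratch file sample.txt is a placeholder, and the pattern
writes the line break as \n rather than as a character class.

```shell
tmpdir=$(mktemp -d)
printf 'multi\nline\n' > "$tmpdir/sample.txt"

# Check first that this grep accepts -P and -z together; not every build does.
if printf 'probe\n' | grep -P -z -q 'probe' 2>/dev/null; then
  # -P alone: the file is still read line by line.
  if grep -P -q 'multi\nline' "$tmpdir/sample.txt" 2>/dev/null; then
    perl_only=match
  else
    perl_only=no-match
  fi
  # -P -z: NUL becomes the record separator, so a file without NUL bytes
  # is searched as one record and the pattern can cross the line break.
  if grep -P -z -q 'multi\nline' "$tmpdir/sample.txt" 2>/dev/null; then
    perl_nul=match
  else
    perl_nul=no-match
  fi
else
  perl_only=unsupported
  perl_nul=unsupported
fi

echo "-P only: $perl_only"
echo "-P -z:   $perl_nul"

rm -rf "$tmpdir"
```

On a grep where the probe succeeds, the description above predicts no match
with -P alone and a match once -z is added. The results would still arrive
NUL-delimited, which is why grep-regexp-alist would need to change.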


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23  9:09 ` bug#31796: 27.1; " Andreas Abel
  2020-11-23 15:23   ` Eli Zaretskii
  2020-11-23 16:16   ` Drew Adams
@ 2020-11-23 21:28   ` Dmitry Gutov
  2020-11-23 23:49     ` Andreas Abel
  2020-11-24 19:29     ` Juri Linkov
  2 siblings, 2 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-23 21:28 UTC (permalink / raw)
  To: Andreas Abel, 31796

On 23.11.2020 11:09, Andreas Abel wrote:
> - Why isn't the more robust
> 
>    dired-do-query-replace-regexp
> 
> bound to Q?

Which is the "more robust", though? dired-do-query-replace-regexp 
doesn't work with Tramp. dired-do-find-regexp-and-replace does.

And even if the former is fixed to work, the latter will work much 
faster remotely. It's also going to be faster in many "local" cases too.

If we don't manage to find a portable enough solution to do multiline 
searches, we could at least warn the user interactively about 
unsupported features, though.


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23 21:28   ` Dmitry Gutov
@ 2020-11-23 23:49     ` Andreas Abel
  2020-11-24  0:13       ` Dmitry Gutov
  2020-11-24 15:16       ` Eli Zaretskii
  2020-11-24 19:29     ` Juri Linkov
  1 sibling, 2 replies; 61+ messages in thread
From: Andreas Abel @ 2020-11-23 23:49 UTC (permalink / raw)
  To: Dmitry Gutov, 31796

With software as old as Emacs, the most important feature is

   1. backwards-compatibility

The second most important feature is

   2. backwards-compatibility

The third most important feature is

   3. backwards-compatibility

It is like with C and LaTeX.  If you cannot ensure that things keep 
working as they did, don't change anything.

Tramp?  I had to google this term.

How often do programmers work on their local files in their day-to-day 
business, how often with remote files via tramp?

If you contribute a new feature for 0.1% of the use cases but 
disrupt something (even minor) for 99.9% of the use cases, then with an 
old tool like emacs the choice is: don't replace the old functionality 
with your new functionality.

Just don't break things.  Please.

If you want fancy functionality that works with remote files, this is 
fine.  There are enough keys on the keyboard you can bind the new 
functionality to.

Please don't break things that worked.

There are a gazillion Emacs users out there who dread each new Emacs 
version because it will break their setup, their workflows, their 
habits.  We do not want to spend days after upgrades to get our work 
environment back.

We value stability and conservativity over everything else.

Thanks to everyone who contributes to emacs.  --Andreas

On 2020-11-23 22:28, Dmitry Gutov wrote:
> On 23.11.2020 11:09, Andreas Abel wrote:
>> - Why isn't the more robust
>>
>>    dired-do-query-replace-regexp
>>
>> bound to Q?
> 
> Which is the "more robust", though? dired-do-query-replace-regexp 
> doesn't work with Tramp. dired-do-find-regexp-and-replace does.
> 
> And even if the former is fixed to work, the latter will work much 
> faster remotely. It's also going to be faster in many "local" cases too.
> 
> If we don't manage to find a portable enough solution to do multiline 
> searches, we could at least warn the user interactively about 
> unsupported features, though.

-- 
Andreas Abel  <><      Du bist der geliebte Mensch.

Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden

andreas.abel@gu.se
http://www.cse.chalmers.se/~abela/


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23 23:49     ` Andreas Abel
@ 2020-11-24  0:13       ` Dmitry Gutov
  2020-11-24  1:19         ` Dmitry Gutov
  2020-11-24 15:16       ` Eli Zaretskii
  1 sibling, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-24  0:13 UTC (permalink / raw)
  To: Andreas Abel, 31796

On 24.11.2020 01:49, Andreas Abel wrote:
> With software as old as Emacs, the most important feature is
> 
>    1. backwards-compatibility
> 
> The second most important feature is
> 
>    2. backwards-compatibility
> 
> The third most important feature is
> 
>    3. backwards-compatibility

No.

That's a road toward irrelevance.

> It is like with C and LaTeX.  If you cannot ensure that things keep 
> working as they did, don't change anything.
> 
> Tramp?  I had to google this term.

Tramp has been with us for ~20 years, and has been part of Emacs for ~10 
of them. It has a significant number of users.

Anyway, that Tramp fix was a happy side-effect. Now that I think back, 
the main reason was the switch to the new interface which removed the 
default binding for tags-loop-continue (now called fileloop-continue).

Which made using dired-do-search a little less convenient, and people 
asked for analogous commands which used the xref UI. The original 
commands are still with us, though.

> How often do programmers work on their local files in their day-to-day 
> business, how often with remote files via tramp?
> 
> If you contribute a new feature for 0.1% of the use cases but 
> disrupt something (even minor) for 99.9% of the use cases, then with an 
> old tool like emacs the choice is: don't replace the old functionality 
> with your new functionality.
> 
> Just don't break things.  Please.

I'm sorry for the inconvenience, really. But never being able to break 
anything imposes an ever-growing cost on keeping Emacs relevant to 
contemporary expectations, or on otherwise making it better.

> If you want fancy functionality that works with remote files, this is 
> fine.  There are enough keys on the keyboard you can bind the new 
> functionality to.
> 
> Please don't break things that worked.
> 
> There are a gazillion Emacs users out there who dread each new Emacs 
> version because it will break their setup, their workflows, their 
> habits.  We do not want to spend days after upgrades to get our work 
> environment back.

But you still upgrade to the new version? Expecting something new from 
it, right?

> We value stability and conservativity over everything else.

And then Emacs users get older, change jobs, or entirely leave the 
profession. If Emacs stays as it was 30 years ago, it will appeal only 
to users who started with it 30+ years ago. And many of those have 
already left.

Emacs users are an admirably faithful bunch, but there are forces of 
nature we have to contend with as well.


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24  0:13       ` Dmitry Gutov
@ 2020-11-24  1:19         ` Dmitry Gutov
  0 siblings, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-24  1:19 UTC (permalink / raw)
  To: Andreas Abel, 31796

On 24.11.2020 02:13, Dmitry Gutov wrote:
> switch to the new  interface which removed the default binding for 
                     ^ xref

Specifically, the new bindings for 'M-.' and 'M-,'.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23 23:49     ` Andreas Abel
  2020-11-24  0:13       ` Dmitry Gutov
@ 2020-11-24 15:16       ` Eli Zaretskii
  2020-11-24 15:43         ` Dmitry Gutov
  1 sibling, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-24 15:16 UTC (permalink / raw)
  To: Andreas Abel; +Cc: 31796, dgutov

> From: Andreas Abel <abela@chalmers.se>
> Date: Tue, 24 Nov 2020 00:49:24 +0100
> 
> We value stability and conservativity over everything else.

We do, too.  If you stick around for a while, you will see how many
discussions here are due to the determination not to introduce even
the slightest risk of breaking compatibility with existing behavior.
In fact, some of the passion in Dmitry's response wasn't directed at
you, it was directed at myself and other senior maintainers who
frequently object to changes and/or request complicated
backward-compatibility shims, for that very reason.

So please don't assume we don't care about stability, or don't care
enough.  It would be simply unfair to make such assumptions.  We
certainly don't need lectures about keeping Emacs stable and
compatible.

What you see in this case is not the result of negligence or
carelessness, it is the result of not being aware of this (relatively
rare) use case becoming broken when we changed the UI of this and
similar commands to a more convenient one.  It took time for people to
report the problem, and it takes us more time to come up with a good
solution.  That's all.

If you have practical ideas for how to support these use cases with
the current command, please describe them.  TIA.


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 15:16       ` Eli Zaretskii
@ 2020-11-24 15:43         ` Dmitry Gutov
  2020-11-24 16:35           ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-24 15:43 UTC (permalink / raw)
  To: Eli Zaretskii, Andreas Abel; +Cc: 31796

On 24.11.2020 17:16, Eli Zaretskii wrote:

> In fact, some of the passion in Dmitry's response wasn't directed at
> you, it was directed at myself and other senior maintainers who
> frequently object to changes and/or request complicated
> backward-compatibility shims, for that very reason.

In a way, perhaps. Even though I've been on the other side of these 
discussions as well.

But I was mostly pointing out a logical inconsistency to a user who 
installs a new release, but doesn't want to see anything change, ever.

> So please don't assume we don't care about stability, or don't care
> enough.  It would be simply unfair to make such assumptions.  We
> certainly don't need lectures about keeping Emacs stable and
> compatible.

That's true.

> What you see in this case is not the result of negligence or
> carelessness, it is the result of not being aware of this (relatively
> rare) use case becoming broken when we changed the UI of this and
> similar commands to a more convenient one.  It took time for people to
> report the problem, and it takes us more time to come up with a good
> solution.  That's all.

We've been aware of it for at least two years now. So what are we, then, 
negligent, careless, or incompetent?

If you're saying we can't afford to break even a minor feature like 
this, I don't think there are a lot of options.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 15:43         ` Dmitry Gutov
@ 2020-11-24 16:35           ` Eli Zaretskii
  2020-11-24 19:43             ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-24 16:35 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

> Cc: 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 24 Nov 2020 17:43:53 +0200
> 
> We've been aware of it for at least two years now. So what are we, then, 
> negligent, careless, or incompetent?

Busy.  That, and the fact that no one came up with a clear idea of how
to fix this (at least IIRC).

> If you're saying we can't afford to break even a minor feature like 
> this, I don't think there are a lot of options.

We should try not to break any features, yes.  AFAIK, no one has yet
claimed that this cannot be fixed.  So the decision whether we can or
cannot stay with this broken doesn't have to be made yet.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23 16:16   ` Drew Adams
  2020-11-23 21:22     ` Dmitry Gutov
@ 2020-11-24 19:28     ` Juri Linkov
  2020-11-24 20:12       ` Drew Adams
  2020-11-24 20:19       ` Eli Zaretskii
  1 sibling, 2 replies; 61+ messages in thread
From: Juri Linkov @ 2020-11-24 19:28 UTC (permalink / raw)
  To: Drew Adams; +Cc: Andreas Abel, 31796

>  Multiple > Search >
>    Query Replace Using TAGS Table...   M-q
>    Query Replace Using `find'...       Q

dired-do-find-regexp-and-replace could be left bound to Q, but
dired-do-query-replace-regexp could be bound to M-% in Dired.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-23 21:28   ` Dmitry Gutov
  2020-11-23 23:49     ` Andreas Abel
@ 2020-11-24 19:29     ` Juri Linkov
  2020-11-24 19:39       ` Dmitry Gutov
  1 sibling, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-11-24 19:29 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Andreas Abel, 31796

> dired-do-query-replace-regexp doesn't work with Tramp.

Really?  I checked it and see no problems.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 19:29     ` Juri Linkov
@ 2020-11-24 19:39       ` Dmitry Gutov
  0 siblings, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-24 19:39 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Andreas Abel, 31796

On 24.11.2020 21:29, Juri Linkov wrote:
>> dired-do-query-replace-regexp doesn't work with Tramp.
> Really?  I checked it and see no problems

Sorry, a clarification: it doesn't work on directories.

Which seems to be a conscious choice because with how dired-do-search 
and dired-do-query-replace-regexp are implemented, it would take a 
lot of time even when there are not too many files in such a directory. 
It has to copy each file to the local machine before doing the search.

dired-do-find-regexp and dired-do-find-regexp-and-replace handle 
directories just fine, however.


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 16:35           ` Eli Zaretskii
@ 2020-11-24 19:43             ` Dmitry Gutov
  2020-11-24 20:16               ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-24 19:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796

On 24.11.2020 18:35, Eli Zaretskii wrote:
>> Cc: 31796@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 24 Nov 2020 17:43:53 +0200
>>
>> We've been aware of it for at least two years now. So what are we, then,
>> negligent, careless, or incompetent?
> 
> Busy.  That, and the fact that no one came up with a clear idea of how
> to fix this (at least IIRC).

How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?

Someone more familiar with existing ports of Grep on different systems 
should weigh in on it.

>> If you're saying we can't afford to break even a minor feature like
>> this, I don't think there are a lot of options.
> 
> We should try not to break any features, yes.

That's just common sense.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 19:28     ` Juri Linkov
@ 2020-11-24 20:12       ` Drew Adams
  2020-11-25  7:31         ` Juri Linkov
  2020-11-24 20:19       ` Eli Zaretskii
  1 sibling, 1 reply; 61+ messages in thread
From: Drew Adams @ 2020-11-24 20:12 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Andreas Abel, 31796

> >  Multiple > Search >
> >    Query Replace Using TAGS Table...   M-q
> >    Query Replace Using `find'...       Q
> 
> dired-do-find-regexp-and-replace could be left bound to Q, but
> dired-do-query-replace-regexp could be bound to M-% in Dired.

For the latter, I use `M-q' (not `M-%').
I suggest that vanilla Emacs do the same.

These two commands have quite similar purposes.
I suggest that they have similar keys.

Also, `M-%' has its normal meaning when Dired
has been toggled to writable (WDired).  That
key should be kept for its normal purpose, IMO.


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 19:43             ` Dmitry Gutov
@ 2020-11-24 20:16               ` Eli Zaretskii
  2020-11-30  2:25                 ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-24 20:16 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 24 Nov 2020 21:43:22 +0200
> 
> How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?

The idea sounds fine to me.

> Someone more familiar with existing ports of Grep on different systems 
> should weigh in on it.

I don't think it's necessary.  We just need to probe Grep for support
of these switches, and then use it.  The result cannot be worse than
it is now.
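
A probe of this kind could look roughly like the following shell sketch. It
is an illustration only: choose_grep_flags is a hypothetical helper, not an
existing Emacs or grep facility, and a real implementation would run the
probe from Lisp and cache the result.

```shell
# Hypothetical helper: prints the extra grep flags to use for a search,
# depending on whether multiline matching is wanted and available.
choose_grep_flags() {
  want_multiline=$1
  if [ "$want_multiline" = yes ] &&
     printf 'a\nb\n' | grep -P -z -q 'a\nb' 2>/dev/null; then
    # Probe succeeded: this grep can match across a line break.
    echo "-P -z"
  else
    # Either multiline matching was not requested or the probe failed;
    # fall back to ordinary line-by-line matching (no extra flags).
    echo ""
  fi
}

single_flags=$(choose_grep_flags no)
multi_flags=$(choose_grep_flags yes)

echo "single-line search flags: '${single_flags}'"
echo "multiline search flags:   '${multi_flags}'"
```

When the probe fails, the helper prints nothing, so the search behaves exactly
as it does today, which matches the point that the result cannot be worse.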

> >> If you're saying we can't afford to break even a minor feature like
> >> this, I don't think there are a lot of options.
> > 
> > We should try not to break any features, yes.
> 
> That's just common sense.

Of course.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 19:28     ` Juri Linkov
  2020-11-24 20:12       ` Drew Adams
@ 2020-11-24 20:19       ` Eli Zaretskii
  2020-11-24 20:31         ` Juri Linkov
  1 sibling, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-24 20:19 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

> From: Juri Linkov <juri@linkov.net>
> Date: Tue, 24 Nov 2020 21:28:29 +0200
> Cc: Andreas Abel <abela@chalmers.se>, 31796@debbugs.gnu.org
> 
> >  Multiple > Search >
> >    Query Replace Using TAGS Table...   M-q
> >    Query Replace Using `find'...       Q
> 
> dired-do-find-regexp-and-replace could be left bound to Q, but
> dired-do-query-replace-regexp could be bound to M-% in Dired.

How will this help when the command to continue the loop is not bound
to any key?

We didn't just change the binding of Q without a good reason.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 20:19       ` Eli Zaretskii
@ 2020-11-24 20:31         ` Juri Linkov
  2020-11-24 20:51           ` Drew Adams
  2020-11-24 21:07           ` Eli Zaretskii
  0 siblings, 2 replies; 61+ messages in thread
From: Juri Linkov @ 2020-11-24 20:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796

>> dired-do-find-regexp-and-replace could be left bound to Q, but
>> dired-do-query-replace-regexp could be bound to M-% in Dired.
>
> How will this help when the command to continue the loop is not bound
> to any key?

dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
keys and automatically moves to the next file on multiple files.
So it seems it doesn't need a key to continue the loop.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
       [not found]       ` <<838saqtsm9.fsf@gnu.org>
@ 2020-11-24 20:32         ` Drew Adams
       [not found]         ` <<87mtz64htw.fsf@mail.linkov.net>
  1 sibling, 0 replies; 61+ messages in thread
From: Drew Adams @ 2020-11-24 20:32 UTC (permalink / raw)
  To: Eli Zaretskii, Juri Linkov; +Cc: abela, 31796

> > >  Multiple > Search >
> > >    Query Replace Using TAGS Table...   M-q
> > >    Query Replace Using `find'...       Q
> >
> > dired-do-find-regexp-and-replace could be left bound to Q, but
> > dired-do-query-replace-regexp could be bound to M-% in Dired.
> 
> How will this help when the command to continue the loop is not bound
> to any key?

I don't understand the question.  And which
command?  Are you asking how to use `M-q'
(`dired-do-query-replace-regexp')?

Are you saying that even though Emacs has kept
`dired-do-query-replace-regexp' it's no longer
usable for some reason?

> We didn't just change the binding of Q without a good reason.

So you say.  I've already disagreed that the
reason given was a good one.  IMHO, the new
command should have been given a new key.

Regardless of whether the existing key `Q'
should have been usurped, its previous command
still exists, and it seems to still be usable
and useful.  If so, what is wrong with giving
it its own key binding (`M-q' in my case)?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 20:31         ` Juri Linkov
@ 2020-11-24 20:51           ` Drew Adams
  2020-11-24 21:07           ` Eli Zaretskii
  1 sibling, 0 replies; 61+ messages in thread
From: Drew Adams @ 2020-11-24 20:51 UTC (permalink / raw)
  To: Juri Linkov, Eli Zaretskii; +Cc: abela, 31796

> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >
> > How will this help when the command to continue the loop is not bound
> > to any key?
> 
> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> keys and automatically moves to the next file on multiple files.
> So it seems it doesn't need a key to continue the loop.

Yes.  (Now I understand the question. Thx.)





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 20:31         ` Juri Linkov
  2020-11-24 20:51           ` Drew Adams
@ 2020-11-24 21:07           ` Eli Zaretskii
  2020-11-25  7:28             ` Juri Linkov
  1 sibling, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-24 21:07 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

> From: Juri Linkov <juri@linkov.net>
> Cc: drew.adams@oracle.com,  abela@chalmers.se,  31796@debbugs.gnu.org
> Date: Tue, 24 Nov 2020 22:31:55 +0200
> 
> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >
> > How will this help when the command to continue the loop is not bound
> > to any key?
> 
> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> keys and automatically moves to the next file on multiple files.
> So it seems it doesn't need a key to continue the loop.

AFAIR, it does need a way to continue the loop if the user exits the
loop.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
       [not found]           ` <<831rgitqe2.fsf@gnu.org>
@ 2020-11-24 21:35             ` Drew Adams
  0 siblings, 0 replies; 61+ messages in thread
From: Drew Adams @ 2020-11-24 21:35 UTC (permalink / raw)
  To: Eli Zaretskii, Juri Linkov; +Cc: abela, 31796

> > >> dired-do-find-regexp-and-replace could be left bound to Q, but
> > >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> > >
> > > How will this help when the command to continue the loop is not
> > > bound to any key?
> >
> > dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> > keys and automatically moves to the next file on multiple files.
> > So it seems it doesn't need a key to continue the loop.
> 
> AFAIR, it does need a way to continue the loop if the user exits the
> loop.

If that feature is needed and broken, then that's true
for the command itself (`dired-do-query-replace-regexp'),
right?

It has nothing to do with whether or not that command
has a key binding, and even less to do with whether it
has the key binding `Q'.  No?

I guess you're (not saying but hinting?) that the
decision to take key `Q' away from that command also
took away the ability to continue the loop if the user
exits it.  If so, that too is (apparently) unfortunate.

But what does that have to do with giving that command a
key binding (e.g. `M-q')?

What am I missing?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 21:07           ` Eli Zaretskii
@ 2020-11-25  7:28             ` Juri Linkov
  2020-11-25 15:48               ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-11-25  7:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796

>> >> dired-do-find-regexp-and-replace could be left bound to Q, but
>> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
>> >
>> > How will this help when the command to continue the loop is not bound
>> > to any key?
>>
>> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
>> keys and automatically moves to the next file on multiple files.
>> So it seems it doesn't need a key to continue the loop.
>
> AFAIR, it does need a way to continue the loop if the user exits the
> loop.

Where would the users get the idea that it's possible to interrupt
query-replace and resume it anytime later, if single-file query-replace
doesn't support this feature?  I can't find where this feature
of continuing the loop is documented.  (info "(emacs) Query Replace")
only says:

     To restart a ‘query-replace’ once it is exited, use ‘C-x <ESC>
  <ESC>’, which repeats the ‘query-replace’ because it used the minibuffer
  to read its arguments.  *Note C-x <ESC> <ESC>: Repetition.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 20:12       ` Drew Adams
@ 2020-11-25  7:31         ` Juri Linkov
  2020-11-25 17:37           ` Drew Adams
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-11-25  7:31 UTC (permalink / raw)
  To: Drew Adams; +Cc: Andreas Abel, 31796

[-- Attachment #1: Type: text/plain, Size: 824 bytes --]

>> >  Multiple > Search >
>> >    Query Replace Using TAGS Table...   M-q
>> >    Query Replace Using `find'...       Q
>>
>> dired-do-find-regexp-and-replace could be left bound to Q, but
>> dired-do-query-replace-regexp could be bound to M-% in Dired.
>
> For the latter, I use `M-q' (not `M-%').
> I suggest that vanilla Emacs do the same.
>
> These two commands have quite similar purposes.
> I suggest that they have similar keys.
>
> Also, `M-%' has its normal meaning when Dired
> has been toggled to writable (WDired).  That
> key should be kept for its normal purpose, IMO.

'M-q' has its normal meaning of filling the paragraph,
so it would be confusing to give it another meaning in Dired.

While finding a good short key would be nice, here is a patch
that for consistency with 'M-s a M-C-s' also adds 'M-s a M-C-%':


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: query-replace-regexp.patch --]
[-- Type: text/x-diff, Size: 2423 bytes --]

diff --git a/lisp/vc/vc-dir.el b/lisp/vc/vc-dir.el
index cdf8ab984e..bed779104c 100644
--- a/lisp/vc/vc-dir.el
+++ b/lisp/vc/vc-dir.el
@@ -308,6 +308,7 @@ vc-dir-mode-map
     (define-key map "Q" 'vc-dir-query-replace-regexp)
     (define-key map (kbd "M-s a C-s")   'vc-dir-isearch)
     (define-key map (kbd "M-s a M-C-s") 'vc-dir-isearch-regexp)
+    (define-key map (kbd "M-s a M-C-%") 'vc-dir-query-replace-regexp)
     (define-key map "G" 'vc-dir-ignore)
 
     (let ((branch-map (make-sparse-keymap)))
diff --git a/lisp/dired.el b/lisp/dired.el
index 08b19a0225..6cbcc17852 100644
--- a/lisp/dired.el
+++ b/lisp/dired.el
@@ -1932,6 +1932,7 @@ dired-mode-map
     ;; isearch
     (define-key map (kbd "M-s a C-s")   'dired-do-isearch)
     (define-key map (kbd "M-s a M-C-s") 'dired-do-isearch-regexp)
+    (define-key map (kbd "M-s a M-C-%") 'dired-do-query-replace-regexp)
     (define-key map (kbd "M-s f C-s")   'dired-isearch-filenames)
     (define-key map (kbd "M-s f M-C-s") 'dired-isearch-filenames-regexp)
     ;; misc
@@ -2214,9 +2215,12 @@ dired-mode-map
     (define-key map [menu-bar operate dashes-3]
       '("--"))
 
-    (define-key map [menu-bar operate query-replace]
-      '(menu-item "Query Replace in Files..." dired-do-find-regexp-and-replace
-		  :help "Replace regexp matches in marked files"))
+    (define-key map [menu-bar operate find-regexp-and-replace]
+      '(menu-item "Replace Regexp in Files..." dired-do-find-regexp-and-replace
+        	  :help "Replace regexp matches in marked files"))
+    (define-key map [menu-bar operate query-replace-regexp]
+      '(menu-item "Query Replace in Files..." dired-do-query-replace-regexp
+        	  :help "Replace regexp matches in marked files"))
     (define-key map [menu-bar operate search]
       '(menu-item "Search Files..." dired-do-find-regexp
 		  :help "Search marked files for matches of regexp"))
diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index a395453491..7b8dcc2096 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -598,7 +598,7 @@ project-prefix-map
     (define-key map "p" 'project-switch-project)
     (define-key map "g" 'project-find-regexp)
     (define-key map "G" 'project-or-external-find-regexp)
-    (define-key map "r" 'project-query-replace-regexp)
+    (define-key map [?\C-\M-%] 'project-query-replace-regexp)
     map)
   "Keymap for project commands.")
 

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-25  7:28             ` Juri Linkov
@ 2020-11-25 15:48               ` Eli Zaretskii
  2020-11-25 20:18                 ` Juri Linkov
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-25 15:48 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

> From: Juri Linkov <juri@linkov.net>
> Cc: drew.adams@oracle.com,  abela@chalmers.se,  31796@debbugs.gnu.org
> Date: Wed, 25 Nov 2020 09:28:57 +0200
> 
> >> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >> >
> >> > How will this help when the command to continue the loop is not bound
> >> > to any key?
> >>
> >> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> >> keys and automatically moves to the next file on multiple files.
> >> So it seems it doesn't need a key to continue the loop.
> >
> > AFAIR, it does need a way to continue the loop if the user exits the
> > loop.
> 
> Where would the users get the idea that it's possible to interrupt
> query-replace and resume it anytime later, if single-file query-replace
> doesn't support this feature?

In the manual.  And in their muscle memory: we are talking about users
who knew about the original binding of Q in Dired, so we should assume
they also know about the possibility of exiting the loop and then
resuming it.

The command that was previously bound to Q used the UI that is very
similar to find-tag: you are presented with the first hit, and then go
to the next one, and the one after it, etc.  "Exiting the loop" can be
as simple as moving point or switching to another buffer to consult
some other part of Emacs.  It is very natural.  Once you've done that,
you'd want to resume the loop.

> I can't find where this feature of continuing the loop is
> documented.  (info "(emacs) Query Replace") only says:
> 
>      To restart a ‘query-replace’ once it is exited, use ‘C-x <ESC>
>   <ESC>’, which repeats the ‘query-replace’ because it used the minibuffer
>   to read its arguments.  *Note C-x <ESC> <ESC>: Repetition.

Wrong part of the manual, and the text which described that was
removed from the manual when we changed the binding.  Visit the Emacs
24 manual and go to "Operating on Files", a section of the "Dired"
chapter.  There you will see this text:

  `Q REGEXP <RET> TO <RET>'
       Perform `query-replace-regexp' on each of the specified files,
       replacing matches for REGEXP with the string TO
       (`dired-do-query-replace-regexp').

       This command is a variant of `tags-query-replace'.  If you exit the
       query replace loop, you can use `M-,' to resume the scan and
       replace more matches.  *Note Tags Search::.

The new UI presents all the hits in a separate window, so you can
easily use that to go to any hit you want even if you exit the loop.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-25  7:31         ` Juri Linkov
@ 2020-11-25 17:37           ` Drew Adams
  0 siblings, 0 replies; 61+ messages in thread
From: Drew Adams @ 2020-11-25 17:37 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Andreas Abel, 31796

> >> >  Multiple > Search >
> >> >    Query Replace Using TAGS Table...   M-q
> >> >    Query Replace Using `find'...       Q
> >>
> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >
> > For the latter, I use `M-q' (not `M-%').
> > I suggest that vanilla Emacs do the same.
> >
> > These two commands have quite similar purposes.
> > I suggest that they have similar keys.
> >
> > Also, `M-%' has its normal meaning when Dired
> > has been toggled to writable (WDired).  That
> > key should be kept for its normal purpose, IMO.
> 
> 'M-q' has its normal meaning of filling the paragraph,
> so it would be confusing to give it another meaning in Dired.

How do you think filling a paragraph is useful
in Dired (or WDired)?  I don't follow you, here.

> While finding a good short key would be nice, here is a patch
> that for consistency with 'M-s a M-C-s' also adds 'M-s a M-C-%':

Count me as not in favor of that suggestion.
(Just one opinion.)





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-25 15:48               ` Eli Zaretskii
@ 2020-11-25 20:18                 ` Juri Linkov
  2020-11-25 20:30                   ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-11-25 20:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796

> The command that was previously bound to Q used the UI that is very
> similar to find-tag: you are presented with the first hit, and then go
> to the next one, and the one after it, etc.  "Exiting the loop" can be
> as simple as moving point or switching to another buffer to consult
> some other part of Emacs.  It is very natural.  Once you've done that,
> you'd want to resume the loop.

Would adding `M-s a M-C-%' help users who want the old behavior back?
Or is a keybinding for `fileloop-continue' needed as well?

> This command is a variant of `tags-query-replace'.  If you exit the
> query replace loop, you can use `M-,' to resume the scan and
> replace more matches.  *Note Tags Search::.

Maybe `M-s M-,' is not bad for `fileloop-continue'?





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-25 20:18                 ` Juri Linkov
@ 2020-11-25 20:30                   ` Eli Zaretskii
  2020-11-29  2:30                     ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-25 20:30 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

> From: Juri Linkov <juri@linkov.net>
> Cc: drew.adams@oracle.com,  abela@chalmers.se,  31796@debbugs.gnu.org
> Date: Wed, 25 Nov 2020 22:18:28 +0200
> 
> > The command that was previously bound to Q used the UI that is very
> > similar to find-tag: you are presented with the first hit, and then go
> > to the next one, and the one after it, etc.  "Exiting the loop" can be
> > as simple as moving point or switching to another buffer to consult
> > some other part of Emacs.  It is very natural.  Once you've done that,
> > you'd want to resume the loop.
> 
> Would adding `M-s a M-C-%' help users who want the old behavior back?
> Or is a keybinding for `fileloop-continue' needed as well?

I'd prefer not to add fileloop-continue back in any shape or form.
I'd like us to fix the current binding of Q so that it supports
everything the previous command did.  Bringing back the commands we
obsoleted is counter-productive.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-25 20:30                   ` Eli Zaretskii
@ 2020-11-29  2:30                     ` Dmitry Gutov
  2020-11-29 15:22                       ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-29  2:30 UTC (permalink / raw)
  To: Eli Zaretskii, Juri Linkov; +Cc: abela, 31796

On 25.11.2020 22:30, Eli Zaretskii wrote:
> I'd like us to fix the current binding of Q so that it supports
> everything the previous command did.

Just how much of "everything" are we talking about?

For instance, a number of character classes in Emacs regexps are 
dependent on the syntax table. Like [:word:], for instance.

Even [:space:] is dependent on syntax, while it matches a fixed set of 
characters in Grep. So when searching across different file types we 
can't even "expand" such constructs into concrete characters to search for.

One approach I've considered is replacing such unsupported constructs 
with '.', or removing them entirely for constructs like \< and \_<. And 
then post-filter the resulting matches in Emacs.

For example, xref-references-in-directory uses a special case of this 
approach. In the general case though, I worry users would sometimes 
create regexps that result in an exponentially slow or just match-all 
regexp being passed to Grep, which would never finish, for no obvious 
reason.

Someone should try it, but it's a fair amount of work to handle all 
supported constructs, and to catch all (most?) the regexps which we 
can't support in this mode.
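
To illustrate the over-match-then-post-filter idea outside Emacs, here
is a minimal shell sketch. It assumes a Lisp-like notion of a symbol
(letters, digits, '_' and '-'), standing in for what a syntax table
would decide. The function name symbol_hits and the sample file are
hypothetical, and this is not how xref-references-in-directory is
implemented.

```shell
#!/bin/sh
# Stage 1: a cheap, over-matching fixed-string search.
# Stage 2: a stricter filter that approximates Emacs's \_< and \_> for
# a Lisp-like syntax, where '-' and '_' are symbol constituents.
# (Assumption: NAME contains no regexp metacharacters.)
symbol_hits() {
    name=$1
    file=$2
    grep -n -F -- "$name" "$file" |
        grep -E -- "^[0-9]+:(.*[^[:alnum:]_-])?${name}([^[:alnum:]_-].*)?\$"
}

# Demo: only line 1 uses `count' as a whole symbol.
tmp=$(mktemp)
printf '(setq count 1)\n(word-count)\n(recount)\n' > "$tmp"
symbol_hits count "$tmp"
rm -f "$tmp"
```

Plain `grep -w' would not be enough here: it treats '-' as a non-word
character, so it would also report the `(word-count)' line.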

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-29  2:30                     ` Dmitry Gutov
@ 2020-11-29 15:22                       ` Eli Zaretskii
  0 siblings, 0 replies; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-29 15:22 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796, juri

> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sun, 29 Nov 2020 04:30:14 +0200
> 
> For instance, a number of character classes in Emacs regexps are 
> dependent on the syntax table. Like [:word:], for instance.
> 
> Even [:space:] is dependent on syntax, while it matches a fixed set of 
> characters in Grep. So when searching across different file types we 
> can't even "expand" such constructs into concrete characters to search for.

It isn't clear to me which interpretation users will want.  I don't
think there's a single answer.

> Someone should try it, but it's a fair amount of work to handle all 
> supported constructs, and to catch all (most?) the regexps which we 
> can't support in this mode.

FWIW, I think this is much less important than the embedded newline
support.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-24 20:16               ` Eli Zaretskii
@ 2020-11-30  2:25                 ` Dmitry Gutov
  2020-11-30  8:49                   ` Juri Linkov
                                     ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-11-30  2:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796

On 24.11.2020 22:16, Eli Zaretskii wrote:
>> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 24 Nov 2020 21:43:22 +0200
>>
>> How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?
> 
> The idea sounds fine to me.
> 
>> Someone more familiar with existing ports of Grep on different systems
>> should weigh in on it.
> 
> I don't think it's necessary.  We just need to probe Grep for support
> of these switches, and then use it.  The result cannot be worse than
> it is now.

Now that I've dug in a little, the situation seems difficult.

-Pz does work, but it forces Grep to consider the file as one long 
string. As a consequence, if we ask it to output the line number, the 
number will always be 1. That's not a helpful mode of operation.

Even if it worked differently, -P imposes a significant performance 
penalty from what I see, even when the extra syntax is not actually 
used. So we couldn't enable it by default.
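
The line-number behaviour described above can be checked with a small
sketch. It first probes for -P and -z, since both are GNU Grep
extensions and -P may be compiled out. The helper names
grep_supports_pz and pz_reported_line and the sample file are made up
for illustration.

```shell
#!/bin/sh
# Probe: does this grep accept -P and -z together?
grep_supports_pz() {
    printf 'a\nb\n' | grep -Pzq 'a\nb' 2>/dev/null
}

# Print the "line number" grep reports for the first multi-line match.
# With -z the records are NUL-separated, so a NUL-free file is a
# single record, and -n counts records rather than lines.
pz_reported_line() {
    grep -Pzon -- "$1" "$2" | tr '\000' '\n' | sed -n '1s/:.*//p'
}

tmp=$(mktemp)
printf 'one\ntwo\nmulti\nline\n' > "$tmp"
if grep_supports_pz; then
    pz_reported_line 'multi\nline' "$tmp"
else
    echo "grep lacks -P or -z; nothing to demonstrate" >&2
fi
rm -f "$tmp"
```

On a GNU Grep build with PCRE support this should report 1, although
the match starts on line 3 of the sample file.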

There is a similar program called pcregrep which outputs in the expected 
format:

$ pcregrep -MHn "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el:772:  :type '(choice (const :tag "Read with completion from relative names"
                         project--read-file-cpd-relative)
lisp/progmodes/project.el:774:                 (const :tag "Read with completion from absolute names"
                         project--read-file-absolute)

...but it doesn't seem to have a way to reliably detect where a match 
result ends. When we're talking multiline, perhaps the searched file 
includes a string like "file-name/etc:number"? Some of our tests 
probably do. Grep has a flag -Z (or --null) which adds a null byte 
after file names, but pcregrep doesn't.

And anyway, pcregrep isn't usually installed by default.

ripgrep, OTOH, seems to combine both good features here:

$ rg -Hn --multiline --null "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el772:  :type '(choice (const :tag "Read with completion from relative names"
773:                        project--read-file-cpd-relative)
774:                 (const :tag "Read with completion from absolute names"
775:                        project--read-file-absolute)

And it also disables the multiline mode automatically if the regexp 
can't match a newline (the multiline mode is significantly slower).
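
As a sketch of why the NUL after the file name helps, the following
reads output in that layout (a file name followed by a NUL byte, then
LINE:TEXT lines) and flattens it to one FILE:LINE:TEXT record per
line. The function name flatten_hits and the hand-written demo input
are assumptions for illustration; it is not taken from xref.

```shell
#!/bin/sh
# Read "FILE<NUL>LINE:TEXT" blocks on stdin and print one
# "FILE:LINE:TEXT" record per hit line.  The NUL is first mapped to
# the control character \001, because not every awk copes with NUL
# bytes in its input.
flatten_hits() {
    tr '\000' '\001' |
        awk 'BEGIN { sep = sprintf("%c", 1) }
             {
                 i = index($0, sep)
                 if (i > 0) {              # header: a new file name
                     file = substr($0, 1, i - 1)
                     $0 = substr($0, i + 1)
                 }
                 if ($0 ~ /^[0-9]+:/)      # skip blank separator lines
                     print file ":" $0
             }'
}

# Demo with hand-written input in the layout described above.
{
    printf '%s\000' a.el; printf '772:first\n773:second\n\n'
    printf '%s\000' b.el; printf '5:third\n'
} | flatten_hits
```

Because a NUL cannot occur in a file name or in ordinary text lines,
the split between name and hit is unambiguous even when the matched
text itself looks like "file-name/etc:number".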

To sum up, there are options, but I don't see a working solution that is 
based on GNU Grep. And that's the most portable search program we have, 
I think.

The other recommendations I see (here: 
https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines) 
include bespoke scripts in sed or perl in command mode. These seem less 
portable, but if someone would like to try their hand at one that would 
also output file names and line numbers in the expected format, I'd be 
happy to benchmark it.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30  2:25                 ` Dmitry Gutov
@ 2020-11-30  8:49                   ` Juri Linkov
  2020-12-01  2:21                     ` Dmitry Gutov
  2020-11-30 15:30                   ` Eli Zaretskii
  2020-12-01  5:20                   ` Richard Stallman
  2 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-11-30  8:49 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

> Now that I've dug in a little, the situation seems difficult.
>
> -Pz does work, but it forces Grep to consider the file as one long
>  string. As a consequence, if we ask it to output the line number, the
>  number will always be 1. That's not a helpful mode of operation.
>
> Even if it worked differently, -P imposes a significant performance penalty
> from what I see, even when the extra syntax is not actually used. So we
> couldn't enable it by default.

When a grep input pattern contains a newline, then xref could use
the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
to get the file names that contain the pattern, then return
these file names without line numbers.  This works exactly like
the new feature proposed recently on emacs-devel, which extends
xref-show-xrefs-function with a new completion function
(BTW, why isn't it installed yet?)

So like this feature presenting such completions without line numbers:

  lisp/progmodes/project.el:(cl-defgeneric project-root)
  lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
  lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))

xref for grep could work the same way without line numbers:

  lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
  lisp/progmodes/project.el:names"^Jproject--read-file-absolute)

Then visiting such a grep hit should use Emacs search functions
to find the grep hit in the visited file.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30  2:25                 ` Dmitry Gutov
  2020-11-30  8:49                   ` Juri Linkov
@ 2020-11-30 15:30                   ` Eli Zaretskii
  2020-11-30 15:39                     ` Jean Louis
                                       ` (2 more replies)
  2020-12-01  5:20                   ` Richard Stallman
  2 siblings, 3 replies; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-30 15:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 30 Nov 2020 04:25:31 +0200
> 
> To sum up, there are options, but I don't see a working solution that is 
> based on GNU Grep. And that's the most portable search program we have, 
> I think.

Maybe we should say that if someone wants to be able to find multiline
regexp, they should install ripgrep?





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30 15:30                   ` Eli Zaretskii
@ 2020-11-30 15:39                     ` Jean Louis
  2020-11-30 16:36                       ` Eli Zaretskii
  2020-11-30 15:42                     ` Jean Louis
  2020-12-01  1:24                     ` Dmitry Gutov
  2 siblings, 1 reply; 61+ messages in thread
From: Jean Louis @ 2020-11-30 15:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796, Dmitry Gutov

* Eli Zaretskii <eliz@gnu.org> [2020-11-30 18:31]:
> > Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> > From: Dmitry Gutov <dgutov@yandex.ru>
> > Date: Mon, 30 Nov 2020 04:25:31 +0200
> > 
> > To sum up, there are options, but I don't see a working solution that is 
> > based on GNU Grep. And that's the most portable search program we have, 
> > I think.
> 
> Maybe we should say that if someone wants to be able to find multiline
> regexp, they should install ripgrep?

Does this help?

https://stackoverflow.com/questions/3717772/regex-grep-for-multi-line-search-needed#7167115






^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30 15:30                   ` Eli Zaretskii
  2020-11-30 15:39                     ` Jean Louis
@ 2020-11-30 15:42                     ` Jean Louis
  2020-12-01  1:23                       ` Dmitry Gutov
  2020-12-01  1:24                     ` Dmitry Gutov
  2 siblings, 1 reply; 61+ messages in thread
From: Jean Louis @ 2020-11-30 15:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796, Dmitry Gutov

* Eli Zaretskii <eliz@gnu.org> [2020-11-30 18:31]:
> > Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> > From: Dmitry Gutov <dgutov@yandex.ru>
> > Date: Mon, 30 Nov 2020 04:25:31 +0200
> > 
> > To sum up, there are options, but I don't see a working solution that is 
> > based on GNU Grep. And that's the most portable search program we have, 
> > I think.
> 
> Maybe we should say that if someone wants to be able to find multiline
> regexp, they should install ripgrep?

It is possible to combine with sed:
https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html

https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques






^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30 15:39                     ` Jean Louis
@ 2020-11-30 16:36                       ` Eli Zaretskii
  0 siblings, 0 replies; 61+ messages in thread
From: Eli Zaretskii @ 2020-11-30 16:36 UTC (permalink / raw)
  To: Jean Louis; +Cc: abela, 31796, dgutov

> Date: Mon, 30 Nov 2020 18:39:33 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: Dmitry Gutov <dgutov@yandex.ru>, abela@chalmers.se,
>   31796@debbugs.gnu.org
> 
> * Eli Zaretskii <eliz@gnu.org> [2020-11-30 18:31]:
> > > Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> > > From: Dmitry Gutov <dgutov@yandex.ru>
> > > Date: Mon, 30 Nov 2020 04:25:31 +0200
> > > 
> > > To sum up, there are options, but I don't see a working solution that is 
> > > based on GNU Grep. And that's the most portable search program we have, 
> > > I think.
> > 
> > Maybe we should say that if someone wants to be able to find multiline
> > regexp, they should install ripgrep?
> 
> Does this help?
> 
> https://stackoverflow.com/questions/3717772/regex-grep-for-multi-line-search-needed#7167115

I think this was already discussed up-thread?





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30 15:42                     ` Jean Louis
@ 2020-12-01  1:23                       ` Dmitry Gutov
  2020-12-01  8:36                         ` Juri Linkov
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-01  1:23 UTC (permalink / raw)
  To: Jean Louis, Eli Zaretskii; +Cc: abela, 31796

On 30.11.2020 17:42, Jean Louis wrote:
> It is possible to combine with sed:
> https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
> 
> https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques

It's pretty much Chinese to me, sorry.

Can you write a sed search script like that that outputs in the expected 
format?

Meaning,

   FILE_NAME\0LINE_NUMBER_1:MATCH_LINE_1
   ...
   LINE_NUMBER_N:MATCH_LINE_N
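
For comparison, here is a rough sketch of such a script written with
awk rather than sed. It slurps each file, so it is unlikely to be
fast, and it has not been benchmarked. The function name
multiline_hits is hypothetical. The regexp is an awk ERE passed with
-v, so \n stands for a newline, while any other backslash has to be
doubled.

```shell
#!/bin/sh
# multiline_hits REGEX FILE...
# For every file with hits, print
#   FILE<NUL>LINE:TEXT
#   LINE:TEXT ...
# with one LINE:TEXT for each line touched by a match.
multiline_hits() {
    re=$1
    shift
    for f in "$@"; do
        hits=$(awk -v re="$re" '
            { line[NR] = $0; text = text $0 "\n" }
            END {
                pos = 1      # offset in text where the next search starts
                base = 1     # number of the line that contains pos
                shown = 0    # last line number printed so far
                while (pos <= length(text) &&
                       match(substr(text, pos), re) && RLENGTH > 0) {
                    before = substr(text, pos, RSTART - 1)
                    hit = substr(text, pos + RSTART - 1, RLENGTH)
                    first = base + gsub(/\n/, "", before)
                    trailing = (hit ~ /\n$/)
                    nl = gsub(/\n/, "", hit)
                    last = first + nl - trailing
                    for (n = (first > shown ? first : shown + 1); n <= last; n++)
                        print n ":" line[n]
                    if (last > shown) shown = last
                    pos = pos + RSTART - 1 + RLENGTH
                    base = first + nl
                }
            }' < "$f")
        if [ -n "$hits" ]; then
            printf '%s\000' "$f"
            printf '%s\n' "$hits"
        fi
    done
}

# Demo: show the NUL as ':' so the result is readable on a terminal.
tmp=$(mktemp)
printf 'one\nmulti\nline\nend\n' > "$tmp"
multiline_hits 'multi\nline' "$tmp" | tr '\000' ':'
rm -f "$tmp"
```

The repeated substr calls make this quadratic in the worst case, so
it is only meant to pin down the output format, not to compete with
Grep or ripgrep on speed.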





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30 15:30                   ` Eli Zaretskii
  2020-11-30 15:39                     ` Jean Louis
  2020-11-30 15:42                     ` Jean Louis
@ 2020-12-01  1:24                     ` Dmitry Gutov
  2 siblings, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-01  1:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796

On 30.11.2020 17:30, Eli Zaretskii wrote:
>> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Mon, 30 Nov 2020 04:25:31 +0200
>>
>> To sum up, there are options, but I don't see a working solution that is
>> based on GNU Grep. And that's the most portable search program we have,
>> I think.
> Maybe we should say that if someone wants to be able to find multiline
> regexp, they should install ripgrep?

We could do that, indeed.

Certainly better than not having that feature at all.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30  8:49                   ` Juri Linkov
@ 2020-12-01  2:21                     ` Dmitry Gutov
  2020-12-01  8:39                       ` Juri Linkov
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-01  2:21 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

On 30.11.2020 10:49, Juri Linkov wrote:

>> Even if it worked differently, -P imposes a significant performance penalty
>> from what I see, even when the extra syntax is not actually used. So we
>> couldn't enable it by default.
> 
> When a grep input pattern contains a newline, then xref could use
> the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
> to get the file names that contain the pattern, then return
> these file names without line numbers.

Do you mean the xref items backed by find-func.el? Those are a 
particular kind of reference, usually unique enough that 
special navigation logic can work. It's also implemented this way 
because the search can be performed without reading file contents (which 
would be slower).

> This works exactly
> like a new feature of extending xref-show-xrefs-function
> with a new completion function was proposed recently on emacs-devel

For Grep results, I think the line number is important because we're 
even more likely to have multiple lines with the same contents in one file.

What we *could* do is run Grep, then take just the list of file names 
that it returns, visit them all in Emacs and repeat the search in all of 
them. But that would require a more complex abstraction than just 
"search command", as well as some juggling of buffers that weren't open 
before (we don't want to add more open buffers just because the user has 
run a search, right?).

I'm not sure cost/benefit is worth it, but if you'd like to try your 
hand at writing it, please go ahead. Just let me add ripgrep support first.

BTW, that approach could fit project-search and 
project-query-replace-regexp well, I think. Perhaps the dired-do-* 
functions, too. Should improve their performance in a number of scenarios.

> (BTW, why it's not installed yet?)

Waiting for the feedback.

It went through several minor revisions. Do you like the most recent 
version? If so, please reply to the message containing it. If you don't, 
please also reply and say why.

> So like this feature presenting such completions without line numbers:
> 
>    lisp/progmodes/project.el:(cl-defgeneric project-root)
>    lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
>    lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
> 
> xref for grep could work the same way without line numbers:
> 
>    lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
>    lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
> 
> Then visiting such grep hit should use Emacs search functions
> to find the grep hit in the visited file.

These are two substrings inside that file that matched the search 
regexp. But there could be substrings in the same file that are equal to 
either of these but don't match said regexp, e.g. because they are 
preceded or followed by some different contents.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-11-30  2:25                 ` Dmitry Gutov
  2020-11-30  8:49                   ` Juri Linkov
  2020-11-30 15:30                   ` Eli Zaretskii
@ 2020-12-01  5:20                   ` Richard Stallman
  2020-12-01 15:46                     ` Eli Zaretskii
  2020-12-03  2:23                     ` Dmitry Gutov
  2 siblings, 2 replies; 61+ messages in thread
From: Richard Stallman @ 2020-12-01  5:20 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > To sum up, there are options, but I don't see a working solution that is 
  > based on GNU Grep.

Can people think of a new feature that would be easy to add to GNU grep
that would make it easy for Dired to handle all cases correctly?

I don't know what the problem is, but if it has to do with parsing the
grep output, here's an idea: an option to tell GNU grep to use quoting
on file names and the match strings, perhaps in the same way GNU ls
does.

Another idea is an option to output numerical byte positions in the
file instead of the lines that are matched.  Emacs can feed those byte
positions into byte-to-position to convert them into buffer positions.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01  1:23                       ` Dmitry Gutov
@ 2020-12-01  8:36                         ` Juri Linkov
  2020-12-01 15:20                           ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-12-01  8:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796, Jean Louis

>> It is possible to combine with sed:
>> https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
>> https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques
>
> It's pretty much Chinese to me, sorry.

When I need to grep in multi-line mode I use Ruby, but its modifiers
differ from Perl:

https://regular-expressions.mobi/ruby.html
  /m makes the dot match newlines.  Ruby indeed uses /m, whereas Perl and
  many other programming languages use /s for “dot matches newlines”.

https://www.regular-expressions.info/modifiers.html
  (?s) for “single line mode” makes the dot match all characters,
       including line breaks.  Not supported by Ruby or JavaScript.
  (?m) for “multi-line mode” makes the caret and dollar match at the start
       and end of each line in the subject string.  In Ruby, (?m) makes the
       dot match all characters, without affecting the caret and dollar which
       always match at the start and end of each line in Ruby.
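
As a point of comparison, Python's re module follows the Perl naming
described above: re.DOTALL (inline "(?s)") makes the dot match newlines,
and re.MULTILINE (inline "(?m)") makes the caret and dollar match at line
boundaries. The sketch below only illustrates that distinction; the sample
text is made up.

```python
import re

text = "multi\nline\n"

# By default the dot does not match a newline, so this finds nothing.
dot_default = re.search(r"multi.line", text)

# With DOTALL (Perl's /s, Ruby's /m) the dot also matches the newline.
dot_all = re.search(r"multi.line", text, re.DOTALL)
dot_all_inline = re.search(r"(?s)multi.line", text)

# Without MULTILINE, ^ and $ anchor to the whole string only.
anchors_default = re.findall(r"^\w+$", text)

# With MULTILINE (Perl's /m) they anchor to every line.
anchors_multiline = re.findall(r"^\w+$", text, re.MULTILINE)

if __name__ == "__main__":
    print(dot_default, dot_all, anchors_default, anchors_multiline)
```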





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01  2:21                     ` Dmitry Gutov
@ 2020-12-01  8:39                       ` Juri Linkov
  2020-12-03  2:46                         ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-12-01  8:39 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

>> When a grep input pattern contains a newline, then xref could use
>> the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
>> to get the file names that contain the pattern, then return
>> these file names without line numbers.
>
> Do you mean the xref items backed by find-func.el? There are a particular
> kind of references which are usually unique enough that special navigation
> logic can work. It's also implemented this way because the search can be
> performed without reading file contents (which would be slower).

I meant xref-matches-in-files.  It could also use another regexp
for the output of 'grep -Pzo' without line numbers.

>> This works exactly
>> like a new feature of extending xref-show-xrefs-function
>> with a new completion function was proposed recently on emacs-devel
>
> For Grep results, I think the line number is important because we're even
> more likely to have multiple lines with the same contents in one file.

Yes, sometimes this might cause inconvenience when the user wants to visit
the second occurrence of exactly the same line.

> What we *could* do, is run Grep, then take just the list of files names
> that it returns, visit them all in Emacs and repeat the search in all of
> them. But that would require a more complex abstraction than just "search
> command", as well as some juggling of buffers that weren't open before (we
> don't want to add more open buffers just because the user has run a search,
> right?).

dired-do-find-regexp uses 'ignores' to filter out ignored files.
You could add another filter to filter out files without matches
using 'grep -PzL'.

>> (BTW, why it's not installed yet?)
>
> Waiting for the feedback.
>
> It went through several minor revisions. Do you like the most recent
> version? If so, please reply to the message containing it. If you don't,
> please also reply and say why.

I suggest creating a new bug number for it.

>> So like this feature presenting such completions without line numbers:
>>    lisp/progmodes/project.el:(cl-defgeneric project-root)
>>    lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
>>    lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
>> xref for grep could work the same way without line numbers:
>>    lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
>>    lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
>> Then visiting such grep hit should use Emacs search functions
>> to find the grep hit in the visited file.
>
> These are two substrings inside that file that matched the search
> regexp. But there could be substrings in the same file that are equal to
> either of these but don't match said regexp, e.g. because they are preceded
> or followed by some different contents.

How is this possible?  Please show examples.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01  8:36                         ` Juri Linkov
@ 2020-12-01 15:20                           ` Dmitry Gutov
  0 siblings, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-01 15:20 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796, Jean Louis

On 01.12.2020 10:36, Juri Linkov wrote:
>>> It is possible to combine with sed:
>>> https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
>>> https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques
>>
>> It's pretty much Chinese to me, sorry.
> 
> When I need to grep in multi-line mode I use Ruby, but its modifiers
> differ from Perl:
> 
> https://regular-expressions.mobi/ruby.html
>    /m makes the dot match newlines.  Ruby indeed uses /m, whereas Perl and
>    many other programming languages use /s for “dot matches newlines”.
> 
> https://www.regular-expressions.info/modifiers.html
>    (?s) for “single line mode” makes the dot match all characters,
>         including line breaks.  Not supported by Ruby or JavaScript.
>    (?m) for “multi-line mode” makes the caret and dollar match at the start
>         and end of each line in the subject string.  In Ruby, (?m) makes the
>         dot match all characters, without affecting the caret and dollar which
>         always match at the start and end of each line in Ruby.

Ruby's much easier for me, of course, but it doesn't have the same 
advantage of ubiquity that awk (and, to a lesser extent, perl) have.

Either way, someone would need to write that script.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01  5:20                   ` Richard Stallman
@ 2020-12-01 15:46                     ` Eli Zaretskii
  2020-12-02  4:26                       ` Richard Stallman
  2020-12-03  2:23                     ` Dmitry Gutov
  1 sibling, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-12-01 15:46 UTC (permalink / raw)
  To: rms; +Cc: abela, 31796, dgutov

> From: Richard Stallman <rms@gnu.org>
> Cc: eliz@gnu.org, abela@chalmers.se, 31796@debbugs.gnu.org
> Date: Tue, 01 Dec 2020 00:20:12 -0500
> 
> Can people think of a new feature that would be easy to add to GNU grep
> that would make it easy for Dired to handle all cases correctly?

Yes: it should detect encoding of each input file (and have a way of
letting the user specify encoding for each file), convert the file's
contents to some internal encoding (probably UTF-8), then report the
hits encoded in UTF-8, regardless of the file's original encoding (and
regardless of the current locale's codeset).

> I don't know what the problem is, but if it has to do with parsing the
> grep output, here's an idea: an option to tell GNU grep to use quoting
> on file names and the match strings, Perhaps in the same way GNU ls
> does.

The problem is not with file names, it's with the matches.  But since
you mention it: Grep should, in this new mode, report file names also
recoded into UTF-8.  In a word, it should arrange for its output to be in
a single encoding known in advance, so that front ends like Emacs
won't need to guess the encoding.

> Another idea is an option to output numerical byte positions in the
> file instead of the lines that are matched.  Emacs can feed those byte
> positions into byte-to-position to convert them into buffer positions.

AFAIU, there's already such an option: -b.  However, byte-to-position
works only with UTF-8 encoded files; we need filepos-to-bufferpos
(which requires knowing the file's encoding, so we are back at the
same problem).
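
To make the dependency on the encoding concrete, here is a small Python
sketch (an illustration, not Emacs code) that converts a byte offset such
as the ones printed by 'grep -b' into a character index. The helper name
byte_offset_to_char_index and the sample string are assumptions for this
example.

```python
def byte_offset_to_char_index(data: bytes, byte_offset: int, encoding: str) -> int:
    """Count the characters that precede byte_offset when data is decoded as encoding."""
    return len(data[:byte_offset].decode(encoding))


text = "žodis multi"

# UTF-8: "ž" takes two bytes, so the byte offset is ahead of the character index.
utf8 = text.encode("utf-8")
utf8_offset = utf8.index(b"multi")

# UTF-16-LE: every character here takes two bytes.
utf16 = text.encode("utf-16-le")
utf16_offset = utf16.index("multi".encode("utf-16-le"))

if __name__ == "__main__":
    print(utf8_offset, byte_offset_to_char_index(utf8, utf8_offset, "utf-8"))
    print(utf16_offset, byte_offset_to_char_index(utf16, utf16_offset, "utf-16-le"))
    # Guessing the wrong encoding for the UTF-8 bytes gives a different index.
    print(byte_offset_to_char_index(utf8, utf8_offset, "latin-1"))
```

The same byte offset maps to different character indexes depending on
which encoding is assumed, which is why the offset alone is not enough.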

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01 15:46                     ` Eli Zaretskii
@ 2020-12-02  4:26                       ` Richard Stallman
  2020-12-02 14:56                         ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Richard Stallman @ 2020-12-02  4:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796, dgutov

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > AFAIU, there's already such an option: -b.  However, byte-to-position
  > works only with UTF-8 encoded files; we need filepos-to-bufferpos

Oops.

  > (which requires to know the file's encoding, so we are back at the
  > same problem).

If you're going to look at the contents of the file, you have to
visit it, which means you'll know which encoding to use for that file.

Does that make it work?

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)







^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-02  4:26                       ` Richard Stallman
@ 2020-12-02 14:56                         ` Eli Zaretskii
  2020-12-02 17:17                           ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-12-02 14:56 UTC (permalink / raw)
  To: rms; +Cc: abela, 31796, dgutov

> From: Richard Stallman <rms@gnu.org>
> Cc: abela@chalmers.se, 31796@debbugs.gnu.org, dgutov@yandex.ru
> Date: Tue, 01 Dec 2020 23:26:10 -0500
> 
>   > AFAIU, there's already such an option: -b.  However, byte-to-position
>   > works only with UTF-8 encoded files; we need filepos-to-bufferpos
> 
> Oops.
> 
>   > (which requires to know the file's encoding, so we are back at the
>   > same problem).
> 
> If you're going to look at the contents of the file, you have to
> visit it, which means you'll know which encoding to use for that file.

The point is that our heuristics for detecting encoding are not
perfect, so they could fail.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-02 14:56                         ` Eli Zaretskii
@ 2020-12-02 17:17                           ` Dmitry Gutov
  2020-12-02 17:39                             ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-02 17:17 UTC (permalink / raw)
  To: Eli Zaretskii, rms; +Cc: abela, 31796

On 02.12.2020 16:56, Eli Zaretskii wrote:
> The point is that our heuristics for detecting encoding is not
> perfect, so it could fail.

Do you imagine Grep could use a more reliable detection algorithm?

Although... since it has to scan the full file anyway, it could first do 
a quick detection, and then maybe rescan from the beginning if the 
encoding turns out to be something else.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-02 17:17                           ` Dmitry Gutov
@ 2020-12-02 17:39                             ` Eli Zaretskii
  2020-12-02 17:43                               ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-12-02 17:39 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, rms, 31796

> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 2 Dec 2020 19:17:06 +0200
> 
> On 02.12.2020 16:56, Eli Zaretskii wrote:
> > The point is that our heuristics for detecting encoding is not
> > perfect, so it could fail.
> 
> Do you imagine Grep could use a more reliable detection algorithm?

No, I don't.  But it could allow the user to specify a different
encoding for each file, as in

   grep --encoding=FOO FILES1* --encoding=BAR FILES2*

etc.  And even if it just did the job of the same quality as we do, it
will do it faster, which is why we use Grep in the first place, right?

The important part of the "enhancement" I described is actually the
fact that the output gets encoded in a single encoding, no matter what
was the encoding of the original files.  This makes reading and
decoding the output simple and always correct.

> Although... since it has to scan the full file anyway, it could first do 
> a quick detection, and then maybe rescan from the beginning if the 
> encoding turns out to be something else.

That'd be too late, as some matches were already output.

Grep does begin by scanning a small portion of the file (at least it
did, back when I was familiar with its code), so detection in the same
style as Emacs does should be a natural addition, I think.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-02 17:39                             ` Eli Zaretskii
@ 2020-12-02 17:43                               ` Dmitry Gutov
  2020-12-02 17:47                                 ` Eli Zaretskii
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-02 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, rms, 31796

On 02.12.2020 19:39, Eli Zaretskii wrote:
>> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 2 Dec 2020 19:17:06 +0200
>>
>> On 02.12.2020 16:56, Eli Zaretskii wrote:
>>> The point is that our heuristics for detecting encoding is not
>>> perfect, so it could fail.
>>
>> Do you imagine Grep could use a more reliable detection algorithm?
> 
> No, I don't.  But it could allow the user to specify a different
> encoding for each file, as in
> 
>     grep --encoding=FOO FILES1* --encoding=BAR FILES2*

Not sure we can call it like that in an automated fashion (i.e. in 
project-find-regexp). But hey, somebody else could.

> etc.  And even if it just did the job of the same quality as we do, it
> will do it faster, which is why we use Grep in the first place, right?

That's true.

> The important part of the "enhancement" I described is actually the
> fact that the output gets encoded in a single encoding, no matter what
> was the encoding of the original files.  This makes reading and
> decoding the output simple and always correct.

Yes, OK.

>> Although... since it has to scan the full file anyway, it could first do
>> a quick detection, and then maybe rescan from the beginning if the
>> encoding turns out to be something else.
> 
> That'd be too late, as some matches were already output.

It could buffer them until the full file has been parsed. Encoding 
detection and conversion must add a certain overhead anyway, so I'm not 
sure how expensive the extra buffering would be in comparison.

As a bonus, per-file buffering like that would allow easier 
parallelization of searches.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-02 17:43                               ` Dmitry Gutov
@ 2020-12-02 17:47                                 ` Eli Zaretskii
  2020-12-03  5:26                                   ` Richard Stallman
  0 siblings, 1 reply; 61+ messages in thread
From: Eli Zaretskii @ 2020-12-02 17:47 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, rms, 31796

> Cc: rms@gnu.org, abela@chalmers.se, 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 2 Dec 2020 19:43:52 +0200
> 
> >> Although... since it has to scan the full file anyway, it could first do
> >> a quick detection, and then maybe rescan from the beginning if the
> >> encoding turns out to be something else.
> > 
> > That'd be too late, as some matches were already output.
> 
> It could buffer them until the full file has been parsed. Encoding 
> detection and conversion must add a certain overhead anyway, so I'm not 
> sure how expensive the extra buffering would be in comparison.
> 
> As a bonus, per-file buffering like that would allow easier 
> parallelization of searches.

Buffering means you don't output matches as soon as you find them,
which might be regarded as a kind of regression -- see Richard's bug
reports a few days ago.  And since you never know where in the file
the telltale byte sequences will appear, you will need to always wait
until the entire file is read -- which could be prohibitive for very
large files.





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01  5:20                   ` Richard Stallman
  2020-12-01 15:46                     ` Eli Zaretskii
@ 2020-12-03  2:23                     ` Dmitry Gutov
  1 sibling, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-03  2:23 UTC (permalink / raw)
  To: rms; +Cc: abela, 31796

On 01.12.2020 07:20, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 
>    > To sum up, there are options, but I don't see a working solution that is
>    > based on GNU Grep.
> 
> Can people think of a new feature that would be easy to add to GNU grep
> that would make it easy for Dired to handle all cases correctly?
> 
> I don't know what the problem is, but if it has to do with parsing the
> grep output, here's an idea: an option to tell GNU grep to use quoting
> on file names and the match strings, Perhaps in the same way GNU ls
> does.

Grep already has that, more or less, with --null. pcregrep doesn't 
(which was my other example).

What Grep could add, however, is a "multiline" matching mode similar to 
what pcregrep and ripgrep have. Meaning, it would allow matches to cross 
newlines (with certain rules on whether "." matches a newline) but 
without requiring the -z mode. So it would still report correct line 
numbers for the matches.

> Another idea is an option to output numerical byte positions in the
> file instead of the lines that are matched.  Emacs can feed those byte
> positions into byte-to-position to convert them into buffer positions.

Like Eli said, that's -b.

But considering Emacs would have to visit each file, to post-process the 
results with byte-to-position, this might turn out to be not much faster 
or easier to implement than simply visiting every file that (according 
to Grep) has matches and repeating the search in Emacs.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-01  8:39                       ` Juri Linkov
@ 2020-12-03  2:46                         ` Dmitry Gutov
  2020-12-06 21:00                           ` Juri Linkov
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-03  2:46 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

On 01.12.2020 10:39, Juri Linkov wrote:
>>> When a grep input pattern contains a newline, then xref could use
>>> the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
>>> to get the file names that contain the pattern, then return
>>> these file names without line numbers.
>>
>> Do you mean the xref items backed by find-func.el? There are a particular
>> kind of references which are usually unique enough that special navigation
>> logic can work. It's also implemented this way because the search can be
>> performed without reading file contents (which would be slower).
> 
> I meant xref-matches-in-files.

'M-.' doesn't use xref-matches-in-files.

> It could also use another regexp
> for the output of 'grep -Pzo' without line numbers.

Not 100% sure I understand you here, but hopefully this line of 
discussion is continued below.

>>> This works exactly
>>> like a new feature of extending xref-show-xrefs-function
>>> with a new completion function was proposed recently on emacs-devel
>>
>> For Grep results, I think the line number is important because we're even
>> more likely to have multiple lines with the same contents in one file.
> 
> Yes, sometimes this might cause inconvenience when the user wants to visit
> the second occurrence of exactly the same line.

Or 5th or 10th. Where this would be more important, though, is when the 
user will want to change all these lines at once with 
xref-query-replace-in-results.

Also, it'd probably be surprising to see Grep search results without 
line numbers.

>> What we *could* do, is run Grep, then take just the list of files names
>> that it returns, visit them all in Emacs and repeat the search in all of
>> them. But that would require a more complex abstraction than just "search
>> command", as well as some juggling of buffers that weren't open before (we
>> don't want to add more open buffers just because the user has run a search,
>> right?).
> 
> dired-do-find-regexp uses 'ignores' to filter out ignored files.
> You could add another filter to filter out files without matches
> using 'grep -PzL'.

Right. This is sorta a backup plan. Although, when the number of files 
to search can be counted on one hand, there's nothing too bad in doing 
the search in Emacs.

>>> (BTW, why it's not installed yet?)
>>
>> Waiting for the feedback.
>>
>> It went through several minor revisions. Do you like the most recent
>> version? If so, please reply to the message containing it. If you don't,
>> please also reply and say why.
> 
> I suggest to create a new bug-number for it.

If you think it's best. The original thread author decided to write to 
emacs-devel, maybe they're more comfortable there. *shrug*

>>> So like this feature presenting such completions without line numbers:
>>>     lisp/progmodes/project.el:(cl-defgeneric project-root)
>>>     lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
>>>     lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
>>> xref for grep could work the same way without line numbers:
>>>     lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
>>>     lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
>>> Then visiting such grep hit should use Emacs search functions
>>> to find the grep hit in the visited file.
>>
>> These are two substrings inside that file that matched the search
>> regexp. But there could be substrings in the same file that are equal to
>> either of these but don't match said regexp, e.g. because they are preceded
>> or followed by some different contents.
> 
> How is this possible?  Please show examples.

Hmm, apparently no examples possible with Grep (which treats all lines 
as independent strings), but if we take ripgrep, or other regexp 
engines, they can use anchors like \A (counterpart to \` in Emacs), or 
PCRE's lookahead/lookbehind. As long as dired-do-find-regexp is 
documented to simply "use constructs supported by your local [search] 
command", the user could take advantage of some advanced syntax like that.

Though we might have to limit that capability if the idea of 
post-filtering search results using Emacs's own engine comes to life.
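
A small Python sketch of the lookbehind case, using Python's re module as
a stand-in for a PCRE-style engine (the sample text and pattern are made
up for this example): the regexp matches one occurrence, but searching
again for the literal matched text finds two.

```python
import re

text = "foo=bar\nbaz=bar\n"

# Match "bar" only when it is preceded by "foo=".
pattern = r"(?<=foo=)bar"
regexp_hits = [m.start() for m in re.finditer(pattern, text)]

# The matched substring itself carries no trace of the lookbehind context.
matched_text = re.search(pattern, text).group(0)

# Re-searching for that substring literally also finds the "baz=bar" line.
literal_hits = [m.start() for m in re.finditer(re.escape(matched_text), text)]

if __name__ == "__main__":
    print(regexp_hits, matched_text, literal_hits)
```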





^ permalink raw reply	[flat|nested] 61+ messages in thread

* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-02 17:47                                 ` Eli Zaretskii
@ 2020-12-03  5:26                                   ` Richard Stallman
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Stallman @ 2020-12-03  5:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: abela, 31796, dgutov

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Subject: bug#31796: 27.1;
  >  dired-do-find-regexp-and-replace fails to find multiline regexps
  > Resent-From: Eli Zaretskii <eliz@gnu.org>
  > Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
  > Resent-CC: bug-gnu-emacs@gnu.org
  > Resent-Sender: help-debbugs@gnu.org
  > To: Dmitry Gutov <dgutov@yandex.ru>
  > Date: Wed, 02 Dec 2020 19:47:43 +0200
  > Message-Id: <83wny0f6bk.fsf@gnu.org>
  > From: Eli Zaretskii <eliz@gnu.org>
  > In-Reply-To: <0646a65f-db21-b377-6897-caeb6ff3e10c@yandex.ru> (message from
  >  Dmitry Gutov on Wed, 2 Dec 2020 19:43:52 +0200)
  > Cc: abela@chalmers.se, rms@gnu.org, 31796@debbugs.gnu.org

  > > Cc: rms@gnu.org, abela@chalmers.se, 31796@debbugs.gnu.org
  > > From: Dmitry Gutov <dgutov@yandex.ru>
  > > Date: Wed, 2 Dec 2020 19:43:52 +0200
  > > 
  > > >> Although... since it has to scan the full file anyway, it could first do
  > > >> a quick detection, and then maybe rescan from the beginning if the
  > > >> encoding turns out to be something else.
  > > > 
  > > > That'd be too late, as some matches were already output.
  > > 
  > > It could buffer them until the full file has been parsed. Encoding 
  > > detection and conversion must add a certain overhead anyway, so I'm not 
  > > sure how expensive the extra buffering would be in comparison.
  > > 
  > > As a bonus, per-file buffering like that would allow easier 
  > > parallelization of searches.

  > Buffering means you don't output matches as soon as you find them,
  > which might be regarded as a kind of regression -- see Richard's bug
  > reports a few days ago.  And since you never know where in the file
  > the telltale byte sequences will appear, you will need to always wait
  > until the entire file is read -- which could be prohibitive for very
  > large files.

In my case, I was definitely going to wait until the search finished,
to see all the responses.

But it is much easier to look at them if they come out one by one,
rather than all at once due to buffering.

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)








* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-03  2:46                         ` Dmitry Gutov
@ 2020-12-06 21:00                           ` Juri Linkov
  2020-12-16  3:00                             ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-12-06 21:00 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

>> dired-do-find-regexp uses 'ignores' to filter out ignored files.
>> You could add another filter to filter out files without matches
>> using 'grep -PzL'.
>
> Right. This is sorta a backup plan. Although, when the number of files to
> search can be counted on one hand, there's nothing too bad in doing the
> search in Emacs.

Another backup plan is to use ripgrep.  Its multiline handling with -U
also allows searching for words while ignoring any whitespace, even newlines.
This is like isearch-lax-whitespace using search-whitespace-regexp
when it contains a newline, e.g. "[ \t\r\n]+".
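
To make the lax-whitespace idea concrete, here is a minimal sketch of how
a phrase could be turned into a regexp whose word gaps also match
newlines.  It is written in Python purely for illustration; the function
name lax_whitespace_pattern is hypothetical, and this is neither what
isearch nor what ripgrep does internally.

```python
import re


def lax_whitespace_pattern(phrase, whitespace=r"[ \t\r\n]+"):
    """Join the escaped words of PHRASE with a whitespace regexp.

    The default WHITESPACE value is the example from this thread; because
    it contains \\n, a gap between two words may span a line break.
    """
    words = phrase.split()
    return whitespace.join(re.escape(word) for word in words)


# Illustrative input: one gap is a newline, the other is several spaces.
text = "multi\nline and multi   line"
pattern = lax_whitespace_pattern("multi line")
matches = [m.group(0) for m in re.finditer(pattern, text)]
```

Here both occurrences are found, including the one split across two
lines, which is the case a purely line-oriented search misses.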






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-06 21:00                           ` Juri Linkov
@ 2020-12-16  3:00                             ` Dmitry Gutov
  2020-12-16 20:32                               ` Juri Linkov
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-16  3:00 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

[-- Attachment #1: Type: text/plain, Size: 1028 bytes --]

On 06.12.2020 23:00, Juri Linkov wrote:
>>> dired-do-find-regexp uses 'ignores' to filter out ignored files.
>>> You could add another filter to filter out files without matches
>>> using 'grep -PzL'.
>> Right. This is sorta a backup plan. Although, when the number of files to
>> search can be counted on one hand, there's nothing too bad in doing the
>> search in Emacs.
> Another backup plan is to use ripgrep.  Its multiline handling with -U
> also allows to search words ignoring any whitespace, even newlines.
> This is like isearch-lax-whitespace using search-whitespace-regexp
> when it contains a newline, e.g. "[ \t\r\n]+".

Right. It has a problem of its own, though: it still outputs a file name 
per line, even when a match is spread across several lines (unlike 
pcregrep). So we're left guessing where a given multiline match ends.
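
The guessing can be sketched as a simple heuristic: treat consecutive
output lines from the same file, with line numbers increasing by one, as
parts of a single match.  The sketch below is in Python for illustration
only and assumes plain "file:line:text" output; the names LINE_RE and
merge_adjacent_hits are hypothetical, and file names containing ':' are
not handled.

```python
import re

# Assumed output shape: "file:line:text", one entry per matched line.
LINE_RE = re.compile(r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<text>.*)$")


def merge_adjacent_hits(output):
    """Group consecutive lines of the same file into one hit each."""
    hits = []
    for raw in output.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            continue  # skip anything that is not a match line
        file, line, text = m["file"], int(m["line"]), m["text"]
        last = hits[-1] if hits else None
        if last and last["file"] == file and last["end"] + 1 == line:
            # Same file, next line number: assume the match continues.
            last["end"] = line
            last["text"] += "\n" + text
        else:
            hits.append({"file": file, "start": line, "end": line,
                         "text": text})
    return hits


# Illustrative data, not real search output.
sample = "a.txt:1:multi\na.txt:2:line\na.txt:7:multi\nb.txt:8:line\n"
merged = merge_adjacent_hits(sample)
```

The heuristic cannot tell one two-line match from two separate matches
on adjacent lines, so it merges both cases into a single hit.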

Also, 'sort' doesn't seem to be able to treat both : and \0 as 
separators at the same time.

Here's a rough patch, for illustration. It's kind of working, but I'm 
not loving it.

[-- Attachment #2: ripgrep-multiline.diff --]
[-- Type: text/x-patch, Size: 3754 bytes --]

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 6e99e9d8ac..6bc03ee727 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1340,7 +1340,7 @@ xref-search-program-alist
      ;; without the '| sort ...' part if GNU sort is not available on
      ;; your system and/or stable ordering is not important to you.
      ;; Note#2: '!*/' is there to filter out dirs (e.g. submodules).
-     "xargs -0 rg <C> -nH --no-messages -g '!*/' -e <R> | sort -t: -k1,1 -k2n,2"
+     "xargs -0 rg <C> -nH --sort path --no-messages -g '!*/' -e <R>"
      ))
   "Associative list mapping program identifiers to command templates.
 
@@ -1390,6 +1390,7 @@ xref-matches-in-files
        ;; The 'auto' default would be fine too, but ripgrep can't handle
        ;; the options we pass in that case.
        (grep-highlight-matches nil)
+       (multiline (string-match-p "\n" regexp))
        (command (grep-expand-template (cdr
                                        (or
                                         (assoc
@@ -1397,7 +1398,14 @@ xref-matches-in-files
                                          xref-search-program-alist)
                                         (user-error "Unknown search program `%s'"
                                                     xref-search-program)))
-                                      (xref--regexp-to-extended regexp))))
+                                      (xref--regexp-to-extended regexp)
+                                      nil
+                                      nil
+                                      nil
+                                      (when multiline '("-U" "--null")))))
+    (if (and multiline (not (eq xref-search-program 'ripgrep)))
+        (user-error "Sorry, multiline searches are not supported with `%s'"
+                    xref-search-program))
     (when remote-id
       (require 'tramp)
       (setq files (mapcar
@@ -1425,6 +1433,27 @@ xref-matches-in-files
                  (not (looking-at "Binary file .* matches")))
         (user-error "Search failed with status %d: %s" status
                     (buffer-substring (point-min) (line-end-position))))
+      (if multiline
+          (let (match line last-line file)
+            (while (re-search-forward "^\\([^\0]+\\)\\(?:\0\\)\\([0-9]+\\):" nil t)
+              (if (and match
+                       (equal file (match-string 1))
+                       (= (string-to-number (match-string 2))
+                          (1+ last-line)))
+                  (progn
+                    (setq last-line (string-to-number (match-string 2))
+                          match (concat match
+                                        "\n"
+                                        (buffer-substring
+                                         (match-end 0)
+                                         (line-end-position)))))
+                (when match
+                  (push (list line file match) hits))
+                (setq match (buffer-substring (match-end 0) (line-end-position))
+                      file (match-string 1)
+                      line (string-to-number (match-string 2))
+                      last-line line)))
+            (push (list line file match) hits)))
       (while (re-search-forward grep-re nil t)
         (push (list (string-to-number (match-string line-group))
                     (match-string file-group)
@@ -1541,7 +1570,7 @@ xref--collect-matches
                (file (and file (concat remote-id file)))
                (buf (xref--find-file-buffer file))
                (syntax-needed (xref--regexp-syntax-dependent-p regexp)))
-    (if buf
+    (if nil
         (with-current-buffer buf
           (save-excursion
             (goto-char (point-min))


* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-16  3:00                             ` Dmitry Gutov
@ 2020-12-16 20:32                               ` Juri Linkov
  2020-12-17  0:40                                 ` Dmitry Gutov
  0 siblings, 1 reply; 61+ messages in thread
From: Juri Linkov @ 2020-12-16 20:32 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: abela, 31796

>> Another backup plan is to use ripgrep.  Its multiline handling with -U
>> also allows to search words ignoring any whitespace, even newlines.
>> This is like isearch-lax-whitespace using search-whitespace-regexp
>> when it contains a newline, e.g. "[ \t\r\n]+".
>
> Right. It has a problem of its own, though: it still outputs a file name
> per line, even when a match is spread across several lines (unlike
> pcregrep). So we're left guessing where a given multiline match ends.
>
> Also, 'sort' doesn't seem to be able to treat both : and \0 as separators
> at the same time.
>
> Here's a rough patch, for illustration.

Thanks, now finally it's possible to search text ignoring whitespace
between words, for example:

  Find regexp: file[ 	
]+names

finds everything correctly, even though the current implementation may
not be the most elegant.

> It's kind of working, but I'm not loving it.

What do you think about using the option `rg --json`?
Emacs has the fast JSON parsing library now, so using
JSON output would be more reliable.






* bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
  2020-12-16 20:32                               ` Juri Linkov
@ 2020-12-17  0:40                                 ` Dmitry Gutov
  0 siblings, 0 replies; 61+ messages in thread
From: Dmitry Gutov @ 2020-12-17  0:40 UTC (permalink / raw)
  To: Juri Linkov; +Cc: abela, 31796

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

On 16.12.2020 22:32, Juri Linkov wrote:
>>> Another backup plan is to use ripgrep.  Its multiline handling with -U
>>> also allows to search words ignoring any whitespace, even newlines.
>>> This is like isearch-lax-whitespace using search-whitespace-regexp
>>> when it contains a newline, e.g. "[ \t\r\n]+".
>>
>> Right. It has a problem of its own, though: it still outputs a file name
>> per line, even when a match is spread across several lines (unlike
>> pcregrep). So we're left guessing where a given multiline match ends.
>>
>> Also, 'sort' doesn't seem to be able to treat both : and \0 as separators
>> at the same time.
>>
>> Here's a rough patch, for illustration.
> 
> Thanks, now finally it's possible to search text ignoring whitespace
> between words, for example:
> 
>    Find regexp: file[ 	
> ]+names
> 
> finds everything correctly, even though current implementation maybe
> not the most elegant.
> 
>> It's kind of working, but I'm not loving it.
> 
> What do you think about using the option `rg --json`?
> Emacs has the fast JSON parsing library now, so using
> JSON output would be more reliable.

Very interesting. It returns better data, each multiline match is wholly 
in one entry instead of being spread across lines. Even the matches are 
annotated with match string/length/absolute position.

We should really investigate it, but perhaps a bit later, including our 
capability to parse it quickly when there are a lot of matches (>1000), 
and how said byte offsets interact with different file encodings.

Also, its output is not one JSON document but a series of them 
(including ones with just search statistics which we'll want to skip), 
but some re-search-forward followed by (json-parse-buffer) should do the 
trick.
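
As a rough illustration of that approach, the sketch below reads such a
stream one JSON document per line and keeps only the match entries.  It
is Python rather than Emacs Lisp, the function name matches_from_rg_json
is hypothetical, and the field names ("type", "data", "path", "lines",
"line_number") reflect my understanding of ripgrep's JSON output, so
they should be checked against the ripgrep documentation.  The sample
data is made up.

```python
import json


def matches_from_rg_json(stream_text):
    """Return (path, first_line_number, matched_lines) per match entry."""
    results = []
    for raw in stream_text.splitlines():
        if not raw.strip():
            continue
        message = json.loads(raw)  # each line is a separate JSON document
        if message.get("type") != "match":
            continue  # skip begin/end/summary (statistics) documents
        data = message["data"]
        results.append((data["path"]["text"],
                        data["line_number"],
                        data["lines"]["text"]))
    return results


# Made-up stream in the assumed shape; json.dumps keeps each document on
# a single physical line by escaping the embedded newlines.
sample = "\n".join(json.dumps(doc) for doc in [
    {"type": "begin", "data": {"path": {"text": "a.txt"}}},
    {"type": "match", "data": {"path": {"text": "a.txt"},
                               "lines": {"text": "multi\nline\n"},
                               "line_number": 1,
                               "absolute_offset": 0,
                               "submatches": []}},
    {"type": "end", "data": {"path": {"text": "a.txt"}}},
    {"type": "summary", "data": {}},
])
parsed = matches_from_rg_json(sample)
```

A whole multiline match arrives as one entry here, so no merging of
adjacent lines is needed.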

In the meantime, here's a smaller patch using the traditional output 
format. I figure since there is a file name on each line anyway, --null 
doesn't help much. So it can be simplified a little (see attached).

Unfortunately, xref-replace-in-matches is broken for such multiline 
matches. And, of course, it merges together matches on adjacent lines, 
whether they are one match or several (that hasn't changed from the 
previous patch). So more investigation is needed.

[-- Attachment #2: ripgrep-multiline.diff --]
[-- Type: text/x-patch, Size: 3815 bytes --]

diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index 6e99e9d8ac..7c0c54e6eb 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1390,6 +1390,7 @@ xref-matches-in-files
        ;; The 'auto' default would be fine too, but ripgrep can't handle
        ;; the options we pass in that case.
        (grep-highlight-matches nil)
+       (multiline (string-match-p "\n" regexp))
        (command (grep-expand-template (cdr
                                        (or
                                         (assoc
@@ -1397,7 +1398,14 @@ xref-matches-in-files
                                          xref-search-program-alist)
                                         (user-error "Unknown search program `%s'"
                                                     xref-search-program)))
-                                      (xref--regexp-to-extended regexp))))
+                                      (xref--regexp-to-extended regexp)
+                                      nil
+                                      nil
+                                      nil
+                                      (when multiline '("-U")))))
+    (if (and multiline (not (eq xref-search-program 'ripgrep)))
+        (user-error "Sorry, multiline searches are not supported with `%s'"
+                    xref-search-program))
     (when remote-id
       (require 'tramp)
       (setq files (mapcar
@@ -1425,6 +1433,27 @@ xref-matches-in-files
                  (not (looking-at "Binary file .* matches")))
         (user-error "Search failed with status %d: %s" status
                     (buffer-substring (point-min) (line-end-position))))
+      (if multiline
+          (let (match line last-line file)
+            (while (re-search-forward grep-re nil t)
+              (if (and match
+                       (equal file (match-string 1))
+                       (= (string-to-number (match-string 2))
+                          (1+ last-line)))
+                  (progn
+                    (setq last-line (string-to-number (match-string 2))
+                          match (concat match
+                                        "\n"
+                                        (buffer-substring
+                                         (match-end 0)
+                                         (line-end-position)))))
+                (when match
+                  (push (list line file match (1+ (- last-line line))) hits))
+                (setq match (buffer-substring (match-end 0) (line-end-position))
+                      file (match-string 1)
+                      line (string-to-number (match-string 2))
+                      last-line line)))
+            (push (list line file match (1+ (- last-line line))) hits)))
       (while (re-search-forward grep-re nil t)
         (push (list (string-to-number (match-string line-group))
                     (match-string file-group)
@@ -1536,7 +1565,7 @@ xref--convert-hits
       (kill-buffer tmp-buffer))))
 
 (defun xref--collect-matches (hit regexp tmp-buffer)
-  (pcase-let* ((`(,line ,file ,text) hit)
+  (pcase-let* ((`(,line ,file ,text ,lines-num) hit)
                (remote-id (file-remote-p default-directory))
                (file (and file (concat remote-id file)))
                (buf (xref--find-file-buffer file))
@@ -1548,7 +1577,7 @@ xref--collect-matches
             (forward-line (1- line))
             (xref--collect-matches-1 regexp file line
                                      (line-beginning-position)
-                                     (line-end-position)
+                                     (line-end-position (or lines-num 1))
                                      syntax-needed)))
       ;; Using the temporary buffer is both a performance and a buffer
       ;; management optimization.


end of thread, other threads:[~2020-12-17  0:40 UTC | newest]

Thread overview: 61+ messages
     [not found] <<CADy8Bt=f=LOE6ODLhhW7ZS6qXRQCzd15Hd0eFKVO8qok98ni8w@mail.gmail.com>
     [not found] ` <<10120030-8b8d-b702-add4-8f099f934ed5@chalmers.se>
     [not found]   ` <<91c98791-9df2-43ee-9aac-205c5b0de9c2@default>
     [not found]     ` <<87blfm6922.fsf@mail.linkov.net>
     [not found]       ` <<838saqtsm9.fsf@gnu.org>
2020-11-24 20:32         ` bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps Drew Adams
     [not found]         ` <<87mtz64htw.fsf@mail.linkov.net>
     [not found]           ` <<831rgitqe2.fsf@gnu.org>
2020-11-24 21:35             ` Drew Adams
2018-06-11 18:58 bug#31796: 26.1; " Žygimantas Bruzgys
2018-06-12 10:17 ` Noam Postavsky
2020-11-23 21:25   ` Dmitry Gutov
2020-11-23  9:09 ` bug#31796: 27.1; " Andreas Abel
2020-11-23 15:23   ` Eli Zaretskii
2020-11-23 16:16   ` Drew Adams
2020-11-23 21:22     ` Dmitry Gutov
2020-11-24 19:28     ` Juri Linkov
2020-11-24 20:12       ` Drew Adams
2020-11-25  7:31         ` Juri Linkov
2020-11-25 17:37           ` Drew Adams
2020-11-24 20:19       ` Eli Zaretskii
2020-11-24 20:31         ` Juri Linkov
2020-11-24 20:51           ` Drew Adams
2020-11-24 21:07           ` Eli Zaretskii
2020-11-25  7:28             ` Juri Linkov
2020-11-25 15:48               ` Eli Zaretskii
2020-11-25 20:18                 ` Juri Linkov
2020-11-25 20:30                   ` Eli Zaretskii
2020-11-29  2:30                     ` Dmitry Gutov
2020-11-29 15:22                       ` Eli Zaretskii
2020-11-23 21:28   ` Dmitry Gutov
2020-11-23 23:49     ` Andreas Abel
2020-11-24  0:13       ` Dmitry Gutov
2020-11-24  1:19         ` Dmitry Gutov
2020-11-24 15:16       ` Eli Zaretskii
2020-11-24 15:43         ` Dmitry Gutov
2020-11-24 16:35           ` Eli Zaretskii
2020-11-24 19:43             ` Dmitry Gutov
2020-11-24 20:16               ` Eli Zaretskii
2020-11-30  2:25                 ` Dmitry Gutov
2020-11-30  8:49                   ` Juri Linkov
2020-12-01  2:21                     ` Dmitry Gutov
2020-12-01  8:39                       ` Juri Linkov
2020-12-03  2:46                         ` Dmitry Gutov
2020-12-06 21:00                           ` Juri Linkov
2020-12-16  3:00                             ` Dmitry Gutov
2020-12-16 20:32                               ` Juri Linkov
2020-12-17  0:40                                 ` Dmitry Gutov
2020-11-30 15:30                   ` Eli Zaretskii
2020-11-30 15:39                     ` Jean Louis
2020-11-30 16:36                       ` Eli Zaretskii
2020-11-30 15:42                     ` Jean Louis
2020-12-01  1:23                       ` Dmitry Gutov
2020-12-01  8:36                         ` Juri Linkov
2020-12-01 15:20                           ` Dmitry Gutov
2020-12-01  1:24                     ` Dmitry Gutov
2020-12-01  5:20                   ` Richard Stallman
2020-12-01 15:46                     ` Eli Zaretskii
2020-12-02  4:26                       ` Richard Stallman
2020-12-02 14:56                         ` Eli Zaretskii
2020-12-02 17:17                           ` Dmitry Gutov
2020-12-02 17:39                             ` Eli Zaretskii
2020-12-02 17:43                               ` Dmitry Gutov
2020-12-02 17:47                                 ` Eli Zaretskii
2020-12-03  5:26                                   ` Richard Stallman
2020-12-03  2:23                     ` Dmitry Gutov
2020-11-24 19:29     ` Juri Linkov
2020-11-24 19:39       ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
