unofficial mirror of help-gnu-emacs@gnu.org

* How to grep for a string spanning multiple lines?
From: Marcin Borkowski @ 2022-11-26  6:42 UTC
  To: Help Gnu Emacs mailing list

Hi all,

assume I have a file (probably an Org mode one) with some stuff
I archived from the 'net.  (I'm going to start using
youtube-sub-extractor.el.)  Here is my problem: assume I remember that
someone in some video said something, and I want to find that part.
However, it turns out that it is split between two (or more) lines.

Traditional `grep' is not helpful in this situation, and neither are
isearch and swiper.  One idea would be to convert the subtitles to one
long line (which is an option), but are there any other ways to search
for a string spanning more than one line (and not knowing which words
are separated by a space and which ones by a newline)?

Both Emacs-y and shell-y tools would be appreciated.

TIA,

-- 
Marcin Borkowski
http://mbork.pl



* Re: How to grep for a string spanning multiple lines?
From: Jean Louis @ 2022-11-26  8:27 UTC
  To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list

* Marcin Borkowski <mbork@mbork.pl> [2022-11-26 09:45]:
> [...] are there any other ways to search
> for a string spanning more than one line (and not knowing which words
> are separated by a space and which ones by a newline)?

I was just searching for a pattern in this code:

(define-skeleton with-tabulated-id
  "Helper skeleton for `tabulated-list-get-id'"
  nil
  "(defun " (skeleton-read "Function name: ") " (" (skeleton-read "Arguments: " "&optional id") ")
  \"\"
  (interactive)
  (when-tabulated-id \"" (skeleton-read "Table: ") "\"
     ))")

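by using `grep' like this:

$ grep -Pazo 'with-tabulated-id.*\n.*Helper' rcd-cf.el
with-tabulated-id
  "Helper

Here -P selects Perl-compatible regexps (so \n can be used), -z makes
grep read NUL-terminated records, so a file without NUL bytes is
searched as one single record, -a processes the file as text, and -o
prints only the matching part.  Since -z also ends each output record
with a NUL instead of a newline, the shell prompt
(~/Programming/emacs-lisp) was printed directly after the match.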

As advised here:
https://stackoverflow.com/questions/152708/how-can-i-search-for-a-multiline-pattern-in-a-file

So GNU grep can find patterns spanning multiple lines.
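For comparison, Emacs needs no equivalent of `-z', because a buffer is searched as one stretch of text and a regexp may contain newlines. A minimal sketch, assuming a hypothetical helper name `my-grep-file-multiline' (not an existing command), that reads a file into a temporary buffer and collects every match:

```elisp
(defun my-grep-file-multiline (file regexp)
  "Return a list of all matches of REGEXP in FILE, in order.
Hypothetical helper: like `grep -Pzo', the whole file is searched
as one piece of text, so REGEXP may match across newlines."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let (matches)
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (push (match-string-no-properties 0) matches)
        ;; Step over empty matches so that the loop always advances.
        (when (and (= (match-beginning 0) (match-end 0))
                   (not (eobp)))
          (forward-char 1)))
      (nreverse matches))))
```

With the pattern from the grep call above this would be
(my-grep-file-multiline "rcd-cf.el" "with-tabulated-id.*\n.*Helper"),
where "\n" inside the Lisp string is a literal newline.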

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/



* Re: How to grep for a string spanning multiple lines?
From: tomas @ 2022-11-26  8:36 UTC
  To: help-gnu-emacs

On Sat, Nov 26, 2022 at 07:42:59AM +0100, Marcin Borkowski wrote:
> [...] are there any other ways to search
> for a string spanning more than one line (and not knowing which words
> are separated by a space and which ones by a newline)?

Note that, at least in Emacs, the POSIX character class [:space:]
also matches line breaks. So if you always use [[:space:]]+ to
separate your words, you might find what you are looking for.
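A minimal sketch of that suggestion, assuming a hypothetical helper name `my-phrase-regexp': split the remembered phrase into words, quote each one, and join them with a separator that matches any run of whitespace. The newline is listed explicitly inside the brackets because, depending on the syntax table, `[:space:]' alone may not match it.

```elisp
(defun my-phrase-regexp (phrase)
  "Return a regexp matching the words of PHRASE across line breaks.
Hypothetical helper: the words are matched literally, and each gap
between them may be any run of whitespace characters and newlines."
  (mapconcat #'regexp-quote
             (split-string phrase)   ; split on whitespace, dropping empty strings
             "[[:space:]\n]+"))      ; explicit \n: [:space:] alone may miss newlines
```

For example, (re-search-forward (my-phrase-regexp "stay a while and listen") nil t)
finds the phrase no matter where the line breaks fall.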

Cheers
-- 
t

* Re: How to grep for a string spanning multiple lines?
From: Jean Louis @ 2022-11-26  8:43 UTC
  To: tomas; +Cc: help-gnu-emacs

* tomas@tuxteam.de <tomas@tuxteam.de> [2022-11-26 11:37]:
> Note that, at least, in Emacs, the POSIX character class [:space:]
> also matches line breaks. So if you always use [[:space:]]+ to
> separate your words, you might find what you are looking for.

Except that, for some reason, it does not work as expected:

(string-match "[[:space:]]+" "Hello\nthere") ➜ nil

(string-match "[[:space:]]+" "Hello
there") ➜ nil

(xr "[[:space:]]+") ➜ (one-or-more space)

(rx (one-or-more space)) ➜ "[[:space:]]+"

That is why I had to make this:

(defun rcd-string-clean-whitespace (s)
  "Return trimmed string S after cleaning whitespaces."
  (replace-regexp-in-string 
   (rx (one-or-more (or "\n" (any whitespace))))
   " "
   (string-trim s)))

as then it works as expected:

(rcd-string-clean-whitespace "Hello\nthere") ➜ "Hello there"

Because "[[:space:]]+" does not include "\n" that I know:

(replace-regexp-in-string "[[:space:]]+" " " "Hello\nthere") ➜ "Hello
there"



-- 
Jean




* Re: How to grep for a string spanning multiple lines?
From: Eli Zaretskii @ 2022-11-26  8:59 UTC
  To: help-gnu-emacs

> Date: Sat, 26 Nov 2022 11:43:47 +0300
> From: Jean Louis <bugs@gnu.support>
> Cc: help-gnu-emacs@gnu.org
> 
> Because "[[:space:]]+" does not include "\n" that I know:

Read its documentation carefully, and you will realize that whether it does
or doesn't include a newline depends on the syntax table of the current
buffer.
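Eli's point is easy to check. A small sketch (the sample string is arbitrary): in a fresh `with-temp-buffer' the standard syntax table applies, and there newline has whitespace syntax; `emacs-lisp-mode-syntax-table' gives newline comment-end syntax instead. That would explain the nil results above if they were evaluated in an Emacs Lisp buffer such as *scratch* (an assumption; the thread does not say where they were run).

```elisp
;; Standard syntax table (fresh temp buffer): newline has whitespace
;; syntax, so [[:space:]] matches it.
(with-temp-buffer
  (string-match "[[:space:]]" "Hello\nthere"))          ; => 5

;; Emacs Lisp syntax table: newline is a comment ender, so the same
;; regexp finds nothing.
(with-temp-buffer
  (with-syntax-table emacs-lisp-mode-syntax-table
    (string-match "[[:space:]]" "Hello\nthere")))       ; => nil
```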



* Re: How to grep for a string spanning multiple lines?
From: Jean Louis @ 2022-11-26  9:06 UTC
  To: Eli Zaretskii; +Cc: help-gnu-emacs

* Eli Zaretskii <eliz@gnu.org> [2022-11-26 12:00]:
> > Date: Sat, 26 Nov 2022 11:43:47 +0300
> > From: Jean Louis <bugs@gnu.support>
> > Cc: help-gnu-emacs@gnu.org
> > 
> > Because "[[:space:]]+" does not include "\n" that I know:
> 
> Read its documentation carefully, and you will realize that whether it does
> or doesn't include a newline depends on the syntax table of the current
> buffer.

Thanks.

It means it should not be relied on to match newlines in programs,
since the result depends on the syntax table of the current buffer.

Then this is a solution:

(rx (one-or-more (or "\n" (any whitespace)))) ➜ "\\(?:
\\|[[:space:]]\\)+"


--
Jean





* Re: How to grep for a string spanning multiple lines?
From: Arthur Miller @ 2022-11-26 10:57 UTC
  To: Marcin Borkowski; +Cc: Help Gnu Emacs mailing list

Marcin Borkowski <mbork@mbork.pl> writes:

> [...] are there any other ways to search
> for a string spanning more than one line (and not knowing which words
> are separated by a space and which ones by a newline)?

If you plan to use regex search & friends in Emacs Lisp, then
matching over multiple lines can be a bit tricky. Out of the box,
".*" will match only up to the end of the line, since "." never
matches a newline (syntax tables do not change that). You can use
".*\\(\n.*\\)*" instead.

I have learned it from the Wiki page:
https://www.emacswiki.org/emacs/MultilineRegexp . I don't know if you
are aware of it already or not, hope it helps.
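A small sketch illustrating the difference (the sample text and the name `my-text' are made up): `.' stops at a newline, the greedy pattern above runs on to the last possible end, and the non-greedy group `\\(?:.\\|\n\\)*?' matches any characters, newlines included, only up to the first occurrence of the closing word.

```elisp
(defconst my-text "friend, a while\nand a while"
  "Made-up sample text spanning two lines.")

;; `.' never matches a newline, so this pattern cannot cross lines.
(string-match "friend.*and" my-text)                    ; => nil

;; Greedy: runs on to the *last* "while" in the text.
(when (string-match "friend.*\\(\n.*\\)*while" my-text)
  (match-end 0))                                        ; => 27

;; Non-greedy "any char including newline": stops at the first "while".
(when (string-match "friend\\(?:.\\|\n\\)*?while" my-text)
  (match-end 0))                                        ; => 15
```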



* Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-11-26 14:55 UTC
  To: help-gnu-emacs

Marcin Borkowski wrote:

> I remember that someone in some video said something, and
> I want to find that part. However, it turns out that it is
> split between two (or more) lines.

(re-search-forward ";; Hello my friend\n;; stay a while, and listen")

;; Hello my friend
;; stay a while, and listen

-- 
underground experts united
https://dataswamp.org/~incal




* Re: How to grep for a string spanning multiple lines?
From: Marcin Borkowski @ 2022-11-27  6:54 UTC
  To: Emanuel Berg; +Cc: help-gnu-emacs


On 2022-11-26, at 15:55, Emanuel Berg <incal@dataswamp.org> wrote:

> Marcin Borkowski wrote:
>
>> I remember that someone in some video said something, and
>> I want to find that part. However, it turns out that it is
>> split between two (or more) lines.
>
> (re-search-forward ";; Hello my friend\n;; stay a while, and listen")
>
> ;; Hello my friend
> ;; stay a while, and listen

Well, that's obviously cheating.  The real problem is how to search when
you have no idea where you have spaces and where you have newlines...

-- 
Marcin Borkowski
http://mbork.pl



* Re: How to grep for a string spanning multiple lines?
From: Jean Louis @ 2022-11-27  7:26 UTC
  To: Marcin Borkowski; +Cc: Emanuel Berg, help-gnu-emacs


(re-search-forward "friend.*\n.*while")

* Marcin Borkowski <mbork@mbork.pl> [2022-11-27 09:56]:
> On 2022-11-26, at 15:55, Emanuel Berg <incal@dataswamp.org> wrote:
> 
> > Marcin Borkowski wrote:
> >
> >> I remember that someone in some video said something, and
> >> I want to find that part. However, it turns out that it is
> >> split between two (or more) lines.
> >
> > (re-search-forward ";; Hello my friend\n;; stay a while, and listen")
> >
> > ;; Hello my friend
> > ;; stay a while, and listen
> 
> Well, that's obviously cheating.  The real problem is how to search when
> you have no idea where you have spaces and where you have
> newlines...

The problem becomes much smaller with regular expressions; see above.


-- 
Jean




* Re: How to grep for a string spanning multiple lines?
From: Michael Heerdegen @ 2022-11-27  7:31 UTC
  To: help-gnu-emacs

Jean Louis <bugs@gnu.support> writes:

> It means it is not to be used to search for new lines in program, as
> programs may not have syntax tables.
>
> Then this is solution:
>
> (rx (one-or-more (or "\n" (any whitespace)))) ➜ "\\(?:
> \\|[[:space:]]\\)+"

`rx' has a keyword to match any char (including whitespace), it's
called `anychar'.

Michael.




* Re: How to grep for a string spanning multiple lines?
From: Jean Louis @ 2022-11-27  7:44 UTC
  To: Michael Heerdegen; +Cc: help-gnu-emacs

* Michael Heerdegen <michael_heerdegen@web.de> [2022-11-27 10:32]:
> Jean Louis <bugs@gnu.support> writes:
> 
> > It means it is not to be used to search for new lines in program, as
> > programs may not have syntax tables.
> >
> > Then this is solution:
> >
> > (rx (one-or-more (or "\n" (any whitespace)))) ➜ "\\(?:
> > \\|[[:space:]]\\)+"
> 
> `rx' has a keyword to mach any char (including whitespace), it's
> called `anychar'.

OK sure, thanks. Though I don't see how matching any char relates to
searching for whitespace including newlines. What do you mean?

-- 
Jean




* Re: How to grep for a string spanning multiple lines?
From: Michael Heerdegen @ 2022-11-27 12:04 UTC
  To: help-gnu-emacs

Jean Louis <bugs@gnu.support> writes:

> > > (rx (one-or-more (or "\n" (any whitespace)))) ➜ "\\(?:
> > > \\|[[:space:]]\\)+"
> > 
> > `rx' has a keyword to mach any char (including whitespace), it's
> > called `anychar'.
>
> OK sure, thanks. Though I can't see relation from any char to
> searching for whitespaces including new line. How do you mean it?

Sorry, I was a bit confused because this was discussed not that long
ago, some weeks maybe?  AFAIR I said there that one could use

  (rx (any whitespace ?\n))
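Because the newline is listed explicitly, `(any whitespace ?\n)' matches a line break whatever the syntax table. A minimal sketch that builds a whole-phrase regexp from it with `rx-to-string' (the helper name `my-rx-phrase' is hypothetical):

```elisp
(require 'rx)

(defun my-rx-phrase (words)
  "Return a regexp matching WORDS, a list of strings, in order.
Hypothetical helper: each word is matched literally, and the gaps
between words may be any mix of whitespace and newlines."
  (rx-to-string
   ;; Put a separator before every word, then drop the first one.
   `(seq ,@(cdr (mapcan (lambda (w) (list '(+ (any whitespace ?\n)) w))
                        words)))
   t))                                  ; t: no surrounding group
```

For example, (re-search-forward (my-rx-phrase '("stay" "a" "while")) nil t)
also works in an Emacs Lisp buffer, where `[[:space:]]' alone would
not match the newline.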

Michael.




* Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-11-27 13:48 UTC
  To: help-gnu-emacs

Marcin Borkowski wrote:

>>> I remember that someone in some video said something, and
>>> I want to find that part. However, it turns out that it is
>>> split between two (or more) lines.
>>
>> (re-search-forward ";; Hello my friend\n;; stay a while, and listen")
>>
>> ;; Hello my friend
>> ;; stay a while, and listen
>
> Well, that's obviously cheating. The real problem is how to
> search when you have no idea where you have spaces and where
> you have newlines...

Okay, I didn't understand ... you want to search for something
you or someone else said (wrote), but you only remember the
words, not punctuation, whitespace etc?

It can be done with regular expressions of course, but ...
I wonder if it isn't better to have some "inexact" search
algorithm which quantifies the proximity or score of each
inexact hit in a set of results, so that, if I just make
things up for a possible search for "Hello my friend" in
a body of text where it doesn't quite exist, it could produce
output such as

  results           score
  -----------------------
  Hello my Fräulein   89%
  Hello MySQL         28%
  Go to Helvetia       3%

Because the advantage would then also be that even the words
wouldn't have to be exact!

Interesting, I'd like that myself very much!

Do we have that?





* Re: How to grep for a string spanning multiple lines?
From: tomas @ 2022-11-27 18:10 UTC
  To: help-gnu-emacs

On Sun, Nov 27, 2022 at 02:48:52PM +0100, Emanuel Berg wrote:

[...]

> It can be done with regular expressions of course, but ...
> I wonder if it isn't better to have some "inexact" search
> algorithm [...]

> Because the advantage would then also be that even the words
> wouldn't have to be exact!
> 
> Interesting, I'd like that myself very much!
> 
> Do we have that?

You might enjoy agrep, then. If you want to go further, do some
research on weighted Levenshtein distance (there /is/ an elisp
function named `string-distance` for you, then).

Beyond that, you are firmly in "information retrieval" [1] land,
with its diverse and colourful landscape :-)
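A rough sketch of how `string-distance' could produce a ranked list like the one Emanuel imagined. The percentage formula (100 minus the distance relative to the longer string) and the helper names `my-similarity' and `my-rank' are assumptions for illustration, not an existing API; note also that this compares whole strings, so a real search would still have to decide which stretches of text to compare.

```elisp
(defun my-similarity (a b)
  "Hypothetical score from 0 to 100: how close string A is to string B.
100 minus the Levenshtein distance as a percentage of the longer string."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        100                             ; two empty strings are identical
      (round (* 100 (- 1 (/ (float (string-distance a b)) longest)))))))

(defun my-rank (query candidates)
  "Return CANDIDATES paired with their score against QUERY, best first."
  (sort (mapcar (lambda (c) (cons c (my-similarity query c))) candidates)
        (lambda (x y) (> (cdr x) (cdr y)))))

;; (string-distance "kitten" "sitting")            ; => 3
;; (my-similarity "while" "whale")                 ; => 80
;; (car (my-rank "frend" '("MySQL" "friend" "Fräulein")))  ; => ("friend" . 83)
```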

Cheers

[1] https://en.wikipedia.org/wiki/Information_retrieval
-- 
t

* Re: How to grep for a string spanning multiple lines?
From: Jean Louis @ 2022-11-27 18:25 UTC
  To: Michael Heerdegen; +Cc: help-gnu-emacs

* Michael Heerdegen <michael_heerdegen@web.de> [2022-11-27 15:06]:
> Jean Louis <bugs@gnu.support> writes:
> 
> > > > (rx (one-or-more (or "\n" (any whitespace)))) ➜ "\\(?:
> > > > \\|[[:space:]]\\)+"
> > > 
> > > `rx' has a keyword to mach any char (including whitespace), it's
> > > called `anychar'.
> >
> > OK sure, thanks. Though I can't see relation from any char to
> > searching for whitespaces including new line. How do you mean it?
> 
> Sorry, I was a bit confused by the fact that this been discussed not
> that long ago, some weeks maybe?  AFAIR I there said one could use
> 
>   (rx (any whitespace ?\n))

That is shorter, I will use it.

-- 
Jean




* Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-11-27 19:04 UTC
  To: help-gnu-emacs

tomas wrote:

>> It can be done with regular expressions of course, but ...
>> I wonder if it isn't better to have some "inexact" search
>> algorithm [...]
>
>> Because the advantage would then also be that even the
>> words wouldn't have to be exact!
>> 
>> Interesting, I'd like that myself very much!
>> 
>> Do we have that?
>
> You might enjoy agrep, then. If you want to go further, do
> some research on weighted Levenshtein distance (there /is/
> an elisp function named `string-distance` for you, then).
>
> Beyond that, you are firmly in "information retrieval" land,
> with its diverse and colourful landscape :-)
>
>   https://en.wikipedia.org/wiki/Information_retrieval

Okay, so agrep, weighted Levenshtein distance,
`string-distance' and information retrieval ...

There is an endless world to rediscover :)

Thanks, get back to you, maybe.





* RE: [External] : Re: How to grep for a string spanning multiple lines?
From: Drew Adams @ 2022-11-27 19:46 UTC
  To: tomas@tuxteam.de, help-gnu-emacs@gnu.org

> You might enjoy agrep, then. If you want to go further, do some
> research on weighted Levenshtein distance (there /is/ an elisp
> function named `string-distance` for you, then).

(Caveat: Not following this thread.)

FYI, there is Elisp library `levenshtein.el':

https://www.emacswiki.org/emacs/download/levenshtein.el


(FYI2: Icicles uses that library, if you have it:

https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#LevenshteinCompletion)

* Re: [External] : Re: How to grep for a string spanning multiple lines?
From: tomas @ 2022-11-28  5:07 UTC
  To: Drew Adams; +Cc: help-gnu-emacs@gnu.org

On Sun, Nov 27, 2022 at 07:46:38PM +0000, Drew Adams wrote:
> > You might enjoy agrep, then. If you want to go further, do some
> > research on weighted Levenshtein distance (there /is/ an elisp
> > function named `string-distance` for you, then).
> 
> (Caveat: Not following this thread.)
> 
> FYI, there is Elisp library `levenshtein.el':
> 
> https://www.emacswiki.org/emacs/download/levenshtein.el

Now to find the differences to the built-in `string-distance'. So much
to do ;-)

> (FYI2: Icicles uses that library, if you have it:

(why not the built-in? Did it come later, or is it less useful?)

Thanks
-- 
t

* RE: [External] : Re: How to grep for a string spanning multiple lines?
From: Drew Adams @ 2022-11-28  6:17 UTC
  To: tomas@tuxteam.de; +Cc: help-gnu-emacs@gnu.org

> > (FYI2: Icicles uses that library, if you have it:
> 
> (why not the built-in [`string-distance']?
> Did it come later, or is it less useful?)

Icicles has had Levenshtein "fuzzy" matching
since 2009.  I just use the `levenshtein.el'
function `levenshtein-distance'.  No doubt
`string-distance' could now be used instead.
___

As the Icicles doc says, for most purposes,
Levenshtein matching isn't all that useful
for completion, IMO.  See the URL I pointed to:

https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#LevenshteinCompletion

First, incremental completion (rematching
each time you type or delete a char) is
important to using Icicles, and ordinary
Levenshtein matching against completion
candidates is often too slow for that.

Comparing, not string1 vs string2 (a "strict"
match), but string1 vs substrings of string2,
is quicker.  But even such non-strict
Levenshtein matching can be too slow.  (You
can switch between strict and non-strict on
the fly, while completing.)

What's often most useful is matching only
within 1 Levenshtein unit (a 1-char change),
testing that with a regexp that matches
strings at most one unit from substrings of
the target string.

For this reason, the default value of option
`icicle-levenshtein-distance' is 1.  A string
matches another if the first is within this
distance of the second (if strict) or of some
substring of the second (if non-strict).
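The "within one edit, tested by a regexp" idea can be sketched as follows. This is an illustration only, not the Icicles implementation, and the helper name `my-levenshtein-1-regexp' is hypothetical. Each alternative covers one substitution, deletion, or insertion, and since the result is not anchored it also matches inside longer strings, as in the non-strict mode described above.

```elisp
(defun my-levenshtein-1-regexp (s)
  "Return a regexp matching any string at most one edit away from S.
Hypothetical helper.  The regexp is not anchored, so it also
matches when such a string occurs inside a longer one."
  (let ((alts (list (concat (regexp-quote s) "."))))  ; insertion at the end
    (dotimes (i (length s))
      (let ((pre  (regexp-quote (substring s 0 i)))
            (at   (regexp-quote (substring s i)))       ; S from position I on
            (post (regexp-quote (substring s (1+ i))))) ; S after position I
        (push (concat pre "." post) alts)   ; substitute char I (also exact match)
        (push (concat pre post) alts)       ; delete char I
        (push (concat pre "." at) alts)))   ; insert a char before position I
    (mapconcat #'identity alts "\\|")))
```

For example, (string-match-p (my-levenshtein-1-regexp "while") "stay a whale")
succeeds, while a string two edits away, such as "wxyle", does not match.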

Icicles provides seven kinds of "fuzzy"
matching, which you can switch to on the
fly, but I personally don't think fuzzy
matching is so useful for completion, in
general.

https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles

* Re: [External] : Re: How to grep for a string spanning multiple lines?
From: Stefan Monnier @ 2022-11-28 21:05 UTC
  To: help-gnu-emacs

>> https://www.emacswiki.org/emacs/download/levenshtein.el
> Now to find the differences to the built-in `string-distance'.

That's easy:

    (string-distance "string-distance" "levenshtein")  =>  13


-- Stefan




* Re: [External] : Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-11-28 21:12 UTC
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor wrote:

>>> https://www.emacswiki.org/emacs/download/levenshtein.el
>>
>> Now to find the differences to the built-in `string-distance'.
>
> That's easy:
>
>     (string-distance "string-distance" "levenshtein")  =>  13

Haha :D





* Re: [External] : Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-11-28 21:17 UTC
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor wrote:

>> Now to find the differences to the built-in `string-distance'.
>
> That's easy:
>
>     (string-distance "string-distance" "levenshtein")  =>  13

But how do you search a whole buffer with `string-distance'?

I mean it's easy enough to eat your way thru the buffer
top-to-bottom, then compare and compile, but by what unit?





* Re: [External] : Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-11-29  2:00 UTC
  To: help-gnu-emacs

Like this maybe?

;;; -*- lexical-binding: t -*-
;;
;; this file:
;;   https://dataswamp.org/~incal/emacs-init/psea.el

(require 'cl-lib)

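(defun pattern-search (str &optional beg end)
  "Show the stretch of text most similar to STR.
Slide a window as long as STR over the text between BEG and END
\(interactively, the region if it is active, else the whole
buffer) and report the window with the smallest
`string-distance' to STR, together with that distance."
  (interactive `(,(read-string "search: ")
                 ,@(when (use-region-p)
                     (list (region-beginning)
                           (region-end) ))))
  (or beg (setq beg (point-min)))
  (or end (setq end (point-max)))
  (cl-loop
     with len = (length str)
     with hit
     with hit-dist
     with best
     ;; windows at distance LEN or more are never reported
     with best-dist = len
     ;; every start position whose window still ends by END
     for c from beg to (- end len)
     do (setq hit (buffer-substring-no-properties c (+ c len)))
        (setq hit-dist (string-distance hit str))
        (when (< hit-dist best-dist)
          (setq best hit)
          (setq best-dist hit-dist) )
     finally (message "%s" (list best best-dist)) ))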

(defalias 'psea #'pattern-search)





* Re: How to grep for a string spanning multiple lines?
From: Rudolf Adamkovič @ 2022-12-04 21:55 UTC
  To: Marcin Borkowski, Help Gnu Emacs mailing list

Marcin Borkowski <mbork@mbork.pl> writes:

> However, it turns out that it is split between two (or more)
> lines. [...] Traditional `grep' is not helpful in this situation.
> Neither is isearch, nor swiper.

I have the following in my literate Emacs configuration file:

  * Isearch configuration
  
  [...]
  
  Search across spaces, tabs, and newlines.
  
  #+begin_src emacs-lisp
  (setq isearch-lax-whitespace t
        isearch-regexp-lax-whitespace t
        search-whitespace-regexp "[ \t\r\n]+")
  #+end_src

Perhaps that could help you?

Rudy
-- 
"Chop your own wood and it will warm you twice."
-- Henry Ford; Francis Kinloch, 1819; Henry David Thoreau, 1854

Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia



* Re: How to grep for a string spanning multiple lines?
From: Emanuel Berg @ 2022-12-05 23:06 UTC
  To: help-gnu-emacs

Rudolf Adamkovič wrote:

>     (setq isearch-lax-whitespace t
>           isearch-regexp-lax-whitespace t
>           search-whitespace-regexp "[ \t\r\n]+")
>
> Perhaps that could help you?

Clever!

There is a character class for that, but maybe it's better to
do it that way.

Also ?\s is whitespace but again maybe it's better to do it
your way ...





Thread overview: 26 messages
2022-11-26  6:42 How to grep for a string spanning multiple lines? Marcin Borkowski
2022-11-26  8:27 ` Jean Louis
2022-11-26  8:36 ` tomas
2022-11-26  8:43   ` Jean Louis
2022-11-26  8:59     ` Eli Zaretskii
2022-11-26  9:06       ` Jean Louis
2022-11-27  7:31         ` Michael Heerdegen
2022-11-27  7:44           ` Jean Louis
2022-11-27 12:04             ` Michael Heerdegen
2022-11-27 18:25               ` Jean Louis
2022-11-26 10:57 ` Arthur Miller
2022-11-26 14:55 ` Emanuel Berg
2022-11-27  6:54   ` Marcin Borkowski
2022-11-27  7:26     ` Jean Louis
2022-11-27 13:48     ` Emanuel Berg
2022-11-27 18:10       ` tomas
2022-11-27 19:04         ` Emanuel Berg
2022-11-27 19:46         ` [External] : " Drew Adams
2022-11-28  5:07           ` tomas
2022-11-28  6:17             ` Drew Adams
2022-11-29  2:00               ` Emanuel Berg
2022-11-28 21:05             ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-11-28 21:12               ` Emanuel Berg
2022-11-28 21:17               ` Emanuel Berg
2022-12-04 21:55 ` Rudolf Adamkovič
2022-12-05 23:06   ` Emanuel Berg
