unofficial mirror of notmuch@notmuchmail.org
* Advanced search with wildcard using notmuch for mutt
@ 2023-11-30  2:18 io
  2023-11-30 11:53 ` Michael J Gruber
  0 siblings, 1 reply; 9+ messages in thread
From: io @ 2023-11-30  2:18 UTC (permalink / raw)
  To: notmuch




I have an HTML email with this sentence 'xycfe11cg64d_2501034012' in the body of the message.
No results are found when I search for '2501034012'.
I have even tried '*2501034012*' (wildcard).

Your help is appreciated.
Thank you.


-- 
Best regards,
io


* Re: Advanced search with wildcard using notmuch for mutt
  2023-11-30  2:18 Advanced search with wildcard using notmuch for mutt io
@ 2023-11-30 11:53 ` Michael J Gruber
  2023-11-30 13:31   ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: Michael J Gruber @ 2023-11-30 11:53 UTC (permalink / raw)
  To: io; +Cc: notmuch


On Thu, 30 Nov 2023 at 12:37, io <io@ooeeeoo.com> wrote:

> I have an HTML email with this sentence 'xycfe11cg64d_2501034012' in
> the body of the message.
> No results are found when I search for '2501034012'.
> I have even tried '*2501034012*' (wildcard).

Notmuch doesn't support regular expressions ("wildcards") when searching
the body of messages; see the man page for `notmuch-search-terms`. It
computes stems of words and indexes (and searches) those.

Using xapian commands, one could extract all stems and grep those for a
term which one "remembers partially" (often happened to me), and then feed
that into notmuch. Might be worthwhile scripting or even integrating into
notmuch (sexp?).
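
A minimal sketch of that idea, assuming the list of indexed terms has
already been dumped from the database by some other means (the `terms`
list and the `build_query` helper below are hypothetical, not part of
notmuch or Xapian). It greps the terms for a remembered fragment and
joins the hits into a query string that could be passed to notmuch:

```python
def build_query(terms, fragment):
    """Return an OR query over every term that contains `fragment`."""
    fragment = fragment.lower()
    # De-duplicate and sort so the generated query is stable.
    hits = sorted({t for t in terms if fragment in t.lower()})
    # Quote each term so it is searched as-is rather than stemmed.
    return " OR ".join(f'"{t}"' for t in hits)


# Hypothetical term dump; a real one would come from the Xapian database.
terms = ["pizza", "xycfe11cg64d_2501034012", "bob"]
print(build_query(terms, "2501034012"))
```

An empty result string means no indexed term contains the fragment, so
there would be nothing to feed into notmuch.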

Michael


* Re: Advanced search with wildcard using notmuch for mutt
  2023-11-30 11:53 ` Michael J Gruber
@ 2023-11-30 13:31   ` David Bremner
  2023-12-04  1:36     ` io
  0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2023-11-30 13:31 UTC (permalink / raw)
  To: Michael J Gruber, io; +Cc: notmuch

Michael J Gruber <michaeljgruber+grubix+git@gmail.com> writes:


> Using xapian commands, one could extract all stems and grep those for a
> term which one "remembers partially" (often happened to me), and then feed
> that into notmuch. Might be worthwhile scripting or even integrating into
> notmuch (sexp?).

The words are stored unstemmed as well, so in principle we could extract
those and do some kind of fuzzy search on them to construct
queries. This is what notmuch already does for regex searches on fields
other than from, subject, and mid. The reason this is not too attractive
for the message body is that it works on a per word basis, and most uses
of regex (although not the one under discussion) involve matching
multiple words. Currently Xapian only supports trailing wildcards (which
would not help here), but we could do that for body words. It just
doesn't seem (or hasn't seemed) like such a common use case.

Alas, none of the forward looking discussion really helps the original
poster.


* Re: Advanced search with wildcard using notmuch for mutt
  2023-11-30 13:31   ` David Bremner
@ 2023-12-04  1:36     ` io
  2023-12-04 11:39       ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: io @ 2023-12-04  1:36 UTC (permalink / raw)
  To: David Bremner; +Cc: Michael J Gruber, notmuch

What the Xapian 'indexing system' did was index the entire sentence 'xxx_yyy', so you will not be able to find any sentence which contains the word 'yyy'?
Xapian should have the simple wildcard feature that 'grep' (search) offers ($ grep '*word*' file). It is strange that Xapian restricts the search to 'trailing wildcard' only.
Novice users who get introduced to notmuch just want to run the search and get the result straight away.

-- 
Best regards,
io

* Re: Advanced search with wildcard using notmuch for mutt
  2023-12-04  1:36     ` io
@ 2023-12-04 11:39       ` David Bremner
  2023-12-04 20:00         ` Olly Betts
  0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2023-12-04 11:39 UTC (permalink / raw)
  To: io; +Cc: notmuch, xapian-discuss

io <io@ooeeeoo.com> writes:

> What the Xapian 'indexing system' did was index the entire sentence
> 'xxx_yyy', so you will not be able to find any sentence which contains
> the word 'yyy'?

I'm curious that you refer to xxx_yyy as a sentence. In the contexts I
am familiar with, the point of _ is to join things together into one
word (or one identifier/token). Other than that your understanding seems
correct.
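
To illustrate the point, here is a toy tokenizer. It is only a stand-in
that assumes, like the behaviour described above, that `_` counts as a
word character; it is not Xapian's actual term generator, and
`toy_terms` is a made-up name:

```python
import re


def toy_terms(text):
    """Split text into lowercase 'words'; \\w covers letters, digits and '_'."""
    return [w.lower() for w in re.findall(r"\w+", text)]


body = "Order xycfe11cg64d_2501034012 shipped"
terms = toy_terms(body)
print(terms)
# The numeric part never becomes a term of its own:
print("2501034012" in terms)
```

Because the whole token is one term, an exact-term lookup for the
numeric part alone has nothing to match.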

> Xapian should have the simple wildcard feature that 'grep' (search)
> offers ($ grep '*word*' file). It is strange that Xapian restricts the
> search to 'trailing wildcard' only.

I guess the restriction is based on what is easy to do efficiently with
the Xapian database (find prefixes).  If I remember correctly there was
some work in progress to support leading wildcards in Xapian. I can't
find relevant discussion now, but I CC'ed the Xapian list in case
someone remembers that.
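
A rough sketch of why prefixes are the cheap case, assuming only that
terms are kept in sorted order (a plain Python list stands in for the
database here; both helper names are hypothetical). A prefix can be
located by binary search, whereas a fragment in the middle or at the
end of a term forces a scan over every term:

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Binary-search to the first candidate, then walk while the prefix holds."""
    i = bisect_left(sorted_terms, prefix)
    out = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out


def substring_matches(sorted_terms, fragment):
    """No ordering helps here: every term has to be inspected."""
    return [t for t in sorted_terms if fragment in t]


terms = sorted(["bob", "pizza", "pizzeria", "xycfe11cg64d_2501034012"])
print(prefix_matches(terms, "pizz"))
print(prefix_matches(terms, "2501034012"))
print(substring_matches(terms, "2501034012"))
```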

> Novice users who get introduced to notmuch just want to run the search
> and get the result straight away.

Generally the focus of Xapian (and thus notmuch) is on words and
phrases like "bob ate my pizza". I agree this is disappointing for
someone who wants "all the flexibility of grep, but faster".


* Re: Advanced search with wildcard using notmuch for mutt
  2023-12-04 11:39       ` David Bremner
@ 2023-12-04 20:00         ` Olly Betts
  2023-12-04 21:10           ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: Olly Betts @ 2023-12-04 20:00 UTC (permalink / raw)
  To: David Bremner; +Cc: io, notmuch, xapian-discuss

On Mon, Dec 04, 2023 at 06:39:43AM -0500, David Bremner wrote:
> I guess the restriction is based on what is easy to do efficiently with
> the Xapian database (find prefixes).  If I remember correctly there was
> some work in progress to support leading wildcards in Xapian. I can't
> find relevant discussion now, but I CC'ed the Xapian list in case
> someone remembers that.

The development version of Xapian supports both `*` and `?` glob-style
wildcards in any position.

You can enable them for the QueryParser using FLAG_WILDCARD_MULTI,
FLAG_WILDCARD_SINGLE or FLAG_WILDCARD_GLOB (the last one is just the
first two combined).
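
For readers unfamiliar with glob semantics, a small illustration using
Python's standard `fnmatch` module as a stand-in (this is not the
Xapian QueryParser, and `fnmatch` additionally understands `[...]`
character sets). `*` matches any run of characters and `?` matches
exactly one; the `glob_terms` helper and the term list are made up:

```python
from fnmatch import fnmatchcase


def glob_terms(terms, pattern):
    """Return the terms matching a glob pattern, case-sensitively."""
    return [t for t in terms if fnmatchcase(t, pattern)]


terms = ["fuzzy", "fizzy", "xycfe11cg64d_2501034012"]
print(glob_terms(terms, "*2501034012"))  # leading wildcard
print(glob_terms(terms, "f?zzy"))        # single-character wildcard
```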

Cheers,
    Olly


* Re: Advanced search with wildcard using notmuch for mutt
  2023-12-04 20:00         ` Olly Betts
@ 2023-12-04 21:10           ` David Bremner
  2023-12-06  7:07             ` io
  0 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2023-12-04 21:10 UTC (permalink / raw)
  To: Olly Betts; +Cc: io, notmuch, xapian-discuss

Olly Betts <olly@survex.com> writes:

> The development version of Xapian supports both `*` and `?` glob-style
> wildcards in any position.
>
> You can enable them for the QueryParser using FLAG_WILDCARD_MULTI,
> FLAG_WILDCARD_SINGLE or FLAG_WILDCARD_GLOB (the last one is just the
> first two combined).

I see FLAG_FUZZY as well, supporting edit distance. Sounds like release
1.5 will be a crowd pleaser ;)

d


* Re: Advanced search with wildcard using notmuch for mutt
  2023-12-04 21:10           ` David Bremner
@ 2023-12-06  7:07             ` io
  2023-12-07 18:55               ` Olly Betts
  0 siblings, 1 reply; 9+ messages in thread
From: io @ 2023-12-06  7:07 UTC (permalink / raw)
  To: David Bremner; +Cc: Olly Betts, notmuch, xapian-discuss

I would like to know how to use 'FLAG_FUZZY' when one needs to use 'quest' to run the search query.


-- 
Best regards,
io

* Re: Advanced search with wildcard using notmuch for mutt
  2023-12-06  7:07             ` io
@ 2023-12-07 18:55               ` Olly Betts
  0 siblings, 0 replies; 9+ messages in thread
From: Olly Betts @ 2023-12-07 18:55 UTC (permalink / raw)
  To: io; +Cc: notmuch, xapian-discuss

On Wed, Dec 06, 2023 at 03:07:08PM +0800, io wrote:
> I would like to know how to use 'FLAG_FUZZY' when one needs to use
> 'quest' to run the search query.

E.g. this will match a document indexed by term "fuzzy":

quest --flags=default,fuzzy --db=/path/to/db 'phuzzy~'

Default edit distance is 2, but you can also specify it explicitly, e.g.
phuzzi~3 sets it to 3 and phuzzi~0.5 sets it to half the length (so also
3 here).
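
A sketch of the arithmetic behind those numbers, using plain
Levenshtein distance (insertions, deletions, substitutions). Xapian's
own distance measure and its rounding of fractional thresholds may
differ; the `threshold` helper and its truncation are assumptions made
for illustration only:

```python
def edit_distance(a, b):
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def threshold(term, spec=2):
    """Values below 1 are read as a fraction of the term length (assumed truncation)."""
    return int(len(term) * spec) if spec < 1 else int(spec)


print(edit_distance("phuzzy", "fuzzy"), threshold("phuzzy"))
print(edit_distance("phuzzi", "fuzzy"), threshold("phuzzi", 0.5))
```

Under this measure "phuzzy" is two edits from "fuzzy" (within the
default of 2), while "phuzzi" is three edits away, which is why it
needs ~3 or ~0.5 to match.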

As Bremner indicated, this is new in git master.

Cheers,
    Olly
