From: Gregor Zattler <telegraph@gmx.net>
To: notmuch@notmuchmail.org
Subject: Re: how to search for hyphenated words? (was: how to search for Morse code?)
Date: Tue, 12 Mar 2019 08:34:19 +0100
Message-ID: <87a7i0v950.fsf@len.workgroup>
In-Reply-To: <87wol4dhe7.fsf@tethera.net>
Hi David, Matt, Carl, notmuch developers,
* David Bremner <david@tethera.net> [2019-03-11; 22:13]:
> Matt Armstrong <marmstrong@google.com> writes:
>> Carl Worth <cworth@cworth.org> writes:
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>> "A phrase surrounded with double quotes ("") matches documents
>> containing that exact phrase. Hyphenated words are also treated as
>> phrases, as are cases such as filenames and email addresses
>> (e.g. /etc/passwd or president@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not
> (necessarily) the retrieved text.
What I do not understand is that it doesn't matter whether I search for
org-notmuch
or
"org-notmuch"
or
'"org-notmuch"'
or even
org ADJ/1 notmuch
$ notmuch count --output=messages '"org-notmuch"'
581
$ notmuch count --output=messages 'org-notmuch'
581
$ notmuch count --output=messages org-notmuch
581
$ notmuch count --output=messages org ADJ/1 notmuch
581
A typical example of a matched message is the attached one.
Somehow the search matches the address of this very mailing list
in the body of the email (I assume).
But obviously there are many more emails with this address in
them:
$ notmuch count --output=messages 'notmuch@notmuchmail.org'
27396
$ notmuch count --output=messages '"notmuch@notmuchmail.org"'
27396
Or, with a naive search (no decoding of possibly base64-encoded
parts), there are
$ find /home/grfz/Mail/~ml/emacs-orgmode@gnu.org /home/grfz/Mail/~ml/notmuch@notmuchmail.org* -type f -print0 | xargs -0r grep -l -- 'notmuch@notmuchmail.org' | xargs -IXXXX sh -c "cat XXXX | sed -e '1,/^$/ d' | grep -c notmuch@notmuchmail.org " | egrep -c "1|2|3|4|5|6|7|8|9"
16795
emails with the address at least once in the body.
Therefore I wonder why notmuch matches 581 messages.
A naive search for org-notmuch on the files (no decoding of
possible base64 encoded parts) only shows 79 files (77 unique
emails):
$ mkdir -vp /tmp/test/{cur,new,tmp}
$ find /home/grfz/Mail/~ml/emacs-orgmode@gnu.org /home/grfz/Mail/~ml/notmuch@notmuchmail.org* -type f -print0 | xargs -0r grep -l -- 'org-notmuch' | xargs ln -vs --target-directory=/tmp/kolp/cur/ | wc -l
79
Therefore I wonder why notmuch matches 581 messages, not 16795
messages or 77 messages.
Somehow these numbers do not fit!?
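To make the question concrete, here is a minimal sketch in Python (not notmuch or Xapian code) of the term splitting Carl describes above. It assumes that every non-alphanumeric character acts as a separator and that a phrase search only requires the resulting terms to be adjacent; the real Xapian term generator has more rules than that. The function names terms and phrase_matches are made up for illustration. If this model is roughly right, the list address alone would not match org-notmuch, while a line where an address ending in .org is directly followed by the list address (as in the attached message) would.

```python
import re


def terms(text):
    """Split text into lowercase terms, treating every non-alphanumeric
    character (punctuation and whitespace alike) as a separator.
    Simplified assumption, not Xapian's actual rules."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_matches(phrase, text):
    """Return True if the terms of `phrase` occur consecutively in `text`."""
    needle, haystack = terms(phrase), terms(text)
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


# Line taken from the attached message.
line = "To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org"

print(terms("org-notmuch"))                 # ['org', 'notmuch']
print(phrase_matches("org-notmuch", line))  # True
print(phrase_matches("org-notmuch", "notmuch@notmuchmail.org"))  # False
```

Under these assumptions the count would depend on how often "org" happens to be followed directly by "notmuch" in the indexed text, not on how often either the hyphenated word or the address occurs by itself.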
Ciao; Gregor
--
-... --- .-. . -.. ..--.. ...-.-
[-- Attachment #2: 1514563210.28210_1.len:2,S --]
[-- Type: message/rfc822, Size: 1530 bytes --]
From: root@len.workgroup (Cron Daemon)
To: root@localhost
Subject: Cron <grfz@len> ~/bin/mailwiederdurchschleusen
Date: Fri, 29 Dec 2017 17:00:09 +0100
Message-ID: <E1eUx4r-0007Kx-5w@len.workgroup>
Date: Thu, 28 Dec 2017 21:04:52 -0500
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Gnus and emails sent by me
----------------------------------------------------------
Date: Thu, 28 Dec 2017 22:00:56 -0400
From: David Bremner <david@tethera.net>
To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org
Subject: Re: Xapian exception leading to database corruption
----------------------------------------------------------