* how to search for Morse code?
From: Gregor Zattler @ 2018-07-23 12:20 UTC
To: notmuch

Hello,

today I searched for emails containing

-... --- .-. . -.. ..--.. ...-.-

tried with

  notmuch search "-... --- .-. . -.. ..--.. ...-.-"

and

  notmuch search '-... --- .-. . -.. ..--.. ...-.-'

and even

  notmuch search '"-... --- .-. . -.. ..--.. ...-.-"'

and also with double dashes in front of the search term:

  notmuch search -- "-... --- .-. . -.. ..--.. ...-.-"

All these searches produce

  notmuch search: A Xapian exception occurred
  A Xapian exception occurred parsing query: Unknown range operation
  Query string was: "-... --- .-. . -.. ..--.. ...-.-"

Is it possible to search for emails containing my supposedly funny
signature? Obviously this is not much of a problem for me, but
perhaps I hit a hidden bug?

Ciao; Gregor
-- 
-... --- .-. . -.. ..--.. ...-.-

* Re: how to search for Morse code?
From: Ben Oliver @ 2018-07-23 14:16 UTC
To: notmuch

On 18-07-23 14:20:41, Gregor Zattler wrote:
>Hello,
>
>today I searched for emails containing
>
>-... --- .-. . -.. ..--.. ...-.-
>

Heh

I suppose the problem is that Xapian won't take two periods ".." even
in quotes.

I asked on their IRC about how to escape it, but it's quiet.

* Re: how to search for Morse code?
From: Ben Oliver @ 2018-07-24 10:34 UTC
To: notmuch

On 18-07-23 15:16:07, Ben Oliver wrote:
>On 18-07-23 14:20:41, Gregor Zattler wrote:
>>Hello,
>>
>>today I searched for emails containing
>>
>>-... --- .-. . -.. ..--.. ...-.-
>>
>
>Heh
>
>I suppose the problem is that xapian won't take two periods ".." even
>in quotes.
>
>I asked on their IRC about how to escape it but it's quiet

So it seems like Morse code would not be indexed, which makes sense.

Sorry!

* how to search for hyphenated words? (was: how to search for Morse code?)
From: Gregor Zattler @ 2019-03-08 19:25 UTC
To: notmuch

Hello,

* Gregor Zattler <telegraph@gmx.net> [2018-07-23; 14:20]:
> today I searched for emails containing
>
> -... --- .-. . -.. ..--.. ...-.-

today I searched for emails containing "org-notmuch" (which supports
org links to notmuch searches), e.g. with

  notmuch search org-notmuch
  notmuch search -- org-notmuch
  notmuch search -- "org-notmuch"
  notmuch search -- '"org-notmuch"'
  notmuch search -- '+"org-notmuch"'
  notmuch search -- org ADJ/1 notmuch

All of these resulted in very many hits, most or all of which do not
contain the string "org-notmuch". One of the emails found was, e.g.,

  id:20180904105723.15564-3-david@tethera.net

How would one search for hyphenated words with notmuch?

Ciao; Gregor
-- 
-... --- .-. . -.. ..--.. ...-.-

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Carl Worth @ 2019-03-09 0:03 UTC
To: Gregor Zattler, notmuch

Hi Gregor,

The trick here is that when notmuch is indexing body text it feeds it
into a Xapian function that parses the text by finding "terms" in the
text. And this parser considers both punctuation and whitespace as
separators between terms.

So your messages are not being indexed in a way to let you distinguish
between "org notmuch" and "org-notmuch".

(Of note, the query parser applies the same parsing to your query---so
that even when you think you're typing an exact phrase like
"org-notmuch" that gets parsed into separate terms "org" and "notmuch"
for searching.)

> all these resulted in very many hits most or all of which do not
> contain the string "org-notmuch", one found email was e.g.
>
> id:20180904105723.15564-3-david@tethera.net

That message does contain the following:

  +test_emacs '(notmuch-tree "id:000-real-root@example.org")
  +            (notmuch-test-wait)

Where you will notice that there's a term "org" followed (after some
punctuation and whitespace separators) by a term "notmuch".

> How would one search for hyphenated words with notmuch?

You would need to arrange to have the indexer consider the hyphen as a
letter-like character to be made part of terms. Or be extra clever and
index something like "notmuch-test-wait" in multiple ways (such as a
single term "notmuch-test-wait" as well as three adjacent terms
"notmuch", "test", and "wait" as notmuch is doing currently).

-Carl
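
To make Carl's explanation concrete, here is a rough Python sketch of "punctuation and whitespace are separators". It is only an approximation for illustration: the function name `approximate_terms` is made up, and Xapian's real term generator has more rules (for example around certain characters inside words) that this regular expression does not model.

```python
import re


def approximate_terms(text: str) -> list[str]:
    """Split text into lowercase terms, treating every non-alphanumeric
    character (punctuation or whitespace) as a separator.

    This is an illustrative approximation, not Xapian's actual algorithm.
    """
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


# The hyphenated and the space-separated spellings yield the same terms.
print(approximate_terms("org-notmuch"))
print(approximate_terms("org notmuch"))

# The snippet Carl quotes from the matched message: "org" (end of the
# address) is directly followed by "notmuch" (start of the next form).
snippet = '''+test_emacs '(notmuch-tree "id:000-real-root@example.org")
+            (notmuch-test-wait)'''
print(approximate_terms(snippet))
```

Under this simplified model the index has no way to tell "org-notmuch" from "org", some separators, and then "notmuch", which is the behaviour described above.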

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Matt Armstrong @ 2019-03-12 0:05 UTC
To: Carl Worth, Gregor Zattler, notmuch

Carl Worth <cworth@cworth.org> writes:

> Hi Gregor,
>
> The trick here is that when notmuch is indexing body text it feeds it
> into a Xapian function that parses the text by finding "terms" in the
> text. And this parser considers both punctuation and whitespace as
> separators between terms.

I notice that Xapian supports something called "phrase searches",
documented as:

    "A phrase surrounded with double quotes ("") matches documents
    containing that exact phrase. Hyphenated words are also treated as
    phrases, as are cases such as filenames and email addresses
    (e.g. /etc/passwd or president@whitehouse.gov)."

I assume that this particular Xapian feature is unavailable in notmuch?
If so, I wonder if enabling it has ever been considered?

Being able to "drop down" to do things like exact phrase matches is one
reason why I use notmuch, because the precision sometimes matters. I
currently do this by fetching the mail message itself and using
old-school mail processing tools on the message file.
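
The "drop down to the message file" workflow Matt describes could look roughly like the sketch below. It assumes the candidate file paths come from somewhere else (for instance `notmuch search --output=files`) and only shows the second step: decoding each text part with Python's standard `email` package and testing for a literal string. The helper names `decoded_text` and `contains_literal` and the sample message are hypothetical.

```python
import base64
import email
import email.policy


def decoded_text(raw: bytes) -> str:
    """Return the decoded text of every text/* part of one raw message."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            chunks.append(part.get_content())
        except (LookupError, UnicodeError):
            # Unknown or broken charset: skip this part in the sketch.
            continue
    return "\n".join(chunks)


def contains_literal(raw: bytes, needle: str) -> bool:
    """True if the decoded body text contains needle character for character."""
    return needle in decoded_text(raw)


# Hypothetical sample message with a base64-encoded body.
sample = (
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(b"(require 'org-notmuch nil t)\n")
    + b"\r\n"
)

print(contains_literal(sample, "org-notmuch"))  # match after decoding
print(b"org-notmuch" in sample)                 # what a plain grep on the file sees
```

Decoding first matters because a plain `grep` on the file cannot see text inside base64 or quoted-printable parts.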

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: David Bremner @ 2019-03-12 1:13 UTC
To: Matt Armstrong, Carl Worth, Gregor Zattler, notmuch

Matt Armstrong <marmstrong@google.com> writes:

> Carl Worth <cworth@cworth.org> writes:
>
>> Hi Gregor,
>>
>> The trick here is that when notmuch is indexing body text it feeds it
>> into a Xapian function that parses the text by finding "terms" in the
>> text. And this parser considers both punctuation and whitespace as
>> separators between terms.
>
> I notice that Xapian supports something called "phrase searches",
> documented as:
>
>     "A phrase surrounded with double quotes ("") matches documents
>     containing that exact phrase. Hyphenated words are also treated as
>     phrases, as are cases such as filenames and email addresses
>     (e.g. /etc/passwd or president@whitehouse.gov)."
>
> I assume that this particular Xapian feature is unavailable in notmuch?
> If so, I wonder if enabling has ever been considered?

It is enabled, and documented in notmuch-search-terms(7). Unfortunately
I don't think it's related to the original request. The mention of
hyphenated words is about the input to the query parser, not
(necessarily) the retrieved text.

d

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Gregor Zattler @ 2019-03-12 7:34 UTC
To: notmuch

Hi David, Matt, Carl, notmuch developers,

* David Bremner <david@tethera.net> [2019-03-11; 22:13]:
> Matt Armstrong <marmstrong@google.com> writes:
>> Carl Worth <cworth@cworth.org> writes:
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>>     "A phrase surrounded with double quotes ("") matches documents
>>     containing that exact phrase. Hyphenated words are also treated as
>>     phrases, as are cases such as filenames and email addresses
>>     (e.g. /etc/passwd or president@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not the
> (necessarily) the retrieved text.

What I do not understand is that it doesn't matter if I search for

  org-notmuch

or

  "org-notmuch"

  '"org-notmuch"'

or even

  org ADJ/1 notmuch

$ notmuch count --output=messages '"org-notmuch"'
581
$ notmuch count --output=messages 'org-notmuch'
581
$ notmuch count --output=messages org-notmuch
581
$ notmuch count --output=messages org ADJ/1 notmuch
581

A typical example of a matched message is the attached one.
Somehow the search matches the address of this very mailing list
in the body of the email (I assume). But obviously there are many
more emails with this address in them:

$ notmuch count --output=messages 'notmuch@notmuchmail.org'
27396
$ notmuch count --output=messages '"notmuch@notmuchmail.org"'
27396

Or with a naive search (no decoding of possible base64 encoded
parts) there are

$ find /home/grfz/Mail/~ml/emacs-orgmode@gnu.org /home/grfz/Mail/~ml/notmuch@notmuchmail.org* -type f -print0 |
    xargs -0r grep -l -- 'notmuch@notmuchmail.org' |
    xargs -IXXXX sh -c "cat XXXX | sed -e '1,/^$/ d' | grep -c notmuch@notmuchmail.org " |
    egrep -c "1|2|3|4|5|6|7|8|9"
16795

emails with the address at least once in the body.

Therefore I wonder why notmuch matches 581 messages.

A naive search for org-notmuch on the files (no decoding of
possible base64 encoded parts) only shows 79 files (77 unique
emails):

mkdir -vp /tmp/test/{cur,new,tmp}
$ find /home/grfz/Mail/~ml/emacs-orgmode@gnu.org /home/grfz/Mail/~ml/notmuch@notmuchmail.org* -type f -print0 |
    xargs -0r grep -l -- 'org-notmuch' |
    xargs ln -vs --target-directory=/tmp/kolp/cur/ |
    wc -l
79

Therefore I wonder why notmuch matches 581 messages, not 16795
messages or 77 messages.

Somehow these numbers do not fit!?

Ciao; Gregor
-- 
-... --- .-. . -.. ..--.. ...-.-

[-- Attachment #2: 1514563210.28210_1.len:2,S --]
[-- Type: message/rfc822, Size: 1530 bytes --]

From: root@len.workgroup (Cron Daemon)
To: root@localhost
Subject: Cron <grfz@len> ~/bin/mailwiederdurchschleusen
Date: Fri, 29 Dec 2017 17:00:09 +0100
Message-ID: <E1eUx4r-0007Kx-5w@len.workgroup>

Date: Thu, 28 Dec 2017 21:04:52 -0500
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Gnus and emails sent by me
----------------------------------------------------------
Date: Thu, 28 Dec 2017 22:00:56 -0400
From: David Bremner <david@tethera.net>
To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org
Subject: Re: Xapian exception leading to database corruption
----------------------------------------------------------

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: David Bremner @ 2019-03-12 10:41 UTC
To: Gregor Zattler, notmuch

Gregor Zattler <telegraph@gmx.net> writes:

> From: root@len.workgroup (Cron Daemon)
> Subject: Cron <grfz@len> ~/bin/mailwiederdurchschleusen
> To: root@localhost
> Date: Fri, 29 Dec 2017 17:00:09 +0100
>
> Date: Thu, 28 Dec 2017 21:04:52 -0500
> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> To: help-gnu-emacs@gnu.org
> Subject: Re: Gnus and emails sent by me
> ----------------------------------------------------------
> Date: Thu, 28 Dec 2017 22:00:56 -0400
> From: David Bremner <david@tethera.net>
> To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org
> Subject: Re: Xapian exception leading to database corruption
> ----------------------------------------------------------

The line

  To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org

contains the phrase "org notmuch". You can see this more easily by
stripping all the punctuation.

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Gregor Zattler @ 2019-03-12 17:29 UTC
To: notmuch

Hi David,

* David Bremner <david@tethera.net> [2019-03-12; 07:41]:
> Gregor Zattler <telegraph@gmx.net> writes:
>
>> From: root@len.workgroup (Cron Daemon)
>> Subject: Cron <grfz@len> ~/bin/mailwiederdurchschleusen
>> To: root@localhost
>> Date: Fri, 29 Dec 2017 17:00:09 +0100
>>
>> Date: Thu, 28 Dec 2017 21:04:52 -0500
>> From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
>> To: help-gnu-emacs@gnu.org
>> Subject: Re: Gnus and emails sent by me
>> ----------------------------------------------------------
>> Date: Thu, 28 Dec 2017 22:00:56 -0400
>> From: David Bremner <david@tethera.net>
>> To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org
>> Subject: Re: Xapian exception leading to database corruption
>> ----------------------------------------------------------
>
> The line
>
> To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org
>
> contains the phrase "org notmuch". You can see this easier by stripping
> all the punctuation.

Thanks, now I see (the light :-)

Ciao; Gregor
-- 
-... --- .-. . -.. ..--.. ...-.-

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Carl Worth @ 2019-03-12 17:12 UTC
To: Gregor Zattler, notmuch

On Tue, Mar 12 2019, Gregor Zattler wrote:
> what I do not understand is that it doesn't matter if I search for
>
> org-notmuch
>
> or
>
> "org-notmuch"
>
> '"org-notmuch"'
>
> or even
>
> org ADJ/1 notmuch

Correct. All four of those forms are giving you phrase searches, (so a
term "org" followed immediately by a term "notmuch").

> a typical example of a matched message is the attached one.
> Somehow the search matches the address of this very mailing list
> in the body of the email (I assume).

No, I don't think you are seeing a match on the mailing-list address
itself, (which has "notmuch" two terms before "org").

> Therefore I wonder why notmuch matches 581 messages, not 16795
> messages or 77 messages.

David showed you one example from the message you copied:

> To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org

And I showed one earlier in the thread. In each case, the message
includes "org" followed (after some amount of punctuation and
whitespace, perhaps including newlines) by "notmuch".

-Carl
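
The two observations above (the `To:` line matches, the list address on its own does not) can be checked with a small Python sketch of a phrase search over adjacent terms. As before, this is a simplified model with made-up helper names (`approximate_terms`, `has_phrase`), not notmuch's or Xapian's actual implementation.

```python
import re


def approximate_terms(text: str) -> list[str]:
    """Lowercase alphanumeric runs; everything else is a separator.
    Illustrative approximation only."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def has_phrase(text: str, phrase: str) -> bool:
    """True if the terms of phrase occur as consecutive terms in text."""
    words = approximate_terms(text)
    target = approximate_terms(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


to_line = "To: David Edmondson <dme@dme.org>, notmuch@notmuchmail.org"
address = "notmuch@notmuchmail.org"

# "dme.org" ends in "org" and the next term is "notmuch": a phrase match.
print(has_phrase(to_line, "org-notmuch"))

# In the address alone, "notmuch" comes two terms before "org": no match.
print(has_phrase(address, "org-notmuch"))
```

This also suggests why 581 is neither 77 nor 16795: the phrase search counts every message where the two terms happen to be adjacent, whatever separates them.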

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Matt Armstrong @ 2019-03-13 18:23 UTC
To: David Bremner, Carl Worth, Gregor Zattler, notmuch

David Bremner <david@tethera.net> writes:

> Matt Armstrong <marmstrong@google.com> writes:
>
>> Carl Worth <cworth@cworth.org> writes:
>>
>>> Hi Gregor,
>>>
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>>     "A phrase surrounded with double quotes ("") matches documents
>>     containing that exact phrase. Hyphenated words are also treated as
>>     phrases, as are cases such as filenames and email addresses
>>     (e.g. /etc/passwd or president@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not the
> (necessarily) the retrieved text.

Ah, so it boils down to the Xapian definition of "exact phrase."
Notably, "exact phrase" is not "identical sequence of characters" as
some people might expect.

Quick tests with various search engines reveal their phrase search as
operating the same way. E.g. searching for "org notmuch" finds all
sorts of results:

    org-notmuch.el
    notmuchmail.org/notmuch-emacs/
    to:devicetree@vger.kernel.org notmuch tag +inbox +unread -new
    (require 'org-notmuch nil t)
    https://notmuchmail.org/notmuch-emacs/.
    * imaps://mail.example.org/Notmuch/search

For what it is worth, one thing I've taken to doing is using period
separators in the notmuch phrase searches I use in scripts and even
interactively. Using periods is generally immune to confusing issues
related to quoting double quoted things, and always remains a single
shell "word." They are also, most often, clearly not the exact content
I'm searching for, so they make it clear that the match algorithm is
inexact. E.g.

    subject:notmuch.is.wonderful

instead of:

    subject:"notmuch is wonderful"
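
The quoting issue Matt alludes to can be illustrated with Python's `shlex` module, which splits a command line using POSIX-shell-like rules. The sketch only shows which argument strings the shell would hand to the `notmuch` process; it does not run notmuch, and what the query parser then does with each string is outside its scope.

```python
import shlex


def shell_words(command_line: str) -> list[str]:
    """Split a command line the way a POSIX shell would (approximately)."""
    return shlex.split(command_line)


# Double quotes typed directly are consumed by the shell, so the phrase
# quotes are gone by the time the argument reaches the program.
plain = shell_words('notmuch search subject:"notmuch is wonderful"')
print(plain[-1])

# Wrapping the whole term in single quotes preserves the inner quotes.
nested = shell_words('notmuch search \'subject:"notmuch is wonderful"\'')
print(nested[-1])

# The period form needs no quoting at all and stays one shell word.
dotted = shell_words("notmuch search subject:notmuch.is.wonderful")
print(dotted[-1])
```

Since periods act as term separators just like spaces, the dotted form describes the same sequence of terms without any shell quoting to get wrong.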

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: David Bremner @ 2019-03-10 23:22 UTC
To: Gregor Zattler, notmuch

Gregor Zattler <telegraph@gmx.net> writes:

>
> How would one search for hyphenated words with notmuch?
>

In special cases, explained in notmuch-search-terms(7), one can use
regexp searches, which are slower, but don't drop punctuation.

d

* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Gregor Zattler @ 2019-03-11 8:15 UTC
To: notmuch

Hi David, notmuch developers,

* David Bremner <david@tethera.net> [2019-03-10; 20:22]:
> Gregor Zattler <telegraph@gmx.net> writes:
>> How would one search for hyphenated words with notmuch?
>>
>
> In special cases, explained in notmuch-search-terms(7), one can use
> regexp searches, which are slower, but don't drop punctuation.

Thanks, this works for the subject: field, which helps a lot.

Regexes do not work on the body of messages and I assume they
will not work with the upcoming "body:" field?

Thanks for your attention, Gregor
[parent not found: <877ed5bzr3.fsf@len.workgroup>]
* Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: David Bremner @ 2019-03-11 9:46 UTC
To: Gregor Zattler; +Cc: notmuch

Gregor Zattler <telegraph@gmx.net> writes:

> Hi David, notmuch developers,
> * David Bremner <david@tethera.net> [2019-03-10; 20:22]:
>> Gregor Zattler <telegraph@gmx.net> writes:
>>> How would one search for hyphenated words with notmuch?
>>>
>>
>> In special cases, explained in notmuch-search-terms(7), one can use
>> regexp searches, which are slower, but don't drop punctuation.
>
> thanks, this works for the subject: field, which helps a lot.
>
> Regexes do not work on the body of messages and I assume they
> will not work with the upcoming "body:" field?

That's correct.

d
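
As a sketch of the regexp route for the subject: field, the Python below builds a `subject:/<regex>/` term, which is the form described in notmuch-search-terms(7), and the argument list one might pass to `notmuch search`. The helper names and the two sample subjects are hypothetical, and the local check uses Python's `re` module, whose dialect is not guaranteed to match the one notmuch uses; it is only there to show that a regexp keeps the hyphen.

```python
import re


def subject_regex_term(pattern: str) -> str:
    """Build a notmuch regexp term for the subject field."""
    return f"subject:/{pattern}/"


def search_argv(pattern: str) -> list[str]:
    """Argument list for a notmuch subject regexp search (not executed here)."""
    return ["notmuch", "search", "--", subject_regex_term(pattern)]


print(search_argv("org-notmuch"))

# Hypothetical subjects: a regexp match is character based, so only the
# first one, which really contains the hyphenated string, matches.
subjects = [
    "Re: [PATCH] org-notmuch: fix link export",
    "Re: notmuchmail.org notmuch wiki",
]
print([bool(re.search("org-notmuch", s)) for s in subjects])
```

As confirmed above, this only helps for fields that support regexps; it does not apply to message bodies.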

end of thread (newest message: 2019-03-13 18:23 UTC)

Thread overview: 15+ messages

2018-07-23 12:20 how to search for Morse code? Gregor Zattler
2018-07-23 14:16 ` Ben Oliver
2018-07-24 10:34   ` Ben Oliver
2019-03-08 19:25 ` how to search for hyphenated words? (was: how to search for Morse code?) Gregor Zattler
2019-03-09  0:03   ` Carl Worth
2019-03-12  0:05     ` Matt Armstrong
2019-03-12  1:13       ` David Bremner
2019-03-12  7:34         ` Gregor Zattler
2019-03-12 10:41           ` David Bremner
2019-03-12 17:29             ` Gregor Zattler
2019-03-12 17:12           ` Carl Worth
2019-03-13 18:23         ` Matt Armstrong
2019-03-10 23:22   ` David Bremner
2019-03-11  8:15     ` Gregor Zattler
     [not found]     ` <877ed5bzr3.fsf@len.workgroup>
2019-03-11  9:46       ` David Bremner