unofficial mirror of notmuch@notmuchmail.org
From: Frederick Eaton <frederik@ofb.net>
To: Pengji Zhang <me@pengjiz.com>
Cc: notmuch@notmuchmail.org
Subject: Re: searching for a message by path
Date: Fri, 20 Sep 2024 20:23:40 -0700
Message-ID: <20240921032340.opozeclfbyqzw2yt@localhost>
In-Reply-To: <87zfo1dfa1.fsf@pengjiz.com>

Thank you for your response, Pengji.

On Sat, Sep 21, 2024 at 08:25:10AM +0800, Pengji Zhang wrote:
>Hi Frederick,
>
>Frederick Eaton <frederik@ofb.net> writes:
>
>>I am trying to figure out how to adapt a script I wrote for 
>>filtering messages so that it applies Notmuch tags to each message. A 
>>difficulty is that the messages are already in the Notmuch database, 
>>because another tool has delivered them to a maildir and run 
>>"notmuch new".
>>
>>Now, Notmuch can provide me with the paths of all the new 
>>(unfiltered) messages, which I can give to my script. The question I 
>>have is, once the filter is done, how can the script tell Notmuch 
>>which message to apply the tags to?
>
>
>I am not sure if I understand you correctly. If the problem here is to 
>distinguish existing messages and new messages, would the config 
>option 'new.tags' work? For example, use
>
>   notmuch config set new.tags new
>
>to give all new messages a 'new' tag.

No, I already have that configuration. The first sentence of my message described what I already know how to do; the second described what I'm trying to do.

Suppose the filter script reads a message from a particular file and decides that it is spam. How does the filter tell Notmuch that the message corresponding to that file is spam? You seem to be saying below that the filter script should extract the Message-ID and use it to identify the message to Notmuch, since file paths of the messages are not indexed. Probably what my script should be doing for each message is appending a line to a batch file like this:

     +spam -new -- id:some_message_id@foo
     +inbox -new -- id:some_other@baz

and then passing the batch file to "notmuch tag --batch"?
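A minimal sketch of that idea in Python, assuming the script is handed message file paths and that a hypothetical classify() function stands in for the real filter and returns one tag to add. It reads the Message-ID header with the standard email package, drops the angle brackets, quotes the id: term using Xapian boolean-term quoting (wrap in double quotes, double any inner quotes), %NN-encodes unusual tag characters, and produces lines in the format "notmuch tag --batch" reads. Messages with no Message-ID header are simply skipped, since there is no obvious id: query for them. The helper names are illustrative, not part of Notmuch.

```python
"""Sketch: turn filtered message files into `notmuch tag --batch` input."""
from email.parser import BytesParser
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Tag characters passed through unchanged; anything else is %NN-encoded.
_SAFE_TAG_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-.@=,:"
)


def message_id_from_bytes(raw: bytes) -> Optional[str]:
    """Return the first Message-ID header without angle brackets, or None."""
    headers = BytesParser().parsebytes(raw, headersonly=True)
    value = headers.get("Message-ID")
    if value is None:
        return None
    value = str(value).strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1].strip()
    return value or None


def encode_tag(tag: str) -> str:
    """Hex-encode each UTF-8 byte of characters outside the safe set (space -> %20)."""
    return "".join(
        ch if ch in _SAFE_TAG_CHARS else "".join(f"%{b:02X}" for b in ch.encode("utf-8"))
        for ch in tag
    )


def id_query(message_id: str) -> str:
    """Quote an id: term using Xapian boolean-term rules (inner quotes doubled)."""
    return 'id:"' + message_id.replace('"', '""') + '"'


def batch_line(add: Iterable[str], remove: Iterable[str], message_id: str) -> str:
    """One line of `notmuch tag --batch` input, e.g. +spam -new -- id:"..."."""
    ops = [f"+{encode_tag(t)}" for t in add] + [f"-{encode_tag(t)}" for t in remove]
    return " ".join(ops + ["--", id_query(message_id)])


def build_batch(paths: Iterable[Path], classify: Callable[[Path], str]) -> List[str]:
    """Batch lines for each file; classify() is a hypothetical filter returning a tag."""
    lines = []
    for path in paths:
        message_id = message_id_from_bytes(Path(path).read_bytes())
        if message_id is None:
            continue  # no Message-ID header: skip rather than guess
        lines.append(batch_line([classify(path)], ["new"], message_id))
    return lines
```

The resulting lines could be written to a file and given to "notmuch tag --batch --input=FILE", or piped to "notmuch tag --batch" on stdin.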

>>I've tentatively concluded that the best way to locate each message 
>>in the Notmuch database is to extract the Message-ID and search for 
>>it with "id:"? But the FAQ says that multiple messages can have the 
>>same Message-ID (and some spam messages don't have one at all).
>
>IIRC, in the Notmuch database tags are associated with message IDs, so 
>you probably do not need to worry about this.

This time, I'm not sure I understand.

>>If I could access the message using the filename that the script is 
>>processing, it would seem slightly more reliable. It seems like 
>>there should be some way to allow a Notmuch database entry to be 
>>accessed directly by filename, without even creating a Notmuch-style 
>>search query containing that filename, but rather by passing the 
>>filename as a command-line argument to "notmuch". It would be nice 
>>not to have to worry about quoting and unquoting.
>
>I am not sure if this is useful, given that (presumably) Notmuch uses 
>message IDs as keys. Besides, those filenames are usually generated 
>automatically and quite cryptic.

It might be useful for the reasons I stated, namely in case the Message-ID does not exist or is not unique.
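One way a script could at least detect the non-unique case is to ask Notmuch for every file recorded under a given Message-ID. This Python sketch assumes the notmuch binary is on PATH and uses "notmuch search --output=files --format=text0" so that file names containing unusual characters survive intact; files_for_id and split_text0 are hypothetical helper names.

```python
import os
import subprocess
from typing import List


def id_query(message_id: str) -> str:
    """Quote an id: term using Xapian boolean-term rules (inner quotes doubled)."""
    return 'id:"' + message_id.replace('"', '""') + '"'


def split_text0(output: bytes) -> List[str]:
    """Split NUL-separated output into file names, dropping empty pieces."""
    return [os.fsdecode(item) for item in output.split(b"\0") if item]


def files_for_id(message_id: str) -> List[str]:
    """All files notmuch has recorded for this Message-ID (requires notmuch)."""
    result = subprocess.run(
        ["notmuch", "search", "--output=files", "--format=text0", id_query(message_id)],
        capture_output=True,
        check=True,
    )
    return split_text0(result.stdout)
```

If files_for_id() returns more than one name, several files share that Message-ID; since, as Pengji notes, tags are associated with message IDs, tagging through id: would affect all of them together.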

>>When I try to search for a message using "path:", nothing seems to 
>>work.
>>
>>[...]
>>
>>There were no results for any of the "path:" searches, although the 
>>"id:" search worked. I am using version 0.32.2 and can update if 
>>this may be related to a bug that was fixed in the past few years.
>
>I have never used 0.32.2 so I am not sure if there are any 
>differences, but for version 0.38.3, the prefix "path:" is used to 
>search for messages in some *directory*, and the query should be 
>*relative* to the maildir.
>
>I highly recommend the manual page 'notmuch-search-terms(7)' and also 
>other pages if you have time. They are informative and well written, 
>and very helpful for writing message processing scripts.

Thank you for interpreting that section for me. The manual pages may be informative and well written, but for what it's worth, I think they could be made somewhat clearer. For example, stating directly that there is no index of path names would help clarify what the software can and cannot do. A short example of using Notmuch in a filter script would also be useful in one of the manual pages, particularly one illustrating the case where the programmer wants to re-tag a message that is provided as a file or on stdin.

My copy of the notmuch-search-terms manual page says:

        path:<directory-path> or path:<directory-path>/** or path:/<regex>/
               The path: prefix searches for email messages that are in particular
               directories within the mail store. The directory must be specified
               relative to the top-level maildir (and without the leading slash). ...

I see now that this text only describes searches by directory name, but on first reading it wasn't clear to me whether "directory-path" meant "a path to a directory" or "a file path consisting of directories followed by a filename", particularly since there is no obvious reason for Notmuch not to index filenames. I think "path:<directory>" would be clearer, as would wording such as "The path: prefix matches email messages that are stored in a specified directory on the filesystem, which must be specified relative to the top-level maildir", followed by an explanation of how to find out what the "top-level maildir" is when you have, for example, $HOME/mail/notmuch/ configured as your database path in ~/.notmuch-config ... Even clearer would be to explain why the "path:" search prefix accepts only directories, to point out that it should be called "dir:" instead of "path:", and to warn the user that the search will be inefficient because there is no index of filenames.
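For the narrower thing that path: does support, here is a small Python sketch, assuming the mail root is already known and passed in by the caller (how to look it up for a given configuration is exactly the open question above). It turns an absolute message file name into a path: term for its containing directory, relative to the mail root and without a leading slash. The directory is taken verbatim, so for maildir messages it ends in cur or new, and the term still matches every message in that directory rather than the single file. path_query is a hypothetical name.

```python
import os


def path_query(message_file: str, mail_root: str) -> str:
    """Return a notmuch path: term for the directory containing message_file.

    The directory is made relative to mail_root (no leading slash), as
    notmuch-search-terms(7) describes. mail_root is supplied by the caller.
    """
    directory = os.path.dirname(os.path.abspath(message_file))
    rel = os.path.relpath(directory, os.path.abspath(mail_root))
    if rel == os.curdir or rel == os.pardir or rel.startswith(os.pardir + os.sep):
        # Top-level directory or outside the mail root: not handled in this sketch.
        raise ValueError(f"{message_file!r} is not in a subdirectory of {mail_root!r}")
    # Xapian boolean-term quoting: wrap in double quotes, double any inside.
    return 'path:"' + rel.replace('"', '""') + '"'
```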

Thank you,

Frederick


Thread overview: 16+ messages
2024-09-20 17:52 searching for a message by path Frederick Eaton
2024-09-21  0:25 ` Pengji Zhang
2024-09-21  3:23   ` Frederick Eaton [this message]
2024-09-21  9:01     ` Pengji Zhang
2024-09-21  9:38     ` Michael J Gruber
2024-09-21 10:44     ` Gregor Zattler
2024-09-21 16:24     ` Panayotis Manganaris
2024-09-21 17:30       ` Teemu Likonen
2024-09-23 22:14         ` Panayotis Manganaris
2024-09-24 13:00           ` David Bremner
2024-09-24  9:09       ` Michael J Gruber
2024-09-28  2:56         ` Frederick Eaton
2024-09-29 12:08           ` David Bremner
2024-10-12 22:59             ` David Bremner
2024-10-14  6:50               ` Michael J Gruber
2024-10-14 10:58                 ` David Bremner
