unofficial mirror of notmuch@notmuchmail.org
From: Xu Wang <xuwang762@gmail.com>
To: Carl Worth <cworth@cworth.org>
Cc: notmuch@notmuchmail.org
Subject: Re: correct way to search for only PDF attachments
Date: Tue, 29 Sep 2015 00:51:01 -0400
Message-ID: <CAJhTkNg0_j3R8zdpywmZkreFU2p+Wky8oxC7vvuQYzNK2U=-1Q@mail.gmail.com>
In-Reply-To: <87vbau9e8i.fsf@wondoo.home.cworth.org>

On Mon, Sep 28, 2015 at 10:00 PM, Carl Worth <cworth@cworth.org> wrote:
> On Mon, Sep 28 2015, Xu Wang wrote:
>> I would like to look for all emails from a colleague, jongho. I tried:
>>
>> from:jongho attachment:pdf
>>
>> which seems to do as I wanted.
>
> Good. That should work.
>
>> To understand more, what does the following search for?
>>
>> from:jongho attachment:.*pdf
>
> Uhm, probably only strange things. There are some mechanisms for getting
> notmuch to emit some debugging information on what the final search
> terms end up being, (but I don't recall if they still require
> recompilation or not).
>
> I'm not testing now, but I wouldn't be surprised if that ended up doing
> something like searching for a phrase like "attachment pdf" anywhere
> within a message. (The Xapian parser can be somewhat unpredictable when
> you give it unexpected input.)
>
>> Also, how does the first one above know that I want only PDF
>> attachments and not an attachment called "pdformula.txt" ?
>
> It doesn't know that you want only PDF attachments. The key part is that
> the indexing is performed by breaking text up into individual terms, (at
> punctuation boundaries usually). So a search specification like
> "attachment:pdf" is searching for things that were indexed with the
> "pdf" term within the attachment prefix. So that won't match a filename
> like pdformula.txt, (which would be indexed as two terms, "pdformula"
> and "txt"), but it would match pdf.ormula.txt, (which would be indexed
> as three terms, "pdf", "ormula" and "txt").
>
> The Xapian documentation can be examined if you want more details.
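
The term-splitting behaviour described above can be illustrated with a
rough sketch. It assumes, as a simplification, that terms are produced by
splitting on every non-alphanumeric character and lowercasing; the real
Xapian term generator has additional rules. The helper names index_terms
and matches are hypothetical and are not part of notmuch or Xapian.

```python
import re


def index_terms(filename):
    """Split a filename into lowercase terms at non-alphanumeric boundaries.

    Assumption: this is only an approximation of how an indexer breaks
    text into terms; it is not Xapian's actual algorithm.
    """
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", filename) if t]


def matches(term, filename):
    """Return True if `term` is one of the whole terms of `filename`."""
    return term.lower() in index_terms(filename)


if __name__ == "__main__":
    for name in ("report.pdf", "pdformula.txt", "pdf.ormula.txt"):
        print(name, index_terms(name), matches("pdf", name))
```

Under this approximation, "pdf" is compared against whole terms rather
than substrings, which is why pdformula.txt is not a match while
pdf.ormula.txt is.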

This is highly useful. Thanks for such an explanation! Thank you, Carl.

Kind regards,

Xu

Thread overview: 8+ messages
2015-09-29  0:55 correct way to search for only PDF attachments Xu Wang
2015-09-29  2:00 ` Carl Worth
2015-09-29  4:51   ` Xu Wang [this message]
2015-09-29  7:15   ` Suvayu Ali
2015-09-29 11:00   ` David Bremner
2015-09-29 11:56     ` Suvayu Ali
2015-09-29 13:48       ` David Bremner
2015-09-30 15:16         ` Xu Wang