From mboxrd@z Thu Jan 1 00:00:00 1970
From: Carl Worth
To: Xu Wang, notmuch@notmuchmail.org
Subject: Re: correct way to search for only PDF attachments
User-Agent: Notmuch/0.20.2 (http://notmuchmail.org) Emacs/24.5.1
 (x86_64-pc-linux-gnu)
Date: Mon, 28 Sep 2015 19:00:13 -0700
Message-ID: <87vbau9e8i.fsf@wondoo.home.cworth.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha512; protocol="application/pgp-signature"
List-Id: "Use and development of the notmuch mail system."

--=-=-=
Content-Type: text/plain

On Mon, Sep 28 2015, Xu Wang wrote:
> I would like to look for all emails from a colleague jongho. I tried:
>
> from:jongho attachment:pdf
>
> which seems to do as I wanted.

Good. That should work.

> To understand more, what does the following search for?
>
> from:jongho attachment:.*pdf

Uhm, probably only strange things. There are some mechanisms for
getting notmuch to emit debugging information on what the final search
terms end up being (but I don't recall whether they still require
recompilation).

I'm not testing now, but I wouldn't be surprised if that ended up
searching for a phrase like "attachment pdf" anywhere within a
message. (The Xapian parser can be somewhat unpredictable when you give
it unexpected input.)

> Also, how does the first one above know that I want only PDF
> attachments and not an attachment called "pdformula.txt" ?

It doesn't know that you want only PDF attachments.

The key part is that indexing is performed by breaking text up into
individual terms (usually at punctuation boundaries). So a search
specification like "attachment:pdf" searches for things that were
indexed with the "pdf" term within the attachment prefix.

So that won't match a filename like pdformula.txt (which would be
indexed as two terms, "pdformula" and "txt"), but it would match
pdf.ormula.txt (which would be indexed as three terms, "pdf", "ormula"
and "txt").

See the Xapian documentation if you want more details.

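To make the term splitting concrete, here is a rough Python sketch. It
is only an approximation and not how notmuch or Xapian is implemented:
it assumes terms are split at every non-alphanumeric ASCII character and
lowercased, and the names split_terms and attachment_matches are made up
for illustration.

```python
import re


def split_terms(filename):
    """Approximate term splitting (an assumption, not Xapian's real rules):
    break at any run of non-alphanumeric ASCII characters and lowercase."""
    return [part.lower() for part in re.split(r"[^0-9A-Za-z]+", filename) if part]


def attachment_matches(filename, term):
    """Return True if `term` equals one whole term of `filename`.

    A substring is not enough: "pdf" must be a complete term."""
    return term.lower() in split_terms(filename)


if __name__ == "__main__":
    for name in ["report.pdf", "pdformula.txt", "pdf.ormula.txt"]:
        print(name, split_terms(name), attachment_matches(name, "pdf"))
```

Under those assumptions, "pdf" matches report.pdf and pdf.ormula.txt but
not pdformula.txt, because matching is done on whole terms rather than
on substrings of the filename.
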
-Carl

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCgAGBQJWCfCtAAoJEGACM7qeVNxhS7IP/28s0fs91BSkfOw8+0xMKP2q
JSv4Ze/5bfe+52U4GwKOX53fRVCDAmGz4lIA88GciM0185p0j4jjG6K6u+WfTr9r
cGMAWGGWFZM7UFjK6viVOTu0Y+XzVWxJFFO8nROr368eMQ7cZPNt9VgvNFFT51qa
tulCjt0ImQ1yyLlKPpagv9YJ3UFgp3G9HTr08HvOutb5oSpNtIR9efBkq2M+u+p3
SS9xmWwwCTY0OA6L6K0r5g3FazQrgdIXbldwf7EV64WdLBBcPJjleZGeAqhmHDwk
UYZ6wc1u+2kcKOPafR8UwXSlAKMq8qLv6BcHPFoUaDFxnAvau1dS2w0FTLzmLS5J
OZSBH5CV9Ucyt+X7OjnRCYbiH7Koa6Ov+Bv7GkoyznUiOU9m4YXBlVZSe3YsbhE0
hKZPx/IuDKQXQsmzoE7FWtjjIqhaFAKH7YszO07tC29GCkw7C+VpYSLyZYpP49sc
YMz4/YaQHVCIPLw+0YlHzFuTezrkryVAt2JuRUQgfffILXEZMdQ8dXsmVXU6Tk2S
17ksXff2QJOuaoaYLEhmG3sGH7EHzrJL7LVaEJdnaoVxjqLC+awapJPqhd6OGMg4
Nvu0848m43i3jjyuJfFAtcs23iDh8sxJbxnTCeG+Td0FrrbQpfcX7lDX/0OlRfIk
u1Fzh0FeF5MOdIB70Yv/
=ILAb
-----END PGP SIGNATURE-----
--=-=-=--