From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
 by arlo.cworth.org (Postfix) with ESMTP id 521B16DE0C6B
 for ; Fri, 8 Mar 2019 16:03:11 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: -9.064
X-Spam-Level:
X-Spam-Status: No, score=-9.064 tagged_above=-999 required=5
 tests=[AM.WBL=-8, ALL_TRUSTED=-1, AWL=-0.064] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
 by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id HNbC3CENzrKk; Fri, 8 Mar 2019 16:03:10 -0800 (PST)
Received: from wondoo.home.cworth.org (unknown [10.0.0.11])
 (Authenticated sender: cworth)
 by arlo.cworth.org (Postfix) with ESMTPSA id 3812C6DE0C66;
 Fri, 8 Mar 2019 16:03:10 -0800 (PST)
Received: from wondoo (localhost [IPv6:::1])
 by wondoo.home.cworth.org (Postfix) with ESMTPS id 09F181FE2DF6;
 Fri, 8 Mar 2019 16:03:10 -0800 (PST)
To: Gregor Zattler , notmuch@notmuchmail.org
Subject: Re: how to search for hyphenated words? (was: how to search for
 Morse code?)
In-Reply-To: <87ef7hyxqs.fsf@len.workgroup>
References: <87muui87om.fsf@len.workgroup> <87ef7hyxqs.fsf@len.workgroup>
Sender: cworth@cworth.org
From: Carl Worth
Date: Fri, 08 Mar 2019 16:03:02 -0800
Message-ID: <87a7i4c3t5.fsf@wondoo.home.cworth.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha512; protocol="application/pgp-signature"
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 Mar 2019 00:03:11 -0000

--=-=-=
Content-Type: text/plain

Hi Gregor,

The trick here is that when notmuch is indexing body text it feeds it
into a Xapian function that parses the text by finding "terms" in the
text. And this parser considers both punctuation and whitespace as
separators between terms.
So your messages are not being indexed in a way that lets you
distinguish between "org notmuch" and "org-notmuch".

(Of note, the query parser applies the same parsing to your query, so
even when you think you're typing an exact phrase like "org-notmuch", it
gets parsed into separate terms "org" and "notmuch" for searching.)

> all these resulted in very many hits most or all of which do not
> contain the string "org-notmuch", one found email was e.g.
>
> id:20180904105723.15564-3-david@tethera.net

That message does contain the following:

    +test_emacs '(notmuch-tree "id:000-real-root@example.org")
    + (notmuch-test-wait)

Here you will notice that there's a term "org" followed (after some
punctuation and whitespace separators) by a term "notmuch".

> How would one search for hyphenated words with notmuch?

You would need to arrange to have the indexer consider the hyphen as a
letter-like character to be made part of terms. Or be extra clever and
index something like "notmuch-test-wait" in multiple ways (such as a
single term "notmuch-test-wait" as well as three adjacent terms
"notmuch", "test", and "wait" as notmuch is doing currently).
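
To make the effect concrete, here is a rough Python sketch of the
behavior described above. It is only an approximation: the regular
expressions stand in for Xapian's real term generator and query parser
(which have more rules than this), and the names split_terms,
contains_phrase, and compound_terms are hypothetical helpers made up for
illustration, not part of notmuch or Xapian.

```python
import re

# Assumption: terms are approximated as lowercased runs of word
# characters; anything else (hyphens, dots, '@', quotes, whitespace)
# acts as a separator. Xapian's real rules are more involved.
def split_terms(text):
    return re.findall(r"\w+", text.lower())

# A phrase matches when its terms appear adjacent and in order.
def contains_phrase(doc_terms, query_terms):
    n = len(query_terms)
    return n > 0 and any(
        doc_terms[i:i + n] == query_terms
        for i in range(len(doc_terms) - n + 1)
    )

# Hypothetical "extra clever" indexing: collect hyphen-joined compounds
# that could be stored as single terms alongside the split terms.
def compound_terms(text):
    return re.findall(r"\w+(?:-\w+)+", text.lower())

body = (
    "+test_emacs '(notmuch-tree \"id:000-real-root@example.org\")\n"
    "+ (notmuch-test-wait)"
)

# The query is split the same way as the body, so it becomes
# the two adjacent terms "org" and "notmuch".
query = split_terms("org-notmuch")

# True: "org" (from example.org) is directly followed by "notmuch".
false_positive = contains_phrase(split_terms(body), query)

# With compound terms kept, an exact lookup can tell the cases apart.
has_exact = "org-notmuch" in compound_terms(body)
```

In this sketch false_positive comes out true while has_exact comes out
false, which is the distinction the current index cannot make.
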

-Carl

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEzvjdyAGcpm1OMJTzYAIzup5U3GEFAlyDArYACgkQYAIzup5U
3GFCbg//csPAdcxPKk+kXosovA1+Sq7B6INKOxSk2rlfLlLtlRYHAztiZevRdt+l
aq2cwwQ3M5cuP7pZ+DgBxwBLlsmAhZJW7aXJW4U7p/Jeh5PTaGLOTAjcFQd/FoIK
D+Y1CwYvgjRsC+9wj3Mo0gM0Qyric/mXb/8qSmv21ydmOg2eugbcfe4X4GcJV+hg
lg9Ky6QkWOre0W0+9GF0LcHYVOGo93830/EW7M2ClO7v5DdL4A08tEWMaKlfghPZ
6onoc3DgD1tOOR72Qz3yKia6p3ssDAzzqJMFnUzBLqCFPznTAccmHX0/ML5La0b2
tO0MzkMQmZpdo6e8vi6J0qzUhvACYU3Slr1RxSq9a7AOrxd+FanQjKr21/3ecRd8
Hjq7NpSoWuLmlWRJNXcF34FKsRLweAVF7ME9KV/lTd2Eu2ZfVUEcQKn+lq0ME8Dq
REXURlZFM6DEa7H8/EibozGAk4GyxFHPk06WbGMZtNORsu0cKebsCXQ22nX8NoIr
oqGY7k2EEEj2iC4SMlQ2vVVijAgv2zLYrTKSj4DEQeAOhSGsGsxMvdLDncUhfRpF
Y6kHDR1y6toXWh3ueWM45FcrRK1/+bSC+OwtARbgqPf07Z9T16i3eYGsmDrW4mH6
o9eU++8Z2J2MW0Rqr+/LHEG7AL0gUjSv/opExxYK/lIuDrJ6U+4=
=LTP4
-----END PGP SIGNATURE-----
--=-=-=--