Date: Wed, 13 Mar 2019 11:23:34 -0700
In-Reply-To: <87wol4dhe7.fsf@tethera.net>
References: <87muui87om.fsf@len.workgroup> <87ef7hyxqs.fsf@len.workgroup>
 <87a7i4c3t5.fsf@wondoo.home.cworth.org> <87wol4dhe7.fsf@tethera.net>
Subject: Re: how to search for hyphenated words? (was: how to search for Morse code?)
From: Matt Armstrong
To: David Bremner, Carl Worth, Gregor Zattler, notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."

David Bremner writes:

> Matt Armstrong writes:
>
>> Carl Worth writes:
>>
>>> Hi Gregor,
>>>
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>>   "A phrase surrounded with double quotes ("") matches documents
>>   containing that exact phrase. Hyphenated words are also treated as
>>   phrases, as are cases such as filenames and email addresses
>>   (e.g. /etc/passwd or president@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not the
> (necessarily) the retrieved text.

Ah, so it boils down to the Xapian definition of "exact phrase."
Notably, "exact phrase" does not mean "identical sequence of characters,"
as some people might expect. Quick tests with various search engines
show that their phrase searches operate the same way.

E.g. searching for "org notmuch" finds all sorts of results:

    org-notmuch.el
    notmuchmail.org/notmuch-emacs/
    to:devicetree@vger.kernel.org notmuch tag +inbox +unread -new
    (require 'org-notmuch nil t)
    https://notmuchmail.org/notmuch-emacs/.
    * imaps://mail.example.org/Notmuch/search

For what it is worth, one thing I've taken to doing is using period
separators in the notmuch phrase searches I use in scripts and even
interactively. A period-separated query avoids the confusing issues
that come with quoting double-quoted things, and it always remains a
single shell "word." The periods are also, most often, clearly not the
exact content I'm searching for, so they make it clear that the match
algorithm is inexact.

E.g.

    subject:notmuch.is.wonderful

instead of:

    subject:"notmuch is wonderful"
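
To make the behaviour above concrete, here is a rough Python sketch. It is only a model of the idea, not Xapian's or notmuch's actual code: `terms` and `phrase_match` are hypothetical helpers, and the regular expression is an assumption standing in for the real term generator, which does considerably more. The last two lines use the standard `shlex` module to show what a POSIX shell would hand to a program for each spelling of the query.

```python
import re
import shlex


def terms(text):
    """Split text into lowercase terms, treating punctuation and
    whitespace alike as separators. This is a simplification, not
    Xapian's actual term generator."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(phrase, text):
    """Return True if the terms of `phrase` occur consecutively,
    in order, among the terms of `text`."""
    want, have = terms(phrase), terms(text)
    n = len(want)
    return n > 0 and any(
        have[i:i + n] == want for i in range(len(have) - n + 1)
    )


# Some of the results listed above for the phrase "org notmuch".
results = [
    "org-notmuch.el",
    "notmuchmail.org/notmuch-emacs/",
    "(require 'org-notmuch nil t)",
    "imaps://mail.example.org/Notmuch/search",
]
for r in results:
    print(phrase_match("org notmuch", r), r)

# Period-separated and space-separated spellings yield the same terms.
print(terms("notmuch.is.wonderful") == terms("notmuch is wonderful"))

# The period form is one shell word with no quotes for the shell to strip.
print(shlex.split('notmuch search subject:notmuch.is.wonderful'))
# The quoted form loses its double quotes before the program sees it.
print(shlex.split('notmuch search subject:"notmuch is wonderful"'))
```

In this model every listed result matches, because "-", ".", and "/" all act as term separators in the same way a space does. The `shlex` lines illustrate the quoting issue: typed directly at a shell, the double quotes in the second form are consumed by the shell unless they are themselves quoted, while the period form needs no extra quoting.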