Date: Tue, 17 Jan 2012 14:47:15 -0500
From: Austin Clements
To: Jani Nikula
Subject: Re: Partial words on notmuch search?
Cc: notmuch@notmuchmail.org, Andrei Popescu
Message-ID: <20120117194715.GO16740@mit.edu>
In-Reply-To: <87aa5mkyw5.fsf@nikula.org>

Quoth Jani Nikula on Jan 17 at 7:43 pm:
> On Mon, 16 Jan 2012 21:34:31 -0500, Austin Clements wrote:
> > Quoth Andrei Popescu on Jan 16 at 10:21 pm:
> > > This is also interesting:
> > > $ notmuch count 'debian'
> > > 65888
> > > $ notmuch count 'dEbian'
> > > 65888
> > > $ notmuch count 'Debian'
> > > 65887
> >
> > The first two will match stemmed versions of "debian" such as
> > "debian's" and "debianed". However, starting a term with a capital
> > letter suppresses stemming (because it suggests that it's a name,
> > which you wouldn't want to modify), so your last query matches only
> > the term "debian". This is probably documented somewhere, though I
> > don't know where.
>
> Interesting. Is this done when adding the terms to the database, or when
> searching? I presume the latter. How much control does notmuch have over
> this?

This is getting a bit out of my depth, but I believe indexing is done
with both stemmed and unstemmed versions of all terms (if stemming is
enabled) so that search can use either.

For indexing, Notmuch can set the stemmer (or no stemmer). Xapian
provides stemmers for a variety of languages:

http://xapian.org/docs/apidoc/html/classXapian_1_1Stem.html#6c46cedf2047b159a7e4c9d4468242b1

For query parsing, Notmuch can set both the stemmer and a "stemming
strategy" that controls when it stems or doesn't stem terms:

http://xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#c7dc3b55b6083bd3ff98fc8b2726c8fd
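
As a rough illustration of the "index both forms, let the query parser choose" idea, here is a toy Python model. It is a sketch under stated assumptions, not Xapian's or Notmuch's code. `toy_stem` is a hypothetical stand-in for a real stemmer such as Xapian's English one. The capital-letter check is a simplified version of the query parser's STEM_SOME strategy, and the real strategy has further exceptions. Stemmed terms carry a "Z" prefix, mirroring the convention Xapian uses for them. Passing `stemmer=None` models running with no stemmer at all.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical toy stemmer; a stand-in for a real one such as Xapian's
    English stemmer. It only strips a few suffixes."""
    for suffix in ("'s", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index(docs, stemmer=toy_stem):
    """Build term -> doc-id postings. Every word is indexed unstemmed; if a
    stemmer is set, its stemmed form is also indexed with a "Z" prefix."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
            if stemmer is not None:
                postings["Z" + stemmer(word)].add(doc_id)
    return postings


def search(postings, term, stemmer=toy_stem):
    """Simplified STEM_SOME-like rule: with no stemmer, or when the term
    starts with a capital letter, match the literal (unstemmed) term;
    otherwise match the "Z"-prefixed stemmed term."""
    if stemmer is None or term[:1].isupper():
        return set(postings.get(term.lower(), ()))
    return set(postings.get("Z" + stemmer(term.lower()), ()))


# Made-up sample documents, keyed by document id.
DOCS = {
    1: "debian rocks",
    2: "the debian's installer",
    3: "I debianed my laptop",
}

postings = index(DOCS)
for query in ("debian", "dEbian", "Debian"):
    print(query, len(search(postings, query)))
# prints "debian 3", "dEbian 3", "Debian 1", one per line
```

With these three made-up documents, the lowercase and mixed-case queries hit every stemmed variant, while the capitalized query matches only the literal term "debian". This is the same pattern as the 65888/65888/65887 counts quoted above, on a toy scale.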