From: Jani Nikula
To: David Bremner, Tomi Ollila, notmuch@notmuchmail.org
Subject: Re: [Patch v4] lib: regexp matching in 'subject' and 'from'
In-Reply-To: <87efzqef2r.fsf@tethera.net>
References: <20170121032752.6788-1-david@tethera.net> <20170121135917.22062-1-david@tethera.net> <87efzqef2r.fsf@tethera.net>
Date: Sun, 29 Jan 2017 13:23:36 +0200
Message-ID: <87ziia2jpj.fsf@nikula.org>
List-Id: "Use and development of the notmuch mail system."

On Wed, 25 Jan 2017, David Bremner wrote:
> Tomi Ollila writes:
>
>> Why would message_id not be useful to regex match? I can come up with
>> quite a few use cases... but if there are technical difficulties... then
>> that should be mentioned instead.
>
> I'll have a look. Since the first version of this patch (when that
> message was written), people have actually asked for some kind of
> wildcard matching of message-ids.

Theoretically "/" is an acceptable character in message-ids [1]. Rare,
unlikely, but acceptable. Searching for message-ids beginning with "/"
would have to use regexps, which would break in all sorts of ways
throughout the stack.

I don't think there are handy alternatives to "//", given the characters
that are acceptable in message-ids, but this is something to think
about. For example, could the regexp matcher for message-ids first check
if the "regexp" is a strict match with "/" and all, and accept those?
This might be a reasonable workaround if it can be made to work.

[1] https://tools.ietf.org/html/rfc2822#section-3.2.4

>> maybe this commit message should inform that xapian with field processors
>> (1.4.x) is required for this feature -- and emphasize it a bit better in
>> the manual page?
>>
>> Probably '//' is used to escape '/' -- should such a character ever be
>> needed in regex search.
>>
>
> Currently no escaping is needed because it only looks at the first and
> last characters of the string (the usual xapian/shell rules mean that ""
> might be needed).
>
> The following seem to work as hoped
>
> # match a / with a space before it
> % notmuch search 'subject:"/ //"'
>
> # just a slash
> % notmuch search subject:///
>
> # anchored slash
> % notmuch search subject:/^//
>
> The trailing slash is actually decorative, we could drop it. Actually
> *blush* I just noticed the current code is missing something from this
> line
>
>     if (str.at (0) == '/' && str.at (str.size () - 1)){
>
> _if_ that line is fixed, then it will have the slightly odd behaviour of
>
>     subject:/blah
>
> doing a non-regex search
>
> We could also throw an error for that case, maybe that's the best option.

I'd go with an error. It's easy to loosen the rules later on if we
decide that's a good idea. Much harder to accept loose rules now, let
users get used to it, and try to tighten the rules if we realize we'd
need that for some reason.

BR,
Jani.
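
For illustration only, here is a rough C++ sketch of the stricter rule
discussed above: a term that starts with "/" must also end with "/", and
anything else starting with "/" is an error rather than a silent non-regex
search. It also sketches the message-id workaround of trying an exact
match first. Every name in it (parse_term, parse_mid_term, TermKind,
ParsedTerm, id_exists) is hypothetical and not taken from the notmuch
source, and it assumes the field value arrives with any shell or Xapian
quoting already removed.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical types for this sketch; not the notmuch field-processor code.
enum class TermKind { Literal, Regex };

struct ParsedTerm {
    TermKind kind;
    std::string text; // regex body without delimiters, or the literal term
};

// Classify the value of a field such as subject: or from:.
inline ParsedTerm parse_term (const std::string &str)
{
    if (str.empty () || str.at (0) != '/')
        return { TermKind::Literal, str };

    // Both delimiters are required, so the shortest regex term is "//".
    if (str.size () < 2 || str.at (str.size () - 1) != '/')
        throw std::invalid_argument ("unterminated regex: " + str);

    // Only the first and last characters are inspected, so inner slashes
    // need no escaping.
    return { TermKind::Regex, str.substr (1, str.size () - 2) };
}

// Message-id variant: prefer an exact match on the whole string, slashes
// included, before treating it as a regex. id_exists is a hypothetical
// callback standing in for a database lookup.
inline ParsedTerm parse_mid_term (
    const std::string &str,
    const std::function<bool (const std::string &)> &id_exists)
{
    if (id_exists (str))
        return { TermKind::Literal, str };
    return parse_term (str);
}

// The three searches quoted in the thread, expressed as checks.
inline void check_examples ()
{
    assert (parse_term ("/ //").text == " /"); // slash with a space before it
    assert (parse_term ("///").text == "/");   // just a slash
    assert (parse_term ("/^//").text == "^/"); // anchored slash
}
```

With this shape, subject:/blah raises an error instead of quietly doing a
non-regex search, which keeps the option of loosening the rule later.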