From: Mark Walters
To: David Bremner, Jani Nikula, Tomi Ollila, notmuch@notmuchmail.org
Subject: Re: [Patch v4] lib: regexp matching in 'subject' and 'from'
In-Reply-To: <87efz8vz0w.fsf@rocinante.cs.unb.ca>
References: <20170121032752.6788-1-david@tethera.net> <20170121135917.22062-1-david@tethera.net> <87efzqef2r.fsf@tethera.net> <87ziia2jpj.fsf@nikula.org> <87efz8vz0w.fsf@rocinante.cs.unb.ca>
Date: Fri, 10 Feb 2017 08:29:05 +0000
Message-ID: <878tpetpny.fsf@qmul.ac.uk>
List-Id: "Use and development of the notmuch mail system."

On Thu, 09 Feb 2017, David Bremner wrote:
> Jani Nikula writes:
>
>>
>> Theoretically "/" is an acceptable character in message-ids [1]. Rare,
>> unlikely, but acceptable. Searching for message-id's beginning with "/"
>> would have to use regexps, which would break in all sorts of ways
>> throughout the stack. I don't think there are handy alternatives to
>> "//", given the characters that are acceptable in message-ids,
>> but this is something to think about.
>
> Would telling the user to \ escape (or double /) the initial / be good
> enough there? This would disable regex processing. I guess this goes
> back to someone's earlier suggestion. A third option would be to use
> single quotes there ("id:'/foo'"), but that isn't really consistent
> with either Xapian or usual regex conventions.
>
> So I guess my favourite idea ATM is to use id:\/some/crazy/message-id
> FWIW, I don't have any such message ids.
>
>> For example, could the regexp matcher for message-ids first check if the
>> "regexp" is a strict match with "/" and all, and accept those? This
>> might be a reasonable workaround if it can be made to work.
>
> We're building a query, so I think the equivalent is to make an OR, with
> the exact match and the regex posting source. That could be done,
> although I'm a bit uneasy about how this makes the syntax for id:
> different, so id:/foo would be legit, but from:/foo would be an error.
> Maybe the dwim-factor is worth it.

Hi,

Broadly, I like the backslash escaping option. Two thoughts.

First, can any of the fields (from/subject/message-id) start with a "\"
anyway? I think not, but thought it worth checking.

Secondly, message-ids are often round-tripped, that is, output from
notmuch and then fed back into notmuch. Do we want to escape the output
as above before printing in any cases? My view is that if we output the
message-id prefixed with "id:" then we should escape it (which applies
with --output=messages --format=text), but if we don't print the "id:"
part then we shouldn't (e.g. with --format=json).

A similar thing would apply to emacs: if it is a normal stash then
escape the id, but if it is a "bare stash" then do not.

Actually, one more thing: it would be a shame to block or significantly
delay the series for such a corner case.

Best wishes

Mark
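
A minimal sketch of the output rule suggested above, in Python purely for
illustration. It assumes the backslash-escaping proposal from this thread
is adopted; the function names are hypothetical and are not part of
notmuch or its bindings. Only a leading "/" is escaped, and only when the
"id:" prefix is printed.

```python
def escape_id_term(message_id: str) -> str:
    """Backslash-escape a leading '/' so the value would not be read as a
    regexp (hypothetical rule, following the proposal in this thread)."""
    if message_id.startswith("/"):
        return "\\" + message_id
    return message_id


def format_message_id(message_id: str, with_prefix: bool) -> str:
    """Format a message-id for output.

    with_prefix=True models output that prints "id:" (text format or a
    normal stash); with_prefix=False models bare output (JSON or a
    "bare stash"), which is left unescaped.
    """
    if with_prefix:
        return "id:" + escape_id_term(message_id)
    return message_id


def parse_id_term(term: str) -> str:
    """Recover the raw message-id from an "id:" term, undoing the escape."""
    if not term.startswith("id:"):
        raise ValueError("expected a term starting with 'id:'")
    value = term[len("id:"):]
    if value.startswith("\\/"):
        return value[1:]  # drop the escaping backslash, keep the '/'
    return value
```

With this rule, an id such as /some/crazy/message-id survives the round
trip: it is printed as id:\/some/crazy/message-id and parsed back to the
original value, while bare output is left untouched.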