From: David Bremner
To: notmuch@notmuchmail.org
Subject: v2 sexpr parser
Date: Sat, 17 Jul 2021 23:39:56 -0300
Message-Id: <20210718024021.3850340-1-david@tethera.net>
List-Id: "Use and development of the notmuch mail system."

This is a substantially revised version of the series at [1]. As far
as I know, it now understands (a translation of) most of the queries
handled by the existing query parser.

Some remaining limitations/issues:

1) The new query parser is only hooked into the notmuch search
subcommand. It should be fairly rote to hook it into the other
relevant subcommands, but I want to wait until resolving (2) before
proceeding.

2) The command line option --query-syntax={sexp,xapian} is a bit
klunky. Also "xapian" should perhaps be renamed "infix" to match the
'infix' operator in the new parser.

3) There is no documentation. I think notmuch-search-terms(7) is too
long already, so there should probably be a separate manual page. I
don't want to write that until I'm sure we want the new parser.

4) There is still some uncertainty around utf8 handling in sfsexp.

5) I'm not too sure about the new API call
notmuch_query_create_sexpr. I guess a more idiomatic thing to do
would be to add a new function with an extra argument, and have the
old function call it.

6) The way that user-defined headers are used in the new parser
differs a bit from the existing one. Instead of (List notmuch), you
currently have to write (header List notmuch). I don't know if that's
better or worse. It's a bit more typing, but it is maybe a bit clearer
to read. It would probably not be too hard to switch.

7) Trailing wildcards like "subject:foo*" are not implemented yet.

In [2] Hannu mentioned being unclear on the design goals of the
s-expression query parser, so let me try to articulate them a bit
better. I think the existing query parser is great for making "easy
things easy". But when things are not easy and/or the user wants
better diagnostics, it is nice to have an alternative.

A) More consistent / predictable syntax. The notmuch query parser
adds several features to the Xapian query parser.
Mainly for implementation reasons, this has resulted in a somewhat
quirky syntax, and often fairly painful escaping. Probably the most
egregious syntax quirk is that '*' (for all messages) cannot be
composed with other queries. In particular, the new parser should
simplify, and make more reliable, code like "notmuch-search-filter",
which tries to combine an existing query with some user-specified
filter. With the new parser, these 15-20 lines can be replaced by

    `(and (infix ,existing) (infix ,new))

B) Better error reporting. Xapian's query parser is designed to be
permissive and almost never rejects a query string. This is not
always ideal, particularly when debugging constructed queries.

C) Extensibility. The Xapian Query API has functionality that is not
(yet) exposed via the QueryParser. It turns out that some common
feature requests are easy to add [3]. For example, to match messages
with a List-Id header, you can use '(header List :any)'.

[1]: id:20210714000239.804384-1-david@tethera.net
[2]: id:60f190f8.1c69fb81.7e7d2.40d1@mx.google.com
[3]: In fairness, they would probably be fairly easy to add to the
     Xapian QueryParser as well. But then we'd need to depend on a
     sufficiently recent version.
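
As a rough illustration of the composition described under (A), here
is a minimal Python sketch that builds the combined s-expression as a
string. The helper names quote_atom and combine_infix are hypothetical
and not part of notmuch, and the quoting rule (double quotes, with
backslash escaping of quotes and backslashes) is an assumption about
how the s-expression reader treats strings, not something this series
confirms.

```python
def quote_atom(text: str) -> str:
    """Wrap text in double quotes for use as an s-expression string.

    Assumption: the reader accepts backslash-escaped quotes and
    backslashes inside a double-quoted string.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def combine_infix(existing: str, new: str) -> str:
    """Combine two infix (Xapian-syntax) queries under a single 'and'.

    Each query is embedded whole via the 'infix' operator, so neither
    one has to be re-parsed or re-escaped against the other.
    """
    return f"(and (infix {quote_atom(existing)}) (infix {quote_atom(new)}))"


if __name__ == "__main__":
    # Hypothetical example queries, for illustration only.
    print(combine_infix("tag:inbox", "from:alice"))
```

The resulting string would presumably be handed to notmuch search
together with the --query-syntax=sexp option mentioned in (2).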