From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Bremner
To: notmuch@notmuchmail.org
Subject: [PATCH v2 2/3] lib: parse index.as_text
Date: Thu, 5 Jan 2023 20:02:05 -0400
Message-Id: <20230106000206.3595708-3-david@tethera.net>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230106000206.3595708-1-david@tethera.net>
References: <20230106000206.3595708-1-david@tethera.net>
MIME-Version: 1.0
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

We pre-parse into a list of compiled regular expressions to avoid
calling regcomp on the hot (indexing) path. As explained in the code
comment, this cannot be done lazily with reasonable error reporting,
at least not without touching a lot of the code in index.cc.
---
 lib/database-private.h |  4 ++++
 lib/open.cc            | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index b9be4e22..61232f1a 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -291,6 +291,10 @@ struct _notmuch_database {
 
     /* Track what parameters were specified when opening */
     notmuch_open_param_t params;
+
+    /* list of regular expressions to check for text indexing */
+    regex_t *index_as_text;
+    size_t index_as_text_length;
 };
 
 /* Prior to database version 3, features were implied by the database
diff --git a/lib/open.cc b/lib/open.cc
index 67ff868c..54d1faf3 100644
--- a/lib/open.cc
+++ b/lib/open.cc
@@ -320,6 +320,8 @@ _alloc_notmuch (const char *database_path, const char *config_path, const char *
     notmuch->transaction_count = 0;
     notmuch->transaction_threshold = 0;
     notmuch->view = 1;
+    notmuch->index_as_text = NULL;
+    notmuch->index_as_text_length = 0;
 
     notmuch->params = NOTMUCH_PARAM_NONE;
     if (database_path)
@@ -427,6 +429,53 @@ _load_database_state (notmuch_database_t *notmuch)
         notmuch, notmuch->xapian_db->get_uuid ().c_str ());
 }
 
+/* XXX This should really be done lazily, but the error reporting path in the indexing code
+ * would need to be redone to report any errors.
+ */
+notmuch_status_t
+_ensure_index_as_text (notmuch_database_t *notmuch, char **message)
+{
+    int nregex = 0;
+    regex_t *regexv = NULL;
+
+    if (notmuch->index_as_text)
+        return NOTMUCH_STATUS_SUCCESS;
+
+    for (notmuch_config_values_t *list = notmuch_config_get_values (notmuch,
+                                                                     NOTMUCH_CONFIG_INDEX_AS_TEXT);
+         notmuch_config_values_valid (list);
+         notmuch_config_values_move_to_next (list)) {
+        regex_t *new_regex;
+        int rerr;
+        const char *str = notmuch_config_values_get (list);
+        size_t len = strlen (str);
+
+        /* str must be non-empty, because n_c_get_values skips empty
+         * strings */
+        assert (len > 0);
+
+        regexv = talloc_realloc (notmuch, regexv, regex_t, nregex + 1);
+        new_regex = &regexv[nregex];
+
+        rerr = regcomp (new_regex, str, REG_EXTENDED | REG_NOSUB);
+        if (rerr) {
+            size_t error_size = regerror (rerr, new_regex, NULL, 0);
+            char *error = (char *) talloc_size (str, error_size);
+
+            regerror (rerr, new_regex, error, error_size);
+            IGNORE_RESULT (asprintf (message, "Error in index.as_text: %s: %s\n", error, str));
+
+            return NOTMUCH_STATUS_ILLEGAL_ARGUMENT;
+        }
+        nregex++;
+    }
+
+    notmuch->index_as_text = regexv;
+    notmuch->index_as_text_length = nregex;
+
+    return NOTMUCH_STATUS_SUCCESS;
+}
+
 static notmuch_status_t
 _finish_open (notmuch_database_t *notmuch,
               const char *profile,
@@ -531,6 +580,10 @@ _finish_open (notmuch_database_t *notmuch,
     if (status)
         goto DONE;
 
+    status = _ensure_index_as_text (notmuch, &message);
+    if (status)
+        goto DONE;
+
     autocommit_str = notmuch_config_get (notmuch, NOTMUCH_CONFIG_AUTOCOMMIT);
     if (unlikely (! autocommit_str)) {
         INTERNAL_ERROR ("missing configuration for autocommit");
-- 
2.39.0
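
The commit message argues for compiling the patterns once at open time so
that the indexing path only ever calls regexec. The standalone sketch below
illustrates that compile-once, match-many pattern with plain POSIX regex
calls. It is not code from this series: it uses calloc/free rather than
talloc, the names regex_list_t, regex_list_compile, regex_list_matches and
regex_list_free are hypothetical, and matching the patterns against MIME
content-type strings is an assumption made for the example.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two fields the patch adds to
 * struct _notmuch_database (index_as_text, index_as_text_length). */
typedef struct {
    regex_t *regexv;
    size_t length;
} regex_list_t;

/* Release every compiled pattern and reset the list to empty. */
static void
regex_list_free (regex_list_t *list)
{
    for (size_t i = 0; i < list->length; i++)
        regfree (&list->regexv[i]);
    free (list->regexv);
    list->regexv = NULL;
    list->length = 0;
}

/* Compile all patterns up front. Returns 0 on success, or the regcomp
 * error code of the first pattern that fails; on failure the list is
 * left empty. */
static int
regex_list_compile (regex_list_t *list, const char *const *patterns, size_t n)
{
    assert (list != NULL);
    list->regexv = NULL;
    list->length = 0;
    if (n == 0)
        return 0;

    list->regexv = calloc (n, sizeof (regex_t));
    if (! list->regexv)
        return REG_ESPACE;

    for (size_t i = 0; i < n; i++) {
        /* Same flags as the patch: extended syntax, no submatch data. */
        int rerr = regcomp (&list->regexv[i], patterns[i],
                            REG_EXTENDED | REG_NOSUB);
        if (rerr) {
            /* length counts only the patterns compiled so far. */
            regex_list_free (list);
            return rerr;
        }
        list->length++;
    }
    return 0;
}

/* Hot path: no compilation here, only regexec against each stored
 * pattern. Returns 1 if any pattern matches str, 0 otherwise. */
static int
regex_list_matches (const regex_list_t *list, const char *str)
{
    for (size_t i = 0; i < list->length; i++) {
        if (regexec (&list->regexv[i], str, 0, NULL, 0) == 0)
            return 1;
    }
    return 0;
}
```

A caller would run regex_list_compile once when the database is opened,
report any nonzero return through regerror as the patch does, and then call
regex_list_matches for each part during indexing.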