From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mp11.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms5.migadu.com with LMTPS id 6NoZI0bjE2OhDQAAbAwnHQ (envelope-from ) for ; Sun, 04 Sep 2022 01:29:10 +0200
Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp11.migadu.com with LMTPS id UJkvI0bjE2NdbQAA9RJhRA (envelope-from ) for ; Sun, 04 Sep 2022 01:29:10 +0200
Received: from mail.notmuchmail.org (yantan.tethera.net [135.181.149.255]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 31A0C199AE for ; Sun, 4 Sep 2022 01:29:10 +0200 (CEST)
Received: from yantan.tethera.net (localhost [127.0.0.1]) by mail.notmuchmail.org (Postfix) with ESMTP id BBE365FD33; Sat, 3 Sep 2022 23:28:56 +0000 (UTC)
Received: from fethera.tethera.net (fethera.tethera.net [IPv6:2607:5300:60:c5::1]) by mail.notmuchmail.org (Postfix) with ESMTP id 4E3005F376 for ; Sat, 3 Sep 2022 23:28:53 +0000 (UTC)
Received: by fethera.tethera.net (Postfix, from userid 1001) id 873C55FBC2; Sat, 3 Sep 2022 19:28:52 -0400 (EDT)
Received: (nullmailer pid 1474097 invoked by uid 1000); Sat, 03 Sep 2022 23:28:47 -0000
From: David Bremner
To: notmuch@notmuchmail.org
Subject: [PATCH 2/3] lib: parse index.as_text
Date: Sat, 3 Sep 2022 20:28:38 -0300
Message-Id: <20220903232839.1473915-3-david@tethera.net>
X-Mailer: git-send-email 2.35.2
In-Reply-To: <20220903232839.1473915-1-david@tethera.net>
References: <20220903232839.1473915-1-david@tethera.net>
MIME-Version: 1.0
Message-ID-Hash: 5ST7KO4INYBLGFULAROHN7D5YPFQ7FPW
X-Message-ID-Hash: 5ST7KO4INYBLGFULAROHN7D5YPFQ7FPW
X-MailFrom: bremner@tethera.net
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-notmuch.notmuchmail.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.3
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Help:
List-Owner:
List-Post:
List-Subscribe:
List-Unsubscribe:
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Migadu-Flow: FLOW_IN
X-Migadu-To: larch@yhetil.org
X-Migadu-Country: DE
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1662247750; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-owner:list-unsubscribe:list-subscribe:list-post; bh=pXoD0IDGqDh++4vwF5qS3pEaSOStr8p9eaAjnKo0Wb8=; b=IB+P/4fgOtI5QiJ37VVW9Yk9SfQGpgtiWdC+PFZqUJPEKCJ3iEZjScuibX+LoTamGdhvQc cPoUz9VIp/4Dubhdrj89IF8ilHsANK3HLZXPQ8P6QU4PBZXnqjDEn9WqXQ/UBb8iTwHGws P4qOc3CL8dsA7vugKOFu7w1l39wBxICXO3m9n4nnOtOILZNXy53WA4UFsH7aiRV3zKnisf 3wm4NuDdAeGG/gVvUYLFN5PDMWAI1pM8uhmVu3us/mSBnHD+ELWg19h7wa9MriCFoCorF+ BYDF/BN0s92YIfAJqer5bUuYKpnt9iOydrsxceBr1+7PD07e0nk7QgeONPO71w==
ARC-Seal: i=1; s=key1; d=yhetil.org; t=1662247750; a=rsa-sha256; cv=none; b=dz500XtJ4Db0uwfLLXUpnryYtBXISNbU+h9V/I9cDT7rse+o9T6UGxYAenYmaY9twAZtZw EH1RFpbcDpJyRgk8QF+NPQHFiOdoVWn1IjLLOFkL5D14av/nRHyIi6M8MB2C/vPUrhrrnJ XJCd6c/9rd3Glfu3ZvhGy9wTXjngfqY7TVkc+x5T7E8Ivf+D4g+dvdmHSo2pNVar/5n4Uj IOYI3g8sBBSHNgBJe+Pn0uvJ6X4pSD1YMrKz2nTlrvDnL+esxRHEB5AWId3axJN/XPP8Oj Gcr3+j6VO/DItHpfTSh/oqZvUCgI1qcuLg6lxnjIGVCJwDenK4pAAJsGiuk/kA==
ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of notmuch-bounces@notmuchmail.org designates 135.181.149.255 as permitted sender) smtp.mailfrom=notmuch-bounces@notmuchmail.org
X-Migadu-Spam-Score:
 -1.32
Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none; spf=pass (aspmx1.migadu.com: domain of notmuch-bounces@notmuchmail.org designates 135.181.149.255 as permitted sender) smtp.mailfrom=notmuch-bounces@notmuchmail.org
X-Migadu-Queue-Id: 31A0C199AE
X-Spam-Score: -1.32
X-Migadu-Scanner: scn1.migadu.com
X-TUID: 0lfr2cI16ElG

We pre-parse into a list of compiled regular expressions to avoid
calling regcomp on the hot (indexing) path. As explained in the code
comment, this cannot be done lazily with reasonable error reporting,
at least not without touching a lot of the code in index.cc.
---
 lib/database-private.h |  4 ++++
 lib/open.cc            | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index b9be4e22..61232f1a 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -291,6 +291,10 @@ struct _notmuch_database {
 
     /* Track what parameters were specified when opening */
     notmuch_open_param_t params;
+
+    /* list of regular expressions to check for text indexing */
+    regex_t *index_as_text;
+    size_t index_as_text_length;
 };
 
 /* Prior to database version 3, features were implied by the database
diff --git a/lib/open.cc b/lib/open.cc
index 67ff868c..54d1faf3 100644
--- a/lib/open.cc
+++ b/lib/open.cc
@@ -320,6 +320,8 @@ _alloc_notmuch (const char *database_path, const char *config_path, const char *
     notmuch->transaction_count = 0;
     notmuch->transaction_threshold = 0;
     notmuch->view = 1;
+    notmuch->index_as_text = NULL;
+    notmuch->index_as_text_length = 0;
 
     notmuch->params = NOTMUCH_PARAM_NONE;
     if (database_path)
@@ -427,6 +429,53 @@ _load_database_state (notmuch_database_t *notmuch)
         notmuch, notmuch->xapian_db->get_uuid ().c_str ());
 }
 
+/* XXX This should really be done lazily, but the error reporting path in the indexing code
+ * would need to be redone to report any errors.
+ */
+notmuch_status_t
+_ensure_index_as_text (notmuch_database_t *notmuch, char **message)
+{
+    int nregex = 0;
+    regex_t *regexv = NULL;
+
+    if (notmuch->index_as_text)
+        return NOTMUCH_STATUS_SUCCESS;
+
+    for (notmuch_config_values_t *list = notmuch_config_get_values (notmuch,
+                                                                    NOTMUCH_CONFIG_INDEX_AS_TEXT);
+         notmuch_config_values_valid (list);
+         notmuch_config_values_move_to_next (list)) {
+        regex_t *new_regex;
+        int rerr;
+        const char *str = notmuch_config_values_get (list);
+        size_t len = strlen (str);
+
+        /* str must be non-empty, because n_c_get_values skips empty
+         * strings */
+        assert (len > 0);
+
+        regexv = talloc_realloc (notmuch, regexv, regex_t, nregex + 1);
+        new_regex = &regexv[nregex];
+
+        rerr = regcomp (new_regex, str, REG_EXTENDED | REG_NOSUB);
+        if (rerr) {
+            size_t error_size = regerror (rerr, new_regex, NULL, 0);
+            char *error = (char *) talloc_size (str, error_size);
+
+            regerror (rerr, new_regex, error, error_size);
+            IGNORE_RESULT (asprintf (message, "Error in index.as_text: %s: %s\n", error, str));
+
+            return NOTMUCH_STATUS_ILLEGAL_ARGUMENT;
+        }
+        nregex++;
+    }
+
+    notmuch->index_as_text = regexv;
+    notmuch->index_as_text_length = nregex;
+
+    return NOTMUCH_STATUS_SUCCESS;
+}
+
 static notmuch_status_t
 _finish_open (notmuch_database_t *notmuch,
               const char *profile,
@@ -531,6 +580,10 @@ _finish_open (notmuch_database_t *notmuch,
     if (status)
         goto DONE;
 
+    status = _ensure_index_as_text (notmuch, &message);
+    if (status)
+        goto DONE;
+
     autocommit_str = notmuch_config_get (notmuch, NOTMUCH_CONFIG_AUTOCOMMIT);
     if (unlikely (! autocommit_str)) {
         INTERNAL_ERROR ("missing configuration for autocommit");
-- 
2.35.2
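
For readers who want to try the pre-compilation idea outside of notmuch,
here is a minimal standalone sketch in plain C. It is not code from this
series: regex_list_t, regex_list_compile, regex_list_matches and
regex_list_free are hypothetical names, it uses malloc/calloc rather than
talloc, and the sample patterns in the note below are made up. Only
regcomp, regexec, regerror and regfree are real POSIX calls. It shows the
same two steps as the patch: compile every pattern once up front and
report the first bad one, so that the per-item check is only a regexec
loop.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for pre-compiled patterns (not a notmuch type). */
typedef struct {
    regex_t *v;	/* array of compiled expressions */
    size_t n;	/* number of valid entries in v */
} regex_list_t;

/* Release every compiled expression and reset the list. */
static void
regex_list_free (regex_list_t *list)
{
    for (size_t i = 0; i < list->n; i++)
	regfree (&list->v[i]);
    free (list->v);
    list->v = NULL;
    list->n = 0;
}

/* Compile all patterns up front. Returns 0 on success. On failure,
 * returns a regcomp error code, leaves the list empty, and stores a
 * malloc'd description in *message (NULL if allocation failed). */
static int
regex_list_compile (regex_list_t *list, const char *const *patterns,
		    size_t count, char **message)
{
    assert (list != NULL && message != NULL);
    list->v = NULL;
    list->n = 0;
    *message = NULL;

    if (count == 0)
	return 0;

    list->v = calloc (count, sizeof (regex_t));
    if (! list->v)
	return REG_ESPACE;

    for (size_t i = 0; i < count; i++) {
	int rerr = regcomp (&list->v[i], patterns[i], REG_EXTENDED | REG_NOSUB);

	if (rerr) {
	    /* First call sizes the buffer (including the NUL), second fills it. */
	    size_t size = regerror (rerr, &list->v[i], NULL, 0);
	    char *error = malloc (size);

	    if (error) {
		size_t msg_size = size + strlen (patterns[i]) + sizeof ("bad pattern '': ");

		regerror (rerr, &list->v[i], error, size);
		*message = malloc (msg_size);
		if (*message)
		    snprintf (*message, msg_size, "bad pattern '%s': %s", patterns[i], error);
		free (error);
	    }
	    /* Only the first list->n entries were compiled successfully. */
	    regex_list_free (list);
	    return rerr;
	}
	list->n++;
    }
    return 0;
}

/* Hot path: no compilation here, only matching. Returns 1 on any match. */
static int
regex_list_matches (const regex_list_t *list, const char *text)
{
    for (size_t i = 0; i < list->n; i++)
	if (regexec (&list->v[i], text, 0, NULL, 0) == 0)
	    return 1;
    return 0;
}
```

A caller would run regex_list_compile once at open time, for example on
the made-up patterns "^text/" and "^application/(json|xml)$", print and
free the message if it fails, then call regex_list_matches for each item
and regex_list_free at close. Unlike the patch, the sketch frees the
partially built array on error; in the patch the array is allocated with
the database handle as its talloc context.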