From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Bremner
To: notmuch@notmuchmail.org
Subject: [PATCH 3/3] WIP/lib: index all text/* attachments.
Date: Sat, 20 Aug 2022 11:50:07 -0700
Message-Id: <20220820185007.289543-4-david@tethera.net>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220820185007.289543-1-david@tethera.net>
References: <20220820185007.289543-1-david@tethera.net>
MIME-Version: 1.0
CC: jwilk@jwilk.net
List-Id: "Use and development of the notmuch mail system."
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This probably needs a stricter test, perhaps an explicit list of
(regexes? for) allowed types.
---
 lib/index.cc     | 23 ++++++++++++++++++++---
 test/T050-new.sh |  1 -
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/index.cc b/lib/index.cc
index 728bfb22..aca73580 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -380,6 +380,21 @@ _index_pkcs7_part (notmuch_message_t *message,
                    GMimeObject *part,
                    _notmuch_message_crypto_t *msg_crypto);
 
+static bool _indexable_mime_type (GMimeObject *part) {
+    GMimeContentType *content_type = g_mime_object_get_content_type (part);
+
+    if (content_type) {
+        char *mime_string = g_mime_content_type_get_mime_type (content_type);
+        if (mime_string) {
+            /* XXX TODO: use a more sensible test, maybe configurable */
+            bool ret = (STRNCMP_LITERAL (mime_string, "text/") == 0);
+            g_free (mime_string);
+            return ret;
+        }
+    }
+    return false;
+}
+
 /* Callback to generate terms for each mime part of a message. */
 static void
 _index_mime_part (notmuch_message_t *message,
@@ -497,9 +512,11 @@ _index_mime_part (notmuch_message_t *message,
             _notmuch_message_add_term (message, "tag", "attachment");
             _notmuch_message_gen_terms (message, "attachment", filename);
 
-            /* XXX: Would be nice to call out to something here to parse
-             * the attachment into text and then index that. */
-            goto DONE;
+            if (! _indexable_mime_type (part)) {
+                /* XXX: Would be nice to call out to something here to parse
+                 * the attachment into text and then index that. */
+                goto DONE;
+            }
         }
 
         byte_array = g_byte_array_new ();
diff --git a/test/T050-new.sh b/test/T050-new.sh
index cb67889c..dd665de3 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -458,7 +458,6 @@ test_expect_equal_file EXPECTED OUTPUT
 add_email_corpus indexing
 
 test_begin_subtest "index text/* attachments"
-test_subtest_known_broken
 notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain > EXPECTED
 notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain and ersatz > OUTPUT
 test_expect_equal_file_nonempty EXPECTED OUTPUT
-- 
2.35.1
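
For discussion, here is one way the "explicit list of allowed types"
mentioned in the commit message might look. This is a standalone sketch and
not part of the patch: the names indexable_types, equal_ignore_case and
indexable_mime_type_strict are hypothetical, the list entries are only
illustrative, and it takes the MIME type as a plain string instead of a
GMimeObject so it can be tried in isolation.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string_view>

/* Hypothetical allowlist; the entries are illustrative, not a vetted policy. */
static constexpr std::array<std::string_view, 4> indexable_types = {
    "text/plain", "text/x-diff", "text/x-patch", "text/markdown",
};

/* ASCII case-insensitive equality; MIME type names are case-insensitive. */
static bool
equal_ignore_case (std::string_view a, std::string_view b)
{
    if (a.size () != b.size ())
        return false;
    for (std::size_t i = 0; i < a.size (); i++) {
        if (std::tolower ((unsigned char) a[i]) !=
            std::tolower ((unsigned char) b[i]))
            return false;
    }
    return true;
}

/* Return true only for types on the allowlist, instead of every "text/"
 * prefix as in the WIP patch. */
static bool
indexable_mime_type_strict (std::string_view mime_type)
{
    for (std::string_view allowed : indexable_types) {
        if (equal_ignore_case (mime_type, allowed))
            return true;
    }
    return false;
}
```

With a list like this, text/html attachments would be skipped unless added
explicitly. A regex-based or configurable list, as floated in the commit
message, would replace the exact comparison in the loop.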