unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Cc: jwilk@jwilk.net
Subject: [PATCH 3/3] WIP/lib: index all text/* attachments.
Date: Sat, 20 Aug 2022 11:50:07 -0700
Message-ID: <20220820185007.289543-4-david@tethera.net>
In-Reply-To: <20220820185007.289543-1-david@tethera.net>

This probably needs a stricter test, perhaps an explicit list of
allowed types (or of regexes matching them).
---
 lib/index.cc     | 23 ++++++++++++++++++++---
 test/T050-new.sh |  1 -
 2 files changed, 20 insertions(+), 4 deletions(-)
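
One possible shape for the stricter test mentioned above, as a
self-contained sketch rather than a proposal for this series. The names
allowed_type_patterns and mime_string_is_indexable, and the patterns
themselves, are hypothetical. The sketch assumes it is handed the
"type/subtype" string that the patch obtains from
g_mime_content_type_get_mime_type.

```cpp
#include <array>
#include <regex>

/* Hypothetical allowlist: these patterns are illustrative assumptions,
 * not a list agreed on anywhere in this thread. */
static const std::array<const char *, 3> allowed_type_patterns = {
    "text/plain",
    "text/x-(diff|patch)",
    "text/(csv|markdown)",
};

/* Return true if mime_string matches one of the allowed patterns in
 * full. MIME type names are case-insensitive, so match with icase. */
static bool
mime_string_is_indexable (const char *mime_string)
{
    if (! mime_string)
        return false;

    for (const char *pattern : allowed_type_patterns) {
        const std::regex re (pattern,
                             std::regex::ECMAScript | std::regex::icase);
        /* regex_match requires the whole string to match, so
         * "text/plainish" does not slip through "text/plain". */
        if (std::regex_match (mime_string, re))
            return true;
    }
    return false;
}
```

Unlike the prefix comparison in the patch, this rejects text/html and
accepts differently-cased spellings such as Text/Plain. Whether the list
should be fixed or configurable is left open, as in the TODO comment.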

diff --git a/lib/index.cc b/lib/index.cc
index 728bfb22..aca73580 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -380,6 +380,21 @@ _index_pkcs7_part (notmuch_message_t *message,
 		   GMimeObject *part,
 		   _notmuch_message_crypto_t *msg_crypto);
 
+static bool _indexable_mime_type (GMimeObject *part) {
+    GMimeContentType *content_type = g_mime_object_get_content_type (part);
+
+    if (content_type) {
+	char *mime_string = g_mime_content_type_get_mime_type (content_type);
+	if (mime_string) {
+	    /* XXX TODO: use a more sensible test, maybe configurable */
+	    bool ret = (STRNCMP_LITERAL (mime_string, "text/") == 0);
+	    g_free (mime_string);
+	    return ret;
+	}
+    }
+    return false;
+}
+
 /* Callback to generate terms for each mime part of a message. */
 static void
 _index_mime_part (notmuch_message_t *message,
@@ -497,9 +512,11 @@ _index_mime_part (notmuch_message_t *message,
 	_notmuch_message_add_term (message, "tag", "attachment");
 	_notmuch_message_gen_terms (message, "attachment", filename);
 
-	/* XXX: Would be nice to call out to something here to parse
-	 * the attachment into text and then index that. */
-	goto DONE;
+	if (! _indexable_mime_type (part)) {
+	    /* XXX: Would be nice to call out to something here to parse
+	     * the attachment into text and then index that. */
+	    goto DONE;
+	}
     }
 
     byte_array = g_byte_array_new ();
diff --git a/test/T050-new.sh b/test/T050-new.sh
index cb67889c..dd665de3 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -458,7 +458,6 @@ test_expect_equal_file EXPECTED OUTPUT
 add_email_corpus indexing
 
 test_begin_subtest "index text/* attachments"
-test_subtest_known_broken
 notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain > EXPECTED
 notmuch search id:20200930101213.2m2pt3jrspvcrxfx@localhost.localdomain and ersatz > OUTPUT
 test_expect_equal_file_nonempty EXPECTED OUTPUT
-- 
2.35.1

Thread overview: 5+ messages
2022-08-20 18:50 WIP: index text attachments David Bremner
2022-08-20 18:50 ` [PATCH 1/3] test: rename indexing corpus David Bremner
2022-09-03 12:10   ` David Bremner
2022-08-20 18:50 ` [PATCH 2/3] test: add known broken test for indexing text/* attachments David Bremner
2022-08-20 18:50 ` David Bremner [this message]
