From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 7A2C1431FAE
	for ; Thu, 23 Feb 2012 23:36:36 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: -2.3
X-Spam-Level:
X-Spam-Status: No, score=-2.3 tagged_above=-999 required=5
	tests=[RCVD_IN_DNSWL_MED=-2.3] autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id L1ZVK0O7IZbh for ;
	Thu, 23 Feb 2012 23:36:35 -0800 (PST)
Received: from max.feld.cvut.cz (max.feld.cvut.cz [147.32.192.36])
	by olra.theworths.org (Postfix) with ESMTP id 69351431FBC
	for ; Thu, 23 Feb 2012 23:36:35 -0800 (PST)
Received: from localhost (unknown [192.168.200.4])
	by max.feld.cvut.cz (Postfix) with ESMTP id AFFAB19F3399;
	Fri, 24 Feb 2012 08:36:34 +0100 (CET)
X-Virus-Scanned: IMAP AMAVIS
Received: from max.feld.cvut.cz ([192.168.200.1])
	by localhost (styx.feld.cvut.cz [192.168.200.4]) (amavisd-new, port 10044)
	with ESMTP id tYAF04f-RUnu; Fri, 24 Feb 2012 08:36:30 +0100 (CET)
Received: from imap.feld.cvut.cz (imap.feld.cvut.cz [147.32.192.34])
	by max.feld.cvut.cz (Postfix) with ESMTP id CEE1E19F339E;
	Fri, 24 Feb 2012 08:36:30 +0100 (CET)
Received: from steelpick.2x.cz (cable-86-56-3-85.cust.telecolumbus.net [86.56.3.85])
	(Authenticated sender: sojkam1)
	by imap.feld.cvut.cz (Postfix) with ESMTPSA id BA548660969;
	Fri, 24 Feb 2012 08:36:30 +0100 (CET)
Received: from wsh by steelpick.2x.cz with local (Exim 4.77)
	(envelope-from ) id 1S0phd-0001B1-Vb;
	Fri, 24 Feb 2012 08:36:29 +0100
From: Michal Sojka
To: notmuch@notmuchmail.org
Subject: [PATCH 1/2] Convert non-UTF-8 parts to UTF-8 before indexing them
Date: Fri, 24 Feb 2012 08:36:22 +0100
Message-Id: <1330068983-4483-1-git-send-email-sojkam1@fel.cvut.cz>
X-Mailer: git-send-email 1.7.9.1
In-Reply-To: <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz>
References:
 <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz>
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 24 Feb 2012 07:36:36 -0000

This fixes a bug that made it impossible to search for non-ASCII words
in such parts. The code here was copied from show_text_part_content(),
because the show command already does the needed conversion when
showing the message.
---
 lib/index.cc |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/lib/index.cc b/lib/index.cc
index d8f8b2b..e377732 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -315,6 +315,7 @@ _index_mime_part (notmuch_message_t *message,
     GByteArray *byte_array;
     GMimeContentDisposition *disposition;
     char *body;
+    const char *charset;
 
     if (! part) {
         fprintf (stderr, "Warning: Not indexing empty mime part.\n");
@@ -390,6 +391,20 @@ _index_mime_part (notmuch_message_t *message,
     g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),
                               discard_uuencode_filter);
 
+    charset = g_mime_object_get_content_type_parameter (part, "charset");
+    if (charset) {
+        GMimeFilter *charset_filter;
+        charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
+        /* This result can be NULL for things like "unknown-8bit".
+         * Don't set a NULL filter as that makes GMime print
+         * annoying assertion-failure messages on stderr. */
+        if (charset_filter) {
+            g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),
+                                      charset_filter);
+            g_object_unref (charset_filter);
+        }
+    }
+
     wrapper = g_mime_part_get_content_object (GMIME_PART (part));
     if (wrapper)
         g_mime_data_wrapper_write_to_stream (wrapper, filter);
-- 
1.7.9.1
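
For readers who want to see the effect of the conversion step in
isolation, here is a minimal standalone sketch. It is not the code from
the patch: it calls POSIX iconv directly instead of going through
g_mime_filter_charset_new(), and to_utf8() is a hypothetical helper
that exists in neither notmuch nor GMime. It mirrors the two cases
handled above: a known charset is converted to UTF-8, and an unknown
charset name is reported to the caller instead of being used.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of notmuch or GMime).
 *
 * Convert the NUL-terminated string `in`, encoded in `charset`, to UTF-8
 * and store the NUL-terminated result in `out`.
 *
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * charset name is not recognized, the input is invalid for that charset,
 * or `out` is too small. */
static long
to_utf8 (const char *charset, const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char *src = (char *) in;    /* iconv() does not modify the input bytes */
    char *dst = out;
    size_t src_left = strlen (in);
    size_t dst_left;
    size_t rc;

    if (out_size == 0)
        return -1;
    dst_left = out_size - 1;    /* reserve one byte for the terminator */

    /* Analogous to the NULL check in the patch: an unrecognized charset
     * name yields no converter, so the caller has to skip conversion. */
    cd = iconv_open ("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return -1;

    rc = iconv (cd, &src, &src_left, &dst, &dst_left);
    iconv_close (cd);
    if (rc == (size_t) -1)
        return -1;

    *dst = '\0';
    return (long) (dst - out);
}
```

With this helper, the ISO-8859-1 bytes "caf\xe9" become the five UTF-8
bytes "caf\xc3\xa9", which is the form a UTF-8 search term would need
to match. A caller that gets -1 back can fall back to indexing the raw
bytes, which is what the patch does when no charset filter is created.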