Date: Fri, 24 Feb 2012 23:33:12 -0500
From: Austin Clements
To: Michal Sojka
Subject: Re: [PATCH 1/2] Convert non-UTF-8 parts to UTF-8 before indexing them
Message-ID: <20120225043312.GL30513@mit.edu>
References: <1330043595-22054-1-git-send-email-sojkam1@fel.cvut.cz> <1330068983-4483-1-git-send-email-sojkam1@fel.cvut.cz>
In-Reply-To: <1330068983-4483-1-git-send-email-sojkam1@fel.cvut.cz>
Cc: notmuch@notmuchmail.org

LGTM.  I'm assuming this interacts with the uuencoding filter in the
right order (I don't see how any other order could be correct), but
don't actually know.

Quoth Michal Sojka on Feb 24 at  8:36 am:
> This fixes a bug that prevented searching for non-ASCII words in such
> parts. The code here was copied from show_text_part_content(), because
> the show command already does the needed conversion when showing the
> message.
> ---
>  lib/index.cc |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/lib/index.cc b/lib/index.cc
> index d8f8b2b..e377732 100644
> --- a/lib/index.cc
> +++ b/lib/index.cc
> @@ -315,6 +315,7 @@ _index_mime_part (notmuch_message_t *message,
>      GByteArray *byte_array;
>      GMimeContentDisposition *disposition;
>      char *body;
> +    const char *charset;
>
>      if (! part) {
>          fprintf (stderr, "Warning: Not indexing empty mime part.\n");
> @@ -390,6 +391,20 @@ _index_mime_part (notmuch_message_t *message,
>      g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),
>                                discard_uuencode_filter);
>
> +    charset = g_mime_object_get_content_type_parameter (part, "charset");
> +    if (charset) {
> +        GMimeFilter *charset_filter;
> +        charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
> +        /* This result can be NULL for things like "unknown-8bit".
> +         * Don't set a NULL filter as that makes GMime print
> +         * annoying assertion-failure messages on stderr. */
> +        if (charset_filter) {
> +            g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),
> +                                      charset_filter);
> +            g_object_unref (charset_filter);
> +        }
> +    }
> +
>      wrapper = g_mime_part_get_content_object (GMIME_PART (part));
>      if (wrapper)
>          g_mime_data_wrapper_write_to_stream (wrapper, filter);
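
For illustration only (this is not the patch's code): a rough sketch of
the conversion step the charset filter performs, written against POSIX
iconv directly rather than GMime. The helper name to_utf8 is
hypothetical, and the 4x output bound is an assumption that holds for
common single-byte charsets. Returning NULL for a charset label that
iconv does not recognize loosely mirrors the NULL check in the patch for
labels like "unknown-8bit".

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of notmuch: convert the NUL-terminated
 * string `in`, encoded in `charset`, to a newly malloc'd UTF-8 string.
 * Returns NULL if the charset is missing or unknown to iconv, or if the
 * conversion fails; the caller owns (and must free) a non-NULL result. */
char *
to_utf8 (const char *charset, const char *in)
{
    iconv_t cd;
    size_t in_left, out_left;
    char *out, *out_p, *in_p;

    if (charset == NULL)
        return NULL;

    /* Fails with (iconv_t) -1 for labels iconv does not know. */
    cd = iconv_open ("UTF-8", charset);
    if (cd == (iconv_t) -1)
        return NULL;

    in_left = strlen (in);
    /* Assumed bound: at most 4 output bytes per input byte. If that is
     * ever too small, iconv reports an error and we return NULL. */
    out_left = 4 * in_left;
    out = malloc (out_left + 1);
    if (out == NULL) {
        iconv_close (cd);
        return NULL;
    }

    /* iconv takes a non-const char **; the input is not modified. */
    in_p = (char *) in;
    out_p = out;
    if (iconv (cd, &in_p, &in_left, &out_p, &out_left) == (size_t) -1) {
        free (out);
        iconv_close (cd);
        return NULL;
    }

    *out_p = '\0';
    iconv_close (cd);
    return out;
}
```

A caller would presumably fall back to indexing the original bytes when
the helper returns NULL, e.g. after `char *s = to_utf8 ("ISO-8859-1", body);`.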