From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1]) by arlo.cworth.org (Postfix) with ESMTP id 886236DE01F7 for ; Mon, 30 Jul 2018 00:29:01 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: -0.006
X-Spam-Level:
X-Spam-Status: No, score=-0.006 tagged_above=-999 required=5 tests=[AWL=0.005, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1]) by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id M9HS5UV0rF9b for ; Mon, 30 Jul 2018 00:29:00 -0700 (PDT)
Received: from smtp.eurecom.fr (smtp.eurecom.fr [193.55.113.210]) by arlo.cworth.org (Postfix) with ESMTP id CD3346DE01E1 for ; Mon, 30 Jul 2018 00:28:59 -0700 (PDT)
X-IronPort-AV: E=Sophos;i="5.51,422,1526335200"; d="scan'208";a="7973915"
Received: from waha.eurecom.fr (HELO smtps.eurecom.fr) ([10.3.2.236]) by drago1i.eurecom.fr with ESMTP; 30 Jul 2018 09:28:56 +0200
Received: from archibald (unknown [193.55.114.4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtps.eurecom.fr (Postfix) with ESMTPSA id 50E004DA; Mon, 30 Jul 2018 09:28:56 +0200 (CEST)
From: Sebastian Poeplau
To: Jeffrey Stedfast , "notmuch\@notmuchmail.org"
Subject: Re: Handling mislabeled emails encoded with Windows-1252
In-Reply-To: <9C0F603A-6125-4CF0-8AE7-E02301355906@microsoft.com>
References: <87lgaeat37.fsf@eurecom.fr> <8736w91jz0.fsf@tethera.net> <87effszpg7.fsf@eurecom.fr> <87zhyby589.fsf@eurecom.fr> <9C0F603A-6125-4CF0-8AE7-E02301355906@microsoft.com>
Date: Mon, 30 Jul 2018 09:28:57 +0200
Message-ID: <87wotdxjuu.fsf@eurecom.fr>
MIME-Version: 1.0
Content-Type: text/plain
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.26
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 30 Jul 2018 07:29:01 -0000

Hi,

> Yes, that looks good. I would have probably unreffed the null_stream
> and null_stream_filter inside of that if-block rather than at the end
> of the function, but that's a stylistic issue that the notmuch authors
> can comment on. The patch as it stands should work correctly from what
> I can tell.

I was worried about the string returned by
g_mime_filter_windows_real_charset: once I unref everything, isn't there
a risk that the filter gets freed? As far as I can tell from the code,
the returned charset might be a pointer into the filter object, which
would then be left dangling...

> As an added optimization, you could try limiting that block of code to
> just when the charset is one of the iso-8859-* charsets.
>
> The following code snippet should help with that:
>
>     charset = charset ? g_mime_charset_canon_name (charset) : NULL;
>     if (wrapper && charset && !g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
>         ...
>     }
>
> The reason you need to use g_mime_charset_canon_name (if you decide to
> add the optimization) is that mail software does not always use the
> canonical form of the charset names it declares. Often you will get
> stuff like "latin1" or "iso_8859-1".

Nice, I'll add it.

Thanks a lot,
Sebastian
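For illustration only, the following is a standalone C sketch of the two points in this exchange. It deliberately avoids GMime so that it compiles on its own, and every name in it is hypothetical. `toy_canon_name` is a much-simplified stand-in for `g_mime_charset_canon_name`; the real function knows far more aliases. `toy_filter` and `toy_real_charset` model the worst case raised above, where the returned charset points into the filter object. The prefix check is true when the comparison reports equality. `g_ascii_strncasecmp`, like `strncasecmp`, returns 0 on a match, so a GMime version of the iso-8859-* test needs a `!` in front of the call.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ASCII case-insensitive compare of at most n bytes. Returns true on a
 * match, i.e. exactly when g_ascii_strncasecmp() would return 0. */
static bool ascii_ncase_eq (const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ca = tolower ((unsigned char) a[i]);
        int cb = tolower ((unsigned char) b[i]);
        if (ca != cb)
            return false;
        if (ca == '\0')
            return true;            /* both strings ended together */
    }
    return true;
}

/* Hypothetical, much-simplified stand-in for g_mime_charset_canon_name():
 * folds "latin1", "iso_8859-1", "ISO8859-15", ... to "iso-8859-N". */
static const char *toy_canon_name (const char *name, char *buf, size_t len)
{
    if (ascii_ncase_eq (name, "latin1", sizeof "latin1"))
        return "iso-8859-1";
    if (ascii_ncase_eq (name, "iso", 3)) {
        const char *p = name + 3;
        if (*p == '-' || *p == '_')
            p++;
        if (strncmp (p, "8859", 4) == 0) {
            p += 4;
            if (*p == '-' || *p == '_')
                p++;
            snprintf (buf, len, "iso-8859-%s", p);
            return buf;
        }
    }
    return name;                    /* anything else passes through */
}

/* Gate for the Windows-1252 check: canonicalize first, or "latin1"
 * would fail the "iso-8859-" prefix test. */
static bool is_iso8859 (const char *charset)
{
    char buf[32];
    if (charset == NULL)
        return false;
    charset = toy_canon_name (charset, buf, sizeof buf);
    return ascii_ncase_eq (charset, "iso-8859-", 9);
}

/* Portable strdup(). */
static char *dup_string (const char *s)
{
    size_t n = strlen (s) + 1;
    char *copy = malloc (n);
    if (copy != NULL)
        memcpy (copy, s, n);
    return copy;
}

/* Hypothetical filter object that owns its charset string. */
typedef struct { char *claimed_charset; } toy_filter;

static toy_filter *toy_filter_new (const char *charset)
{
    toy_filter *f = malloc (sizeof *f);
    if (f != NULL && (f->claimed_charset = dup_string (charset)) == NULL) {
        free (f);
        f = NULL;
    }
    return f;
}

static void toy_filter_free (toy_filter *f) { free (f->claimed_charset); free (f); }

/* Worst case raised in the thread: the result points *into* the filter. */
static const char *toy_real_charset (const toy_filter *f)
{
    assert (f != NULL);
    return f->claimed_charset;
}

/* Copy the charset *before* freeing the filter so the caller never holds
 * a dangling pointer. The caller frees the result. */
static char *detect_charset_copy (const char *claimed)
{
    toy_filter *f = toy_filter_new (claimed);
    if (f == NULL)
        return NULL;
    char *result = dup_string (toy_real_charset (f));
    toy_filter_free (f);            /* safe: result no longer points into f */
    return result;
}
```

The sketch assumes the getter might return an internal pointer. With the real library, the analogous pattern would presumably be to `g_strdup` the result of `g_mime_filter_windows_real_charset` before the final `g_object_unref` and `g_free` the copy later. Alternatively, keep the filter alive until the charset has been used. This is one way to structure it, not the patch under discussion.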