From: Sebastian Poeplau
To: Jeffrey Stedfast, notmuch@notmuchmail.org
Subject: Re: Handling mislabeled emails encoded with Windows-1252
Date: Tue, 24 Jul 2018 16:19:20 +0200
Message-ID: <87effszpg7.fsf@eurecom.fr>

Hi Jeff,

> GMime actually comes with a stream filter (GMimeFilterWindows) which can
> auto-detect this situation.
>
> In this particular case, you'd instantiate the GMimeFilterWindows like this:
>
>   filter = g_mime_filter_windows_new ("iso-8859-1");
>
> "iso-8859-1" being the charset that the content claims to be in.
>
> Then you'd pipe the raw (decoded but not converted to utf-8) content through
> the filter and afterward call g_mime_filter_windows_real_charset (filter),
> which would return, in this user's case, "windows-1252".

Nice, this is exactly what I was looking for! Somehow I missed it when
checking GMime. I'll adapt my local fix and post the results here.

Thanks,
Sebastian