From: David Bremner
To: Sebastian Poeplau, notmuch@notmuchmail.org
Subject: Re: Handling mislabeled emails encoded with Windows-1252
In-Reply-To: <87lgaeat37.fsf@eurecom.fr>
References: <87lgaeat37.fsf@eurecom.fr>
Date: Tue, 24 Jul 2018 09:49:23 +0800
Message-ID: <8736w91jz0.fsf@tethera.net>

Sebastian Poeplau writes:

> Hi,
>
> This email is to suggest a minor change in how notmuch handles text
> encoding when displaying emails. The motivation is the following: I keep
> receiving emails that are encoded with Windows-1252 but claim to be
> ISO 8859-1.
> The two character sets only differ in the range between 0x80
> and 0x9F where Windows-1252 contains special characters (e.g. “quotation
> marks”) while ISO 8859-1 only has non-printable ones. The mislabeling
> thus causes some special characters in such emails to be displayed with
> a replacement symbol for non-printable characters.

Hi Sebastian;

Everyone's mail situation is unique, but I haven't noticed this
problem. Do you have a mechanical (e.g. scripted) way of detecting such
mails? I suppose it could just look for characters in the range 0x80 to
0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
own mail would help me think about this problem, I think.

David
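
A scripted check along the lines suggested above might look like the following minimal sketch. It is not part of notmuch; it uses only Python's standard `email` package, and the names `suspicious_parts`, `LATIN1_NAMES`, and `C1_BYTES` are hypothetical. It assumes the full 0x80 to 0x9F range from the quoted message, and that any such byte in a part labeled ISO 8859-1 is a sign of Windows-1252 content.

```python
import email
import email.policy
import re

# Charset labels treated as "claims to be ISO 8859-1" (an assumed, non-exhaustive list).
LATIN1_NAMES = {"iso-8859-1", "iso_8859-1", "latin1", "latin-1"}

# Bytes 0x80-0x9F: control codes in ISO 8859-1, mostly printable in Windows-1252.
C1_BYTES = re.compile(rb"[\x80-\x9f]")


def suspicious_parts(raw_message: bytes):
    """Yield (content_type, hit_count) for text parts labeled ISO 8859-1
    whose decoded body contains bytes in the 0x80-0x9F range."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        charset = (part.get_content_charset() or "").lower()
        if charset not in LATIN1_NAMES:
            continue
        # decode=True undoes the Content-Transfer-Encoding and returns raw bytes.
        payload = part.get_payload(decode=True) or b""
        hits = C1_BYTES.findall(payload)
        if hits:
            yield part.get_content_type(), len(hits)
```

Running this over every message file in a mail store and counting the messages that yield at least one part would give a rough census. It only inspects bodies, not encoded header words.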