From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
Date: Sat, 10 Oct 2020 17:00:54 +0300
Message-ID: <83v9fi41ux.fsf@gnu.org>
References: <87362mp5md.fsf@pc-117-162.ovh.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
To: help-gnu-emacs@gnu.org
In-Reply-To: <87362mp5md.fsf@pc-117-162.ovh.com>
  (message from Garjola Dindi on Sat, 10 Oct 2020 15:34:02 +0200)
List-Id: Users list for the GNU Emacs text editor
Xref: news.gmane.io gmane.emacs.help:124411

> From: Garjola Dindi
> Date: Sat, 10 Oct 2020 15:34:02 +0200
> 
> If I use describe-char to inspect the characters, I get this before
> «washing»:
> 
> ,----
> | position: 470 of 867 (54%), column: 30
> | character: i (displayed as i) (codepoint 105, #o151, #x69)
> | charset: ascii (ASCII (ISO646 IRV))
> | code point in charset: 0x69
> | script: latin
> | syntax: w which means: word
> | category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman
> | to input: type "C-x 8 RET 69" or "C-x 8 RET LATIN SMALL LETTER I"
> | buffer code: #x69
> | file code: #x69 (encoded by coding system utf-8-unix)
> | display: by this font (glyph code)
> | ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#x4C
> | 
> | Character code properties: customize what to show
> | name: LATIN SMALL LETTER I
> | general-category: Ll (Letter, Lowercase)
> | decomposition: (105) ('i')
> | 
> | There is an overlay here:
> | From 440 to 520
> | face hl-line
> | priority -50
> | window #
> | 
> | 
> | There are text properties here:
> | face variable-pitch
> `----
> 
> And this after «washing»
> 
> ,----
> | position: 472 of 871 (54%), column: 30
> | character: é (displayed as é) (codepoint 233, #o351, #xe9)
> | charset: unicode (Unicode (ISO10646))
> | code point in charset: 0xE9
> | script: latin
> | syntax: w which means: word
> | category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin,
> | v:Viet
> | to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
> | buffer code: #xC3 #xA9
> | file code: #xC3 #xA9 (encoded by coding system utf-8-unix)
> | display: by this font (glyph code)
> | ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#xAB)
> | 
> | Character code properties: customize what to show
> | name: LATIN SMALL LETTER E WITH ACUTE
> | old-name: LATIN SMALL LETTER E ACUTE
> | general-category: Ll (Letter, Lowercase)
> | decomposition: (101 769) ('e' '́')
> | 
> | There is an overlay here:
> | From 442 to 523
> | face hl-line
> | priority -50
> | window #
> | 
> | 
> | There are text properties here:
> | face variable-pitch
> `----
> 
> The html part of the e-mails contains
> 
> ,----
> | < #part type=text/plain format="flowed" charset="utf-8"
> | disposition=inline nofile=yes>
> `----
> 
> so I guess that the html renderer should pick it up. I have tested shr,
> gnus-w3m and w3m and I always get the same result.
> 
> I would be grateful if somebody could help me understand what happens.

How does the character appear in the original HTML?
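For readers following the thread, the second describe-char output can be cross-checked outside Emacs. The sketch below uses Python purely for illustration; it is not part of Gnus, shr or w3m. It confirms three things:

- U+00E9 encodes to the two bytes `#xC3 #xA9` shown under "buffer code".
- Its canonical decomposition is `(101 769)`.
- The usual ways of writing the character in HTML source (a literal UTF-8 character, `&eacute;`, `&#233;`, `&#xE9;`) all decode to the same codepoint. That last point is why the answer to the question above matters.

The final lines show one hypothetical failure mode: UTF-8 bytes read as Latin-1. Whether that is what happens in this case is an assumption, not something the quoted output establishes.

```python
import html
import unicodedata

# The character from the second describe-char output: codepoint 233 (#xE9).
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# "buffer code: #xC3 #xA9": the UTF-8 encoding of U+00E9 is two bytes.
utf8_bytes = e_acute.encode("utf-8")
print(utf8_bytes.hex(" "))  # c3 a9

# "decomposition: (101 769)": 'e' followed by COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", e_acute)
print([ord(c) for c in decomposed])  # [101, 769]

# Some ways the same character may be spelled in the original HTML source.
html_forms = ["\u00e9", "&eacute;", "&#233;", "&#xE9;"]
print([html.unescape(form) == e_acute for form in html_forms])  # [True, True, True, True]

# Hypothetical failure mode (an assumption, not a diagnosis): the UTF-8
# bytes decoded with a single-byte charset turn into two characters.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # Ã©
```

If the raw HTML uses an entity, the renderer must expand it. If it contains literal bytes, the charset declared for that MIME part decides how they are decoded.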