From: Garjola Dindi
Newsgroups: gmane.emacs.help
Subject: Incorrect rendering of accented characters in HTML e-mail (Gnus)
Date: Sat, 10 Oct 2020 15:34:02 +0200
Message-ID: <87362mp5md.fsf@pc-117-162.ovh.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
To: help-gnu-emacs@gnu.org

Hi,

I posted this on gmane.emacs.gnus.user and gmane.emacs.gnus.general
several days ago, but didn't get any reply. I hope to have some luck
here.

I am having a problem reading HTML e-mail in Gnus: accented characters
appear with an incorrect encoding. For instance, "é" (e with acute
accent) appears as "i".

Here is the odd part. If I edit the article with
gnus-summary-edit-article and just press C-c C-c (that is, I don't make
any edits), the characters are displayed correctly.

So right now, I use this

,----[ emacs-lisp ]
| (defun my/correct-message-encoding-by-dummy-edit ()
|   "Open the current article for editing and finish without changes."
|   (interactive)
|   (gnus-summary-select-article-buffer)
|   (gnus-summary-edit-article)
|   (gnus-article-edit-done))
|
| (define-key gnus-summary-mode-map (kbd " ")
|   #'my/correct-message-encoding-by-dummy-edit)
`----

to quickly «wash» the articles.

If I use describe-char to inspect the characters, I get this before
«washing»:

,----
| position: 470 of 867 (54%), column: 30
| character: i (displayed as i) (codepoint 105, #o151, #x69)
| charset: ascii (ASCII (ISO646 IRV))
| code point in charset: 0x69
| script: latin
| syntax: w which means: word
|
| category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman
| to input: type "C-x 8 RET 69" or "C-x 8 RET LATIN SMALL LETTER I"
| buffer code: #x69
| file code: #x69 (encoded by coding system utf-8-unix)
| display: by this font (glyph code)
|     ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#x4C)
|
| Character code properties: customize what to show
|   name: LATIN SMALL LETTER I
|   general-category: Ll (Letter, Lowercase)
|   decomposition: (105) ('i')
|
| There is an overlay here:
|  From 440 to 520
|   face                 hl-line
|   priority             -50
|   window               #
|
|
| There are text properties here:
|   face                 variable-pitch
`----

And this after «washing»:

,----
| position: 472 of 871 (54%), column: 30
| character: é (displayed as é) (codepoint 233, #o351, #xe9)
| charset: unicode (Unicode (ISO10646))
| code point in charset: 0xE9
| script: latin
| syntax: w which means: word
| category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin,
|     v:Viet
| to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
| buffer code: #xC3 #xA9
| file code: #xC3 #xA9 (encoded by coding system utf-8-unix)
| display: by this font (glyph code)
|     ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#xAB)
|
| Character code properties: customize what to show
|   name: LATIN SMALL LETTER E WITH ACUTE
|   old-name: LATIN SMALL LETTER E ACUTE
|   general-category: Ll (Letter, Lowercase)
|   decomposition: (101 769) ('e' '́')
|
| There is an overlay here:
|  From 442 to 523
|   face                 hl-line
|   priority             -50
|   window               #
|
|
| There are text properties here:
|   face                 variable-pitch
`----

The HTML part of the e-mails contains

,----
| < #part type=text/plain format="flowed" charset="utf-8"
| disposition=inline nofile=yes>
`----

so I guess that the HTML renderer should pick it up.

I have tested shr, gnus-w3m and w3m, and I always get the same result.

I would be grateful if somebody could help me understand what happens.

Thank you.
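
A possible clue in the two describe-char outputs: the wrong character is #x69 and the right one is #xE9, and these differ only in the top bit. The sketch below just checks that arithmetic. `my/strip-high-bit` is a hypothetical helper written for illustration, not a Gnus function, and the idea that something in the decoding path reduces the character to 7 bits is an assumption, not something confirmed by the outputs.

```emacs-lisp
;; Hypothetical helper, for illustration only; not part of Gnus.
(defun my/strip-high-bit (char)
  "Return CHAR with every bit above the low 7 bits cleared."
  (logand char #x7F))

;; é is U+00E9; clearing the top bit leaves #x69 (decimal 105).
(my/strip-high-bit #xE9)

;; The resulting character is the `i' seen before washing.
(string (my/strip-high-bit #xE9))

;; For comparison, the UTF-8 bytes of é are #xC3 #xA9 (195 169),
;; which matches the "buffer code" shown after washing.
(append (encode-coding-string (string #xE9) 'utf-8) nil)
```

Evaluating the three forms with C-x C-e should give 105, "i" and (195 169). Note that stripping the top bit from the raw UTF-8 bytes would give #x43 #x29 ("C)") rather than "i", so if this guess is right, the truncation would have to happen after the text has already been decoded to U+00E9.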