From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Garjola Dindi
Newsgroups: gmane.emacs.help
Subject: Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
Date: Sat, 10 Oct 2020 16:35:05 +0200
Message-ID: <87tuv287za.fsf@pc-117-162.ovh.com>
References: <87362mp5md.fsf@pc-117-162.ovh.com> <83v9fi41ux.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="39311"; mail-complaints-to="usenet@ciao.gmane.io"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)
To: help-gnu-emacs@gnu.org
Cancel-Lock: sha1:MPOqLxw/w9NlQ4Qs+G22Vj3SmO8=
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Sat Oct 10 16:36:54 2020
Return-path:
Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kRFzS-000AAQ-9c for geh-help-gnu-emacs@m.gmane-mx.org; Sat, 10 Oct 2020 16:36:54 +0200
Original-Received: from localhost ([::1]:44642 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kRFzR-0000aH-CF for geh-help-gnu-emacs@m.gmane-mx.org; Sat, 10 Oct 2020 10:36:53 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:48980) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kRFxu-0000P0-BA for help-gnu-emacs@gnu.org; Sat, 10 Oct 2020 10:35:18 -0400
Original-Received: from static.214.254.202.116.clients.your-server.de ([116.202.254.214]:42046 helo=ciao.gmane.io) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kRFxr-0000SS-VV for help-gnu-emacs@gnu.org; Sat, 10 Oct 2020 10:35:18 -0400
Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1kRFxm-0008Cz-Aw for help-gnu-emacs@gnu.org; Sat, 10 Oct 2020 16:35:10 +0200
X-Injected-Via-Gmane: http://gmane.org/
Received-SPF: pass client-ip=116.202.254.214; envelope-from=geh-help-gnu-emacs@m.gmane-mx.org; helo=ciao.gmane.io
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/10 09:17:45
X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] [fuzzy]
X-Spam_score_int: -15
X-Spam_score: -1.6
X-Spam_bar: -
X-Spam_report: (-1.6 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-Content-Filtered-By: Mailman/MimeDel 2.1.23
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Users list for the GNU Emacs text editor
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: "help-gnu-emacs"
Xref: news.gmane.io gmane.emacs.help:124412
Archived-At:

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On Sat 10-Oct-2020 at 16:00:54 +02, Eli Zaretskii wrote:
>> From: Garjola Dindi
>> Date: Sat, 10 Oct 2020 15:34:02 +0200
>>
>> If I use describe-char to inspect the characters, I get this before
>> «washing»:
>>
>> ,----
>> | position: 470 of 867 (54%), column: 30
>> | character: i (displayed as i) (codepoint 105, #o151, #x69)
>> | charset: ascii (ASCII (ISO646 IRV))
>> | code point in charset: 0x69
>> | script: latin
>> | syntax: w which means: word
>> | category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman
>> | to input: type "C-x 8 RET 69" or "C-x 8 RET LATIN SMALL LETTER I"
>> | buffer code: #x69
>> | file code: #x69 (encoded by coding system utf-8-unix)
>> | display: by this font (glyph code)
>> | ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#x4C
>> |
>> | Character code properties: customize what to show
>> | name: LATIN SMALL LETTER I
>> | general-category: Ll (Letter, Lowercase)
>> | decomposition: (105) ('i')
>> |
>> | There is an overlay here:
>> | From 440 to 520
>> | face hl-line
>> | priority -50
>> | window #
>> |
>> |
>> | There are text properties here:
>> | face variable-pitch
>> `----
>>
>> And this after «washing»
>>
>> ,----
>> | position: 472 of 871 (54%), column: 30
>> | character: é (displayed as é) (codepoint 233, #o351, #xe9)
>> | charset: unicode (Unicode (ISO10646))
>> | code point in charset: 0xE9
>> | script: latin
>> | syntax: w which means: word
>> | category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin,
>> | v:Viet
>> | to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
>> | buffer code: #xC3 #xA9
>> | file code: #xC3 #xA9 (encoded by coding system utf-8-unix)
>> | display: by this font (glyph code)
>> | ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#xAB)
>> |
>> | Character code properties: customize what to show
>> | name: LATIN SMALL LETTER E WITH ACUTE
>> | old-name: LATIN SMALL LETTER E ACUTE
>> | general-category: Ll (Letter, Lowercase)
>> | decomposition: (101 769) ('e' '́')
>> |
>> | There is an overlay here:
>> | From 442 to 523
>> | face hl-line
>> | priority -50
>> | window #
>> |
>> |
>> | There are text properties here:
>> | face variable-pitch
>> `----
>>
>> The html part of the e-mails contains
>>
>> ,----
>> | < #part type=text/plain format="flowed" charset="utf-8"
>> | disposition=inline nofile=yes>
>> `----
>>
>> so I guess that the html renderer should pick it up. I have tested shr,
>> gnus-w3m and w3m and I always get the same result.
>>
>> I would be grateful if somebody could help me understand what happens.
>
> How does the character appear in the original HTML?

Thanks for your quick response.

I don't know if I am inspecting the message correctly, because when I
enter the edit mode, all characters appear OK.
Therefore, I am not sure if I am seeing the original HTML.

I have also noticed that I have the same issue with non-HTML e-mails.
I thought they were HTML, but they are just multipart. For instance,
here is what I see in the article buffer:

,----
| \311lodie, qui a rejoint l'\351quipe podcast, me dit que sa soeur, qui a une
| formation th\351\342trale, serait disponible ponctuellement pour faire des
| voix pour des lectures. Pour le moment on a jamais eu ce besoin mais \347a
| peut ouvrir des perspectives.
`----

(I have replaced the non-printable characters with \xxx) and here is
what I see in edit mode:

,----
| --=-=-= Content-Type: multipart/mixed; boundary="==-=-=" --==-=-= Content-Type: text/plain
| --==-=-= Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit
|
| Élodie, qui a rejoint l'équipe podcast, me dit que sa soeur, qui a une
| formation théâtrale, serait disponible ponctuellement pour faire des
| voix pour des lectures. Pour le moment on a jamais eu ce besoin mais ça
| peut ouvrir des perspectives.
| --==-=-= Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit
| Pour connaître la configuration de la liste, gérer votre abonnement à la
| liste et vos informations personnelles :
| https://listes.april.org/wws/info/libreavous
| --==-=-=-- --=-=-= Content-Type: text/plain
`----

Again, when I quit the edit mode, the article buffer displays things
correctly.

In the case of HTML, I have for instance this in the article buffer:

,----
| Les groupes suprimacistes blancs ont profiti du mandat de Donald Trump et des ...
`----

and this in the edit mode buffer:

,----
| --=-=-= Content-Type: text/plain
| --=-=-= Content-Type: text/plain
`----

So now I think this is not due to HTML, but to multipart MIME.

Thanks again for your help.

--=-=-=--
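
For reference, the \xxx escapes in the article-buffer excerpt can be checked outside Emacs. The following is only a sketch, assuming that each \xxx stands for one raw byte written in octal; the helper name `unescape_octal` and the shortened sample string are made up for this illustration and are not part of Gnus. It shows that those bytes read as the expected French text when treated as Latin-1, that they are not valid UTF-8, and that UTF-8 would encode "é" as the two bytes C3 A9 (the "buffer code" that describe-char reports above).

```python
import re


def unescape_octal(text: str) -> bytes:
    """Turn backslash-octal escapes such as \\351 back into single raw bytes.

    Hypothetical helper: assumes every escape is exactly three octal digits
    in the range 000-377 and that all other characters are plain ASCII.
    """
    out = bytearray()
    pos = 0
    for m in re.finditer(r"\\([0-3][0-7]{2})", text):
        out += text[pos:m.start()].encode("ascii")  # literal ASCII run
        out.append(int(m.group(1), 8))              # one byte per escape
        pos = m.end()
    out += text[pos:].encode("ascii")
    return bytes(out)


# Shortened sample taken from the article-buffer excerpt in the message.
sample = r"\311lodie, l'\351quipe, th\351\342trale, \347a"
raw = unescape_octal(sample)

# Interpreting each byte as one Latin-1 character gives readable text.
as_latin1 = raw.decode("latin-1")

# The same bytes are not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(as_latin1)
print("valid UTF-8:", valid_utf8)
print("UTF-8 bytes of 'é':", "é".encode("utf-8").hex())
```

If the assumption holds, this would suggest that the displayed part is being treated as single-byte text rather than decoded as UTF-8, which fits the observation that the problem is not specific to HTML. It does not by itself say where in the multipart handling the charset is lost.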