unofficial mirror of help-gnu-emacs@gnu.org
* Incorrect rendering of accented characters in HTML e-mail (Gnus)
From: Garjola Dindi @ 2020-10-10 13:34 UTC (permalink / raw)
  To: help-gnu-emacs

Hi,

I posted this on gmane.emacs.gnus.user and gmane.emacs.gnus.general
several days ago but did not get any reply, so I hope to have better
luck here.

I am having a problem with reading html e-mail in Gnus: accented
characters appear with an incorrect encoding. For instance, "é" (e with
acute accent) will appear as "i".

Here is the odd part: if I open the article with
gnus-summary-edit-article and just press C-c C-c (that is, without
making any edits), the characters are displayed correctly.

So right now, I use this

,----[ emacs-lisp ]
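| (defun my/correct-message-encoding-by-dummy-edit ()
|   "Work around the bad encoding: edit the article and save it unchanged."
|   (interactive)
|   ;; Jump to the article buffer, open it for editing, then finish
|   ;; the edit right away without touching the text.
|   (gnus-summary-select-article-buffer)
|   (gnus-summary-edit-article)
|   (gnus-article-edit-done))
| 
| (define-key gnus-summary-mode-map (kbd "<f8> <f8>")
|   #'my/correct-message-encoding-by-dummy-edit)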
`----

to quickly «wash» the articles. 

If I use describe-char to inspect the characters, I get this before
«washing»:

,----
| position: 470 of 867 (54%), column: 30                                     
| character: i (displayed as i) (codepoint 105, #o151, #x69)                 
| charset: ascii (ASCII (ISO646 IRV))                                        
| code point in charset: 0x69                                                
| script: latin                                                              
| syntax: w 	which means: word
| category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman      
| to input: type "C-x 8 RET 69" or "C-x 8 RET LATIN SMALL LETTER I"          
| buffer code: #x69                                                          
| file code: #x69 (encoded by coding system utf-8-unix)                      
| display: by this font (glyph code)                                         
| ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#x4C
|                                                                            
| Character code properties: customize what to show                          
| name: LATIN SMALL LETTER I                                                 
| general-category: Ll (Letter, Lowercase)                                   
| decomposition: (105) ('i')                                                 
|                                                                            
| There is an overlay here:                                                  
| From 440 to 520                                                            
| face                 hl-line                                               
| priority             -50                                                   
| window               #<window 141 on *Article nnmaildir+RSSFeeds:ABlog*>   
|                                                                            
|                                                                            
| There are text properties here:                                            
| face                 variable-pitch                                        
`----

And this after «washing»:

,----
| position: 472 of 871 (54%), column: 30                                      
| character: é (displayed as é) (codepoint 233, #o351, #xe9)                  
| charset: unicode (Unicode (ISO10646))                                       
| code point in charset: 0xE9                                                 
| script: latin                                                               
| syntax: w 	which means: word                                             
| category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin, 
| v:Viet
| to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
| buffer code: #xC3 #xA9                                                      
| file code: #xC3 #xA9 (encoded by coding system utf-8-unix)                  
| display: by this font (glyph code)                                          
| ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#xAB)
|                                                                             
| Character code properties: customize what to show                           
| name: LATIN SMALL LETTER E WITH ACUTE                                       
| old-name: LATIN SMALL LETTER E ACUTE                                        
| general-category: Ll (Letter, Lowercase)                                    
| decomposition: (101 769) ('e' '́')                                           
|                                                                             
| There is an overlay here:                                                   
| From 442 to 523                                                             
| face                 hl-line                                                
| priority             -50                                                    
| window               #<window 155 on *Article nnmaildir+RSSFeeds:ABlog*>    
|                                                                             
|                                                                             
| There are text properties here:                                             
| face                 variable-pitch                                         
`----
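
Note that the two code points differ by exactly 128 (#x80): 233 with
the eighth bit cleared is 105. That may be a hint that the text is
being treated as 7-bit somewhere before display, although this is only
a guess. A quick check that can be evaluated in *scratch* (the function
name is made up for illustration and is not part of Gnus):

```emacs-lisp
;; Hypothetical helper, for illustration only: clear the eighth bit
;; of a character code, as a 7-bit-only code path would.
(defun my/strip-eighth-bit (char)
  "Return CHAR with every bit above the low seven cleared."
  (logand char #x7f))

;; #xe9 is the code point describe-char reports after «washing» (é);
;; #x69 is the one it reports before (i).
(list (- #xe9 #x69)                          ; difference: 128
      (my/strip-eighth-bit #xe9)             ; 105, i.e. ?i
      (= (my/strip-eighth-bit #xe9) #x69))   ; t
```

If the same pattern holds for other accented characters (for example
"à", #xe0, showing up as "`", #x60), that would support the guess.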

The html part of the e-mails contains

,----
| < #part type=text/plain format="flowed" charset="utf-8"
| disposition=inline nofile=yes>
`----

so I guess that the html renderer should pick it up. I have tested shr,
gnus-w3m and w3m and I always get the same result.
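
For anyone who wants to reproduce this, the renderer can be switched
through the `mm-text-html-renderer' option. This is only a sketch of
one way to do it, trying one value at a time and then redisplaying the
article (for instance with `g' in the summary buffer); it is not
necessarily the exact configuration used for the tests above:

```emacs-lisp
;; Sketch: pick the HTML renderer used for text/html parts.
(setq mm-text-html-renderer 'shr)          ; renderer built into Emacs
;; (setq mm-text-html-renderer 'gnus-w3m)  ; calls the external w3m program
;; (setq mm-text-html-renderer 'w3m)       ; needs the emacs-w3m package
```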

I would be grateful if somebody could help me understand what happens.

Thank you.
-- 
- 



