all messages for Emacs-related lists mirrored at yhetil.org
* Incorrect rendering of accented characters in HTML e-mail (Gnus)
@ 2020-10-10 13:34 Garjola Dindi
  2020-10-10 14:00 ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Garjola Dindi @ 2020-10-10 13:34 UTC
  To: help-gnu-emacs

Hi,

I posted this on gmane.emacs.gnus.user and gmane.emacs.gnus.general
several days ago, but didn't get any reply. I hope to have some luck
here. 

I am having a problem with reading html e-mail in Gnus: accented
characters appear with an incorrect encoding. For instance, "é" (e with
acute accent) will appear as "i".

The funny part comes now. If I edit the article with
gnus-summary-edit-article and just press C-c C-c (that is, I don't do
any edits) the characters are displayed correctly.

So right now, I use this

,----[ emacs-lisp ]
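| (defun my/correct-message-encoding-by-dummy-edit ()
|   "Open the current article for editing and immediately finish the edit.
| No text is changed; the round trip makes Gnus display the accented
| characters correctly."
|   (interactive)
|   (gnus-summary-select-article-buffer)
|   (gnus-summary-edit-article)
|   (gnus-article-edit-done))
| 
| (define-key gnus-summary-mode-map (kbd "<f8> <f8>")
|   #'my/correct-message-encoding-by-dummy-edit)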
`----

to quickly «wash» the articles. 

If I use describe-char to inspect the characters, I get this before
«washing»:

,----
| position: 470 of 867 (54%), column: 30                                     
| character: i (displayed as i) (codepoint 105, #o151, #x69)                 
| charset: ascii (ASCII (ISO646 IRV))                                        
| code point in charset: 0x69                                                
| script: latin                                                              
| syntax: w 	which means: word
| category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman      
| to input: type "C-x 8 RET 69" or "C-x 8 RET LATIN SMALL LETTER I"          
| buffer code: #x69                                                          
| file code: #x69 (encoded by coding system utf-8-unix)                      
| display: by this font (glyph code)                                         
| ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#x4C
|                                                                            
| Character code properties: customize what to show                          
| name: LATIN SMALL LETTER I                                                 
| general-category: Ll (Letter, Lowercase)                                   
| decomposition: (105) ('i')                                                 
|                                                                            
| There is an overlay here:                                                  
| From 440 to 520                                                            
| face                 hl-line                                               
| priority             -50                                                   
| window               #<window 141 on *Article nnmaildir+RSSFeeds:ABlog*>   
|                                                                            
|                                                                            
| There are text properties here:                                            
| face                 variable-pitch                                        
`----

And this after «washing»

,----
| position: 472 of 871 (54%), column: 30                                      
| character: é (displayed as é) (codepoint 233, #o351, #xe9)                  
| charset: unicode (Unicode (ISO10646))                                       
| code point in charset: 0xE9                                                 
| script: latin                                                               
| syntax: w 	which means: word                                             
| category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin, 
| v:Viet
| to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
| buffer code: #xC3 #xA9                                                      
| file code: #xC3 #xA9 (encoded by coding system utf-8-unix)                  
| display: by this font (glyph code)                                          
| ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#xAB)
|                                                                             
| Character code properties: customize what to show                           
| name: LATIN SMALL LETTER E WITH ACUTE                                       
| old-name: LATIN SMALL LETTER E ACUTE                                        
| general-category: Ll (Letter, Lowercase)                                    
| decomposition: (101 769) ('e' '́')                                           
|                                                                             
| There is an overlay here:                                                   
| From 442 to 523                                                             
| face                 hl-line                                                
| priority             -50                                                    
| window               #<window 155 on *Article nnmaildir+RSSFeeds:ABlog*>    
|                                                                             
|                                                                             
| There are text properties here:                                             
| face                 variable-pitch                                         
`----
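
A side note on the two code points reported above: 233 (#xE9) and 105
(#x69) differ only in the top bit, so clearing bit 7 of "é" gives
exactly "i". Whether Gnus or a renderer really drops that bit is not
established here; the Python sketch below (the helper name
strip_high_bit is made up for illustration) only checks the arithmetic.

```python
def strip_high_bit(text: str) -> str:
    """Clear bit 7 of every character that fits in one Latin-1 byte."""
    return "".join(
        chr(ord(ch) & 0x7F) if ord(ch) < 0x100 else ch
        for ch in text
    )


# 0xE9 (é) with the top bit cleared is 0x69 (i).
print(hex(0xE9 & 0x7F))
print(strip_high_bit("é"))
print(strip_high_bit("suprémacistes"), strip_high_bit("profité"))
```

Under the same assumption, other accented Latin-1 letters would also
collapse to ASCII characters, for example "ç" (#xE7) to "g" (#x67).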

The html part of the e-mails contains

,----
| < #part type=text/plain format="flowed" charset="utf-8"
| disposition=inline nofile=yes>
`----

so I guess that the html renderer should pick it up. I have tested shr,
gnus-w3m and w3m and I always get the same result.

I would be grateful if somebody could help me understand what happens.

Thank you.

* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 13:34 Incorrect rendering of accented characters in HTML e-mail (Gnus) Garjola Dindi
@ 2020-10-10 14:00 ` Eli Zaretskii
  2020-10-10 14:35   ` Garjola Dindi
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-10 14:00 UTC
  To: help-gnu-emacs

> From: Garjola Dindi <garjola@garjola.net>
> Date: Sat, 10 Oct 2020 15:34:02 +0200
> 
> The html part of the e-mails contains
> 
> ,----
> | < #part type=text/plain format="flowed" charset="utf-8"
> | disposition=inline nofile=yes>
> `----
> 
> so I guess that the html renderer should pick it up. I have tested shr,
> gnus-w3m and w3m and I always get the same result.
> 
> I would be grateful if somebody could help me understand what happens.

How does the character appear in the original HTML?




* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 14:00 ` Eli Zaretskii
@ 2020-10-10 14:35   ` Garjola Dindi
  2020-10-10 14:44     ` Eli Zaretskii
  2020-10-11  7:15     ` Damien Collard
  0 siblings, 2 replies; 13+ messages in thread
From: Garjola Dindi @ 2020-10-10 14:35 UTC
  To: help-gnu-emacs


On Sat 10-Oct-2020 at 16:00:54 +02, Eli Zaretskii <eliz@gnu.org> wrote: 
>> From: Garjola Dindi <garjola@garjola.net>
>> Date: Sat, 10 Oct 2020 15:34:02 +0200
>> 
>> The html part of the e-mails contains
>> 
>> ,----
>> | < #part type=text/plain format="flowed" charset="utf-8"
>> | disposition=inline nofile=yes>
>> `----
>> 
>> so I guess that the html renderer should pick it up. I have tested shr,
>> gnus-w3m and w3m and I always get the same result.
>> 
>> I would be grateful if somebody could help me understand what happens.
>
> How does the character appear in the original HTML?

Thanks for your quick response.

I don't know if I am inspecting the message correctly, because when I
enter the edit mode, all characters appear OK. Therefore, I am not sure
if I am seeing the original HTML.

I have also noticed that I have the same issue with non-HTML
e-mails. I thought they were HTML, but they are just multipart.

For instance, here is what I see in the article buffer:

,----
| \311lodie, qui a rejoint l'\351quipe podcast, me dit que sa soeur, qui a une
| formation th\351\342trale, serait disponible ponctuellement pour faire des
| voix pour des lectures. Pour le moment on a jamais eu ce besoin mais \347a
| peut ouvrir des perspectives. 
`----

(I have replaced the non printable chars with \xxx) and here is what I
see in edit mode: 

,----
| Élodie, qui a rejoint l'équipe podcast, me dit que sa soeur, qui a une
| formation théâtrale, serait disponible ponctuellement pour faire des
| voix pour des lectures. Pour le moment on a jamais eu ce besoin mais ça
| peut ouvrir des perspectives. 
| 
| Pour connaître la configuration de la liste, gérer votre abonnement à la 
| liste et vos informations personnelles :
| https://listes.april.org/wws/info/libreavous
`----


Again, when I quit the edit mode, the article buffer displays things correctly.

In the case of HTML, I have for instance this in the article buffer:


,----
| Les groupes suprimacistes blancs ont profiti du mandat de Donald Trump et des ...
`----


and this in the edit mode buffer 

,----
| [content not preserved: the pasted part was turned into attachments when posted]
`----

So now I think this is not due to html, but to multipart MIME.

Thanks again for your help.


* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 14:35   ` Garjola Dindi
@ 2020-10-10 14:44     ` Eli Zaretskii
  2020-10-10 15:53       ` Garjola Dindi
  2020-10-11  7:15     ` Damien Collard
  1 sibling, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-10 14:44 UTC
  To: help-gnu-emacs

> From: Garjola Dindi <garjola@garjola.net>
> Date: Sat, 10 Oct 2020 16:35:05 +0200
> 
> >> The html part of the e-mails contains
> >> 
> >> ,----
> >> | < #part type=text/plain format="flowed" charset="utf-8"
> >> | disposition=inline nofile=yes>
> >> `----
> >> 
> >> so I guess that the html renderer should pick it up. I have tested shr,
> >> gnus-w3m and w3m and I always get the same result.
> >> 
> >> I would be grateful if somebody could help me understand what happens.
> >
> > How does the character appear in the original HTML?
> 
> Thanks for your quick response.
> 
> I don't know if I am inspecting the message correctly, because when I
> enter the edit mode, all characters appear OK. Therefore, I am not sure
> if I am seeing the original HTML.

Can you use some other tool, like wget or curl, to download the text
as it comes from the server?

> I have also noticed that I have the same issue with non-HTML
> e-mails.

"Same issue" in what sense?  Is just é replaced by i, or does
something like that happen with every non-ASCII letter?

> For instance, here is what I see in the article buffer:
> 
> ,----
> | \311lodie, qui a rejoint l'\351quipe podcast, me dit que sa soeur, qui a une
> | formation th\351\342trale, serait disponible ponctuellement pour faire des
> | voix pour des lectures. Pour le moment on a jamais eu ce besoin mais \347a
> | peut ouvrir des perspectives. 
> `----
> 
> (I have replaced the non printable chars with \xxx)

What do you mean by "non printable" here?  Do they look like octal
escapes or do they look like something else?

Btw, the above is not UTF-8 encoding, it's Latin-1 encoding.
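
To check this reading, here is a small Python sketch (the helper
decode_octal_escapes is hypothetical, written only for this
illustration) that turns the \xxx escapes back into bytes and decodes
them. Decoded as Latin-1 they give the expected French text, whereas
UTF-8 would need two bytes, \303\251, for a single é.

```python
import re


def decode_octal_escapes(text: str, encoding: str = "latin-1") -> str:
    """Turn three-digit backslash-octal escapes into bytes, then decode."""
    raw = re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),
        text.encode("ascii"),
    )
    return raw.decode(encoding)


print(decode_octal_escapes(r"\311lodie, qui a rejoint l'\351quipe podcast"))
print(decode_octal_escapes(r"formation th\351\342trale"))

# UTF-8 stores the same character as two bytes, C3 A9 (octal 303 251).
print("é".encode("utf-8"))
```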

Does the problem go away if you start Emacs as "emacs -Q"?




* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 14:44     ` Eli Zaretskii
@ 2020-10-10 15:53       ` Garjola Dindi
  2020-10-10 16:12         ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Garjola Dindi @ 2020-10-10 15:53 UTC
  To: help-gnu-emacs

On Sat 10-Oct-2020 at 16:44:26 +02, Eli Zaretskii <eliz@gnu.org> wrote: 
>> From: Garjola Dindi <garjola@garjola.net>
>> Date: Sat, 10 Oct 2020 16:35:05 +0200
>> 
>> >> The html part of the e-mails contains
>> >> 
>> >> ,----
>> >> | < #part type=text/plain format="flowed" charset="utf-8"
>> >> | disposition=inline nofile=yes>
>> >> `----
>> >> 
>> >> so I guess that the html renderer should pick it up. I have tested shr,
>> >> gnus-w3m and w3m and I always get the same result.
>> >> 
>> >> I would be grateful if somebody could help me understand what happens.
>> >
>> > How does the character appear in the original HTML?
>> 
>> Thanks for your quick response.
>> 
>> I don't know if I am inspecting the message correctly, because when I
>> enter the edit mode, all characters appear OK. Therefore, I am not sure
>> if I am seeing the original HTML.
>
> Can you use some other tool, like wget or curl, to download the text
> as it comes from the server?
>

I have these tools, but I don't know how to use them to get the text
from the imap server. Since I am using offlineimap, I have the file on
disk and I can compare it before and after the dummy edit I do in Gnus.

For instance, before Gnus reads the message, I have:

,----
| Content-Type: multipart/alternative; boundary="=-1601233669-108915-4573-6815-27-="
| MIME-Version: 1.0
| 
| 
| --=-1601233669-108915-4573-6815-27-=
| Content-Type: text/plain; charset=utf-8; format=flowed
| Content-Transfer-Encoding: 8bit
| 
| <https://unregardstoicien.com/2020/06/20/entretien-avec-stoa-gallica-elen-buzare-et-jerome-robin/>
| 
| Elen Buzaré et Jérôme Robin sont membres fondateurs de l'association 
| Stoa Gallica, la première association francophone de stoïcisme, fondée 
| le 15 juillet 2017. Ils ont chaleureusement accepté de répondre à 
| quelques unes de mes questions.
| -- 
`----

and after Gnus dummy edit, I have:

,----
| Content-Type: multipart/alternative; boundary="=-=-="
| 
| --=-=-=
| Content-Type: text/plain; charset=utf-8; format=flowed
| Content-Disposition: inline
| Content-Transfer-Encoding: quoted-printable
| 
| <https://unregardstoicien.com/2020/06/20/entretien-avec-stoa-gallica-elen-b=
| uzare-et-jerome-robin/>
| 
| Elen Buzar=C3=A9 et J=C3=A9r=C3=B4me Robin sont membres fondateurs de l'ass=
| ociation=20
| Stoa Gallica, la premi=C3=A8re association francophone de sto=C3=AFcisme, f=
| ond=C3=A9e=20
| le 15 juillet 2017. Ils ont chaleureusement accept=C3=A9 de r=C3=A9pondre =
| =C3=A0=20
| quelques unes de mes questions.
| --=20
`----
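
The two files should carry the same text: the first stores the UTF-8
bytes directly (8bit), while the second spells each of those bytes out
as =XX (quoted-printable). A quick way to confirm this outside Emacs is
to decode a line from the second file with Python's standard quopri
module; the helper name qp_to_text is only for illustration.

```python
import quopri


def qp_to_text(qp: bytes, charset: str = "utf-8") -> str:
    """Undo quoted-printable, then decode the bytes with the part's charset."""
    return quopri.decodestring(qp).decode(charset)


line = b"Elen Buzar=C3=A9 et J=C3=A9r=C3=B4me Robin sont membres fondateurs"
print(qp_to_text(line))

# A trailing "=" is a soft line break and "=20" is an encoded space.
print(repr(qp_to_text(b"l'ass=\nociation=20\n")))
```

If the decoded text matches the 8bit file, the dummy edit changed only
the transfer encoding, not the characters themselves.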



>> I have also noticed that I have the same issue with non-HTML
>> e-mails.
>
> "Same issue" in what sense?  Is just é replaced by i, or does
> something like that happen with every non-ASCII letter?
>

The characters are incorrectly displayed. In html I have the é -> i
replacement. In plain text, I have the é replaced by \351.

>> For instance, here is what I see in the article buffer:
>> 
>> ,----
>> | \311lodie, qui a rejoint l'\351quipe podcast, me dit que sa soeur, qui a une
>> | formation th\351\342trale, serait disponible ponctuellement pour faire des
>> | voix pour des lectures. Pour le moment on a jamais eu ce besoin mais \347a
>> | peut ouvrir des perspectives. 
>> `----
>> 
>> (I have replaced the non printable chars with \xxx)
>
> What do you mean by "non printable" here?  Do they look like octal
> escapes or do they look like something else?
>

When sending the message to the list, Gnus said that these were
non-printable characters, so I replaced them with the individual
characters that spell out the octal escapes.

> Btw, the above is not UTF-8 encoding, it's Latin-1 encoding.
>
> Does the problem go away if you start Emacs as "emacs -Q"?

I have tried, but with "emacs -Q", Gnus does not find the nnmaildir
groups. So I don't know how to proceed.

Thanks again.

* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 15:53       ` Garjola Dindi
@ 2020-10-10 16:12         ` Eli Zaretskii
  2020-10-10 20:10           ` Garjola Dindi
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2020-10-10 16:12 UTC
  To: help-gnu-emacs

> From: Garjola Dindi <garjola@garjola.net>
> Date: Sat, 10 Oct 2020 17:53:07 +0200
> 
> > Btw, the above is not UTF-8 encoding, it's Latin-1 encoding.
> >
> > Does the problem go away if you start Emacs as "emacs -Q"?
> 
> I have tried, but with "emacs -Q", Gnus does not find the nnmaildir
> groups. So I don't know how to proceed.

Neither do I, sorry.  Someone who knows Gnus better will have to chime
in.  It sounds like you have Emacs or Gnus misconfigured wrt encoding,
but I cannot say anything more intelligent.




* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 16:12         ` Eli Zaretskii
@ 2020-10-10 20:10           ` Garjola Dindi
  0 siblings, 0 replies; 13+ messages in thread
From: Garjola Dindi @ 2020-10-10 20:10 UTC
  To: help-gnu-emacs

On Sat 10-Oct-2020 at 18:12:56 +02, Eli Zaretskii <eliz@gnu.org> wrote: 
>> From: Garjola Dindi <garjola@garjola.net>
>> Date: Sat, 10 Oct 2020 17:53:07 +0200
>> 
>> > Btw, the above is not UTF-8 encoding, it's Latin-1 encoding.
>> >
>> > Does the problem go away if you start Emacs as "emacs -Q"?
>> 
>> I have tried, but with "emacs -Q", Gnus does not find the nnmaildir
>> groups. So I don't know how to proceed.
>
> Neither do I, sorry.  Someone who knows Gnus better will have to chime
> in.  It sounds like you have Emacs or Gnus misconfigured wrt encoding,
> but I cannot say anything more intelligent.

Thanks anyway for your time. Your questions have pushed me to
investigate further.

I have started Emacs with an init.el that just points to a .gnus whose
only contents are

,----
| (setq gnus-select-method '(nnml ""))
| (add-to-list 'gnus-secondary-select-methods
|        '(nnmaildir "MyMail" 
|                    (directory "/home/garjola/MyMail")
|                    (directory-files nnheader-directory-files-safe) 
|                    (get-new-mail nil)))
`----

This way, nothing else from my config can interfere.
But the same encoding issues occur.

I have accessed the imap server directly with Gnus' imap select method
and the messages are correctly encoded. This makes me think that the
problem comes from the nnmaildir select method or from the way I
download the messages from the server and convert to nnmaildir.

I use offlineimap for e-mail and feed2imap for RSS subscriptions. Since
these are 2 different programs, I guess that the most likely issue is
the nnmaildir configuration.

I am stuck again, but I feel I made some progress.

Thank you.


* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-10 14:35   ` Garjola Dindi
  2020-10-10 14:44     ` Eli Zaretskii
@ 2020-10-11  7:15     ` Damien Collard
  2020-10-11 10:27       ` Garjola Dindi
  2020-10-11 11:27       ` Garjola Dindi
  1 sibling, 2 replies; 13+ messages in thread
From: Damien Collard @ 2020-10-11  7:15 UTC
  To: help-gnu-emacs


Hello,

On Sat, Oct 10 2020, Garjola Dindi wrote:

> I have also noticed that I have the same issue with non-HTML
> e-mails. I thought they were html, but they are just multipart.

I have the same problem -- for some e-mails, not all of them.

Using nnmaildir and offlineimap like you are.

I *think* I started having this problem after upgrading to Emacs 27, but
I'm not sure...

I'll post here again if I find a solution. In the meantime, I have
adopted your "dummy edit" function.




* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-11  7:15     ` Damien Collard
@ 2020-10-11 10:27       ` Garjola Dindi
  2020-10-11 11:27       ` Garjola Dindi
  1 sibling, 0 replies; 13+ messages in thread
From: Garjola Dindi @ 2020-10-11 10:27 UTC
  To: help-gnu-emacs


On Sun 11-Oct-2020 at 09:15:11 +02, Damien Collard
<damien.collard@distfp.net> wrote: 
> Hello,
>
> On Sat, Oct 10 2020, Garjola Dindi wrote:
>
>> I have also noticed that I have the same issue with non-HTML
>> e-mails. I thought they were html, but they are just multipart.
>
> I have the same problem -- for some e-mails, not all of them.
>

I confirm that this does not happen with all e-mails. Only for messages
with a "multipart" enclosure but not all of them. A quick check tells me
that the problem appears with

,----
| [content not preserved: the pasted MML tags were turned into attachments when posted]
`----

* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-11  7:15     ` Damien Collard
  2020-10-11 10:27       ` Garjola Dindi
@ 2020-10-11 11:27       ` Garjola Dindi
  2020-10-11 15:26         ` Damien Collard
  1 sibling, 1 reply; 13+ messages in thread
From: Garjola Dindi @ 2020-10-11 11:27 UTC
  To: help-gnu-emacs


Sorry for the previous post. Long lines with multipart messed up the
nntp post. Here it goes again.

On Sun 11-Oct-2020 at 09:15:11 +02, Damien Collard
<damien.collard@distfp.net> wrote: 
> Hello,
>
> On Sat, Oct 10 2020, Garjola Dindi wrote:
>
>> I have also noticed that I have the same issue with non-HTML
>> e-mails. I thought they were html, but they are just multipart.
>
> I have the same problem -- for some e-mails, not all of them.

I confirm that this does not happen with all e-mails. Only for messages
with a "multipart" enclosure but not all of them. A quick check tells me
that the problem appears with

,----
| < #multipart type=mixed>
| < #part type=text/plain format="flowed" charset="utf-8"
| disposition=inline nofile=yes> 
`----

and

,----
| < #multipart type=alternative>
| < #part type=text/plain format="flowed" charset="utf-8"
| disposition=inline nofile=yes> 
`----

But not for this one

,----
| < #multipart type=alternative>
| < #part type=text/plain charset="iso-8859-1" disposition=inline
| nofile=yes>
`----

So I guess that the problem is the utf-8 encoding.
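
To make the same comparison on the raw Content-Type headers of other
messages, the charset and format parameters can be read with the
helpers from mail-parse.el. This is only a minimal sketch:
my/content-type-params is a hypothetical name, and the two header
strings are made-up examples modelled on the parts shown above.

,----[ emacs-lisp ]
| (require 'mail-parse)
| 
| (defun my/content-type-params (header)
|   "Return (TYPE CHARSET FORMAT) parsed from the Content-Type string HEADER.
| CHARSET or FORMAT is nil when HEADER does not carry that parameter."
|   (let ((ct (mail-header-parse-content-type header)))
|     (list (car ct)
|           (mail-content-type-get ct 'charset)
|           (mail-content-type-get ct 'format))))
| 
| ;; A part like the ones that show the problem.
| (my/content-type-params "text/plain; charset=\"utf-8\"; format=flowed")
| ;; A part like the one that displays correctly.
| (my/content-type-params "text/plain; charset=\"iso-8859-1\"")
`----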

>
> Using nnmaildir and offlineimap like you are.
>
> I *think* I started having this problem after upgrading to Emacs 27,
> but I'm not sure...
>

I am on the Emacs git master branch and I have been having the problem
for several months now, so this is consistent with your guess.

> I'll post here again if I find a solution. In the meantime, I have
> adopted your "dummy edit" function.

What I understand is that:
1. HTML is not the issue but multipart is, since the text/plain part
   is also displayed with the wrong encoding
2. it only happens when the charset is utf-8, not iso-8859-1
3. it only happens with nnmaildir: I have used nnimap to download the
   same messages from the same server and Gnus displays them
   correctly
4. since the «dummy edit» works, the issue is corrected when Emacs
   opens the message in edit mode
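
A side note on the symptom reported at the start of the thread (é
displayed as i, code point 105): that value is exactly what is left
of é's code point when its eighth bit is dropped. The sketch below
only checks this arithmetic. my/low-seven-bits is a hypothetical
helper, and the idea that some step in Gnus or nnmaildir really drops
that bit is an assumption, not something established in this thread.

,----[ emacs-lisp ]
| (defun my/low-seven-bits (char)
|   "Return the low seven bits of the character code CHAR."
|   (logand char #x7F))
| 
| ;; é is U+00E9, that is 233; clearing the eighth bit leaves 105.
| (my/low-seven-bits #xE9)            ; 105
| (string (my/low-seven-bits #xE9))   ; "i"
| 
| ;; For comparison, the bytes of é in the two charsets from point 2.
| (string-to-list (encode-coding-string "\u00e9" 'utf-8))       ; (195 169)
| (string-to-list (encode-coding-string "\u00e9" 'iso-8859-1))  ; (233)
`----

Dropping the bit from the two raw utf-8 bytes would give "C)" rather
than "i", so the arithmetic suggests that the damage happens after the
text has been decoded. It does not say where.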

Thanks for any feedback.

* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-11 11:27       ` Garjola Dindi
@ 2020-10-11 15:26         ` Damien Collard
  2020-10-12 12:01           ` Garjola Dindi
  2021-01-21  7:52           ` Garjola Dindi
  0 siblings, 2 replies; 13+ messages in thread
From: Damien Collard @ 2020-10-11 15:26 UTC (permalink / raw)
  To: help-gnu-emacs


I have the problem on some utf-8 non-multipart messages, not all of
them. It looks like only those that have format=flowed exhibit the
problem, but disabling it (by setting mm-fill-flowed to nil) doesn't
change anything.
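
For anyone who wants to repeat the test, the setting mentioned above
is an ordinary user option and can be switched off from the init file.
As said, it made no difference here, so this is shown only for
reference and is not a fix.

,----[ emacs-lisp ]
| ;; Do not refill format=flowed text when displaying articles.
| (setq mm-fill-flowed nil)
`----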





* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-11 15:26         ` Damien Collard
@ 2020-10-12 12:01           ` Garjola Dindi
  2021-01-21  7:52           ` Garjola Dindi
  1 sibling, 0 replies; 13+ messages in thread
From: Garjola Dindi @ 2020-10-12 12:01 UTC (permalink / raw)
  To: help-gnu-emacs

On Sun 11-Oct-2020 at 17:26:33 +02, Damien Collard
<damien.collard@distfp.net> wrote: 
> I have the problem on some utf-8 non-multipart messages, not all of
> them. It looks like only those that have format=flowed exhibit the
> problem, but disabling it (by setting mm-fill-flowed to nil) doesn't
> change anything.

You are right that not all utf-8 messages have the problem, but I have
some without format=flowed that do show it.

So I don't see a common pattern. 


* Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
  2020-10-11 15:26         ` Damien Collard
  2020-10-12 12:01           ` Garjola Dindi
@ 2021-01-21  7:52           ` Garjola Dindi
  1 sibling, 0 replies; 13+ messages in thread
From: Garjola Dindi @ 2021-01-21  7:52 UTC (permalink / raw)
  To: help-gnu-emacs


Hi all,

This problem seems to have been solved recently on the Emacs master
branch. I am running:

GNU Emacs 28.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version 3.24.5,
cairo version 1.16.0) of 2021-01-19

I don't know exactly when the problem disappeared, but I recompile the
master branch on Tuesdays and I would say that I haven't seen the issue
for more than a week.

Thanks to somebody ;)


> On Sun 11-Oct-2020 at 17:26:33 +02, Damien Collard
> <damien.collard@distfp.net> wrote: 
> > I have the problem on some utf-8 non-multipart messages, not all of
> > them. It looks like only those that have format=flowed exhibit the
> > problem, but disabling it (by setting mm-fill-flowed to nil) doesn't
> > change anything.
> You are right that not all utf-8 messages have the problem, but I have
> some without format=flowed that do show it.
> So I don't see a common pattern.

Thread overview: 13+ messages
2020-10-10 13:34 Incorrect rendering of accented characters in HTML e-mail (Gnus) Garjola Dindi
2020-10-10 14:00 ` Eli Zaretskii
2020-10-10 14:35   ` Garjola Dindi
2020-10-10 14:44     ` Eli Zaretskii
2020-10-10 15:53       ` Garjola Dindi
2020-10-10 16:12         ` Eli Zaretskii
2020-10-10 20:10           ` Garjola Dindi
2020-10-11  7:15     ` Damien Collard
2020-10-11 10:27       ` Garjola Dindi
2020-10-11 11:27       ` Garjola Dindi
2020-10-11 15:26         ` Damien Collard
2020-10-12 12:01           ` Garjola Dindi
2021-01-21  7:52           ` Garjola Dindi