unofficial mirror of guix-devel@gnu.org 
* Character display problem in mumi and my mail client
@ 2019-07-17 21:23 Jack Hill
  2019-07-18  8:00 ` Ricardo Wurmus
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jack Hill @ 2019-07-17 21:23 UTC (permalink / raw)
  To: guix-devel

Hi Guix,

It appears that mumi (or at least the instance of it running on 
issues.guix.gnu.org) has problems displaying some non-ASCII characters. I 
noticed it with '’'. Compare

https://issues.guix.gnu.org/issue/36207

with

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207

In the former, Ludo's name displays as 'Ludo¢' while in the latter, it
displays correctly as 'Ludo’'.

However, in Ludo's reply the character is displayed correctly.

Does this indicate that my mail client (alpine on Debian Stretch) is doing
the wrong thing? It seems that mumi could handle this situation better,
since debbugs appears to handle it correctly.

Looking at the raw mail downloaded from debbugs, I see that I'm sending
mail with the following encoding:

```
Content-Type: text/plain; FORMAT=flowed; CHARSET=ISO-8859-7
Content-Transfer-Encoding: 8BIT
```

while Ludo's mail is:

```
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
```

Interestingly, when I download the mbox file for my mail from debbugs and
look at it in Emacs with my en_US.UTF-8 locale, the cent symbol appears.
Using iconv to convert the file from ISO-8859-7 to UTF-8 causes the
correct character to display. So what looks to be happening is that mumi
is interpreting my messages using the wrong encoding.
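
The mismatch can be reproduced outside any mail tooling. The following is a minimal Python sketch, not part of the original report; it relies on the fact that byte 0xA2 is '’' in ISO-8859-7 and '¢' in ISO-8859-1, and the sample bytes are made up for illustration.

```python
# Illustrative bytes: 'Ludo’' as an ISO-8859-7 mail client would send it.
raw = b"Ludo\xa2"

# Decoding with the declared charset gives the intended character.
as_greek = raw.decode("iso8859_7")

# Decoding the same bytes as ISO-8859-1 gives the cent symbol instead.
as_latin1 = raw.decode("latin-1")

# Rough equivalent of `iconv -f ISO-8859-7 -t UTF-8` on this byte string.
utf8_bytes = raw.decode("iso8859_7").encode("utf-8")

print(as_greek)
print(as_latin1)
print(utf8_bytes)
```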

Thoughts? Is this something we want to fix?

This reminds me of rjbs's talk, "Email Hates the Living". [0]

Best,
Jack

[0] http://yapcasia.org/2011/talk/59

P.S. How are we tracking issues and patches for mumi? bug-guix@ and 
guix-patches?


* Re: Character display problem in mumi and my mail client
  2019-07-17 21:23 Character display problem in mumi and my mail client Jack Hill
@ 2019-07-18  8:00 ` Ricardo Wurmus
  2019-07-18 10:29   ` Arun Isaac
  2019-07-18 10:23 ` Arun Isaac
  2019-07-28  7:33 ` Arun Isaac
  2 siblings, 1 reply; 8+ messages in thread
From: Ricardo Wurmus @ 2019-07-18  8:00 UTC (permalink / raw)
  To: Jack Hill; +Cc: guix-devel


Hi Jack,

thanks for the report.

> It appears that mumi (or at least the instance of it running on
> issues.guix.gnu.org) has problems displaying some non-ASCII
> characters. I noticed it with '’'. Compare
>
> https://issues.guix.gnu.org/issue/36207
>
> with
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207

The problem here is that the Debbugs SOAP service (which we use in
guile-debbugs and thus mumi) serves up a base64-encoded blob for the
email body without any information about encodings, so we just try UTF-8
and fall back to ISO 8859-1 if there’s an error.  If we could operate on
the actual email that would be different.
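
The fallback described above can be sketched as follows. This is a Python illustration of the strategy only, not the guile-debbugs or mumi code; the function name `decode_without_charset` and the sample bytes are hypothetical.

```python
def decode_without_charset(body: bytes) -> str:
    """Guess the text of a body whose charset is unknown (hypothetical helper)."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # ISO 8859-1 assigns a character to every byte, so this never fails,
        # but it mislabels text that was written in any other 8-bit charset.
        return body.decode("iso-8859-1")


utf8_body = "Ludo\u2019".encode("utf-8")       # a UTF-8 sender
greek_body = "Ludo\u2019".encode("iso8859_7")  # an ISO-8859-7 sender

print(decode_without_charset(utf8_body))
print(decode_without_charset(greek_body))
```

The second call takes the fallback branch because a lone 0xA2 byte is not valid UTF-8, which is how the cent symbol ends up on the page.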

Unfortunately, the SOAP service does not provide access to the actual
emails.  That’s only available through … the official Debbugs web
interface.

I’m currently in the process of moving away from the SOAP service for
fetching message contents, because it’s just too painful.  There are too
many truncated or otherwise mangled messages, and in the end we are just
assembling them back to a good old email to parse them with Arun’s
guile-email.  Going forward mumi will only use the SOAP service to get
bug status information and pointers to messages.  The actual emails will
be fetched through the Debbugs web interface with much cursing.

> P.S. How are we tracking issues and patches for mumi? bug-guix@ and
> guix-patches?

Either of them would be fine.  Please add [mumi] to the subject line so
that it’s easier to distinguish them.

--
Ricardo


* Re: Character display problem in mumi and my mail client
  2019-07-17 21:23 Character display problem in mumi and my mail client Jack Hill
  2019-07-18  8:00 ` Ricardo Wurmus
@ 2019-07-18 10:23 ` Arun Isaac
  2019-07-28  7:33 ` Arun Isaac
  2 siblings, 0 replies; 8+ messages in thread
From: Arun Isaac @ 2019-07-18 10:23 UTC (permalink / raw)
  To: Jack Hill, guix-devel; +Cc: guile-email


Hi,

Thanks for the bug report!

> It appears that mumi (or at least the instance of it running on 
> issues.guix.gnu.org) has problems displaying some non-ASCII characters. I 
> noticed it with '’'. Compare
>
> https://issues.guix.gnu.org/issue/36207
>
> with
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207
>
> In the former, Ludo's name displays as 'Ludo¢' while in the latter, it
> displays correctly as 'Ludo’'.
>
> However, in Ludo's reply the character is displayed correctly.
>
> Does this indicate that my mail client (alpine on Debian Stretch) is doing
> the wrong thing? It seems that mumi could handle this situation better,
> since debbugs appears to handle it correctly.
>
> Looking at the raw mail downloaded from debbugs, I see that I'm sending
> mail with the following encoding:
>
> ```
> Content-Type: text/plain; FORMAT=flowed; CHARSET=ISO-8859-7
> Content-Transfer-Encoding: 8BIT
> ```
>
> while Ludo's mail is:
>
> ```
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: quoted-printable
> ```
>
> Interestingly, when I download the mbox file for my mail from debbugs and
> look at it in Emacs with my en_US.UTF-8 locale, the cent symbol appears.
> Using iconv to convert the file from ISO-8859-7 to UTF-8 causes the
> correct character to display. So what looks to be happening is that mumi
> is interpreting my messages using the wrong encoding.
>
> Thoughts? Is this something we want to fix?

I think this is a bug in guile-email. On brief examination, I found that
guile-email is assuming charset of UTF-8 when the
Content-Transfer-Encoding is 8bit. This is incorrect behaviour. I will
fix this soon.
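
For comparison, here is a minimal sketch of the intended behaviour using Python's standard `email` package rather than guile-email: the transfer encoding (8bit) only says the body bytes are sent as-is, and the charset has to come from the Content-Type header. The sender address and body are made up for illustration.

```python
from email import message_from_bytes

# Illustrative message using the headers quoted in the report.
raw_message = (
    b"From: sender@example.org\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; FORMAT=flowed; CHARSET=ISO-8859-7\r\n"
    b"Content-Transfer-Encoding: 8BIT\r\n"
    b"\r\n"
    b"Ludo\xa2\r\n"
)

msg = message_from_bytes(raw_message)

# The charset parameter is matched case-insensitively and returned lowercased.
charset = msg.get_content_charset()

# With an 8bit transfer encoding, decode=True returns the body bytes unchanged.
payload = msg.get_payload(decode=True)

text = payload.decode(charset)
wrong = payload.decode("utf-8", errors="replace")  # what assuming UTF-8 yields

print(charset)
print(text.strip())
print(wrong.strip())
```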

Regards,
Arun.


* Re: Character display problem in mumi and my mail client
  2019-07-18  8:00 ` Ricardo Wurmus
@ 2019-07-18 10:29   ` Arun Isaac
  2019-07-18 10:49     ` Ricardo Wurmus
  0 siblings, 1 reply; 8+ messages in thread
From: Arun Isaac @ 2019-07-18 10:29 UTC (permalink / raw)
  To: Ricardo Wurmus, Jack Hill; +Cc: guix-devel


> I’m currently in the process of moving away from the SOAP service for
> fetching message contents, because it’s just too painful.  There are too
> many truncated or otherwise mangled messages, and in the end we are just
> assembling them back to a good old email to parse them with Arun’s
> guile-email.  Going forward mumi will only use the SOAP service to get
> bug status information and pointers to messages.  The actual emails will
> be fetched through the Debbugs web interface with much cursing.

Debbugs provides complete mboxes. guile-email supports reading
mboxes. We could use that if the SOAP service is problematic. Scraping
the HTML from the Debbugs web interface is going to be a lot more
painful than the SOAP service.

Regards,
Arun.


* Re: Character display problem in mumi and my mail client
  2019-07-18 10:29   ` Arun Isaac
@ 2019-07-18 10:49     ` Ricardo Wurmus
  2019-07-19 22:14       ` Jack Hill
  0 siblings, 1 reply; 8+ messages in thread
From: Ricardo Wurmus @ 2019-07-18 10:49 UTC (permalink / raw)
  To: Arun Isaac; +Cc: guix-devel


Arun Isaac <arunisaac@systemreboot.net> writes:

>> I’m currently in the process of moving away from the SOAP service for
>> fetching message contents, because it’s just too painful.  There are too
>> many truncated or otherwise mangled messages, and in the end we are just
>> assembling them back to a good old email to parse them with Arun’s
>> guile-email.  Going forward mumi will only use the SOAP service to get
>> bug status information and pointers to messages.  The actual emails will
>> be fetched through the Debbugs web interface with much cursing.
>
> Debbugs provides complete mboxes. guile-email supports reading
> mboxes. We could use that if the SOAP service is problematic. Scraping
> the HTML from the Debbugs web interface is going to be a lot more
> painful than the SOAP service.

My development version of Guile Debbugs uses the mboxes, of course.
They are, however, provided by an HTTP endpoint, which is part of the
Debbugs web interface.

--
Ricardo


* Re: Character display problem in mumi and my mail client
  2019-07-18 10:49     ` Ricardo Wurmus
@ 2019-07-19 22:14       ` Jack Hill
  0 siblings, 0 replies; 8+ messages in thread
From: Jack Hill @ 2019-07-19 22:14 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Ricardo and Arun,

Thanks for the responses and analysis. It sounds like you have a good 
grasp of the problem, and my MUA isn't behaving too strangely.

Let me know if you'd like anything from me.

Best,
Jack


* Re: Character display problem in mumi and my mail client
  2019-07-17 21:23 Character display problem in mumi and my mail client Jack Hill
  2019-07-18  8:00 ` Ricardo Wurmus
  2019-07-18 10:23 ` Arun Isaac
@ 2019-07-28  7:33 ` Arun Isaac
  2019-07-28  8:36   ` Ricardo Wurmus
  2 siblings, 1 reply; 8+ messages in thread
From: Arun Isaac @ 2019-07-28  7:33 UTC (permalink / raw)
  To: Jack Hill, Ricardo Wurmus; +Cc: guix-devel, guile-email


> It appears that mumi (or at least the instance of it running on 
> issues.guix.gnu.org) has problems displaying some non-ASCII characters. I 
> noticed it with '’'. Compare
>
> https://issues.guix.gnu.org/issue/36207
>
> with
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=36207

I have fixed this bug. See
https://git.systemreboot.net/guile-email/commit/?id=ac83c2a00c13702bc365cd0f3074239fa63d743f
and
https://git.systemreboot.net/guile-email/commit/?id=1f7c45fa0b71bd137e4b661e0d473c3eb9c98f48

guile-email's parse-email and parse-email-body functions now prefer to
operate on bytevectors, rather than on strings. Likewise, mbox->emails
now returns a list of bytevectors, not a list of strings.
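
The reason for working on bytes can be illustrated with a rough Python analogue. It uses the standard `mailbox` and `email` modules, not guile-email's API, and the helper names, addresses and mbox contents are hypothetical. An mbox may hold messages in different charsets, so it cannot be decoded to one string up front; each message is split out as bytes and decoded according to its own headers.

```python
import mailbox
import os
import tempfile
from email import message_from_bytes

# Illustrative mbox holding two messages in different charsets.
MBOX = (
    b"From jack@example.org Wed Jul 17 21:23:00 2019\n"
    b"Content-Type: text/plain; charset=ISO-8859-7\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Ludo\xa2\n"
    b"\n"
    b"From ludo@example.org Thu Jul 18 08:00:00 2019\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"Ludo=E2=80=99\n"
)


def mbox_to_raw_messages(data: bytes) -> list:
    """Split mbox data into a list of raw messages, each kept as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "thread.mbox")
        with open(path, "wb") as f:
            f.write(data)
        box = mailbox.mbox(path, create=False)
        try:
            return [box.get_bytes(key) for key in box.keys()]
        finally:
            box.close()


def body_text(raw: bytes) -> str:
    """Decode a single-part message body using its declared charset."""
    msg = message_from_bytes(raw)
    charset = msg.get_content_charset("us-ascii")
    return msg.get_payload(decode=True).decode(charset)


raw_messages = mbox_to_raw_messages(MBOX)
texts = [body_text(raw).strip() for raw in raw_messages]
print(texts)
```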

I have updated the API documentation at
https://guile-email.systemreboot.net/manual/Parsing-e_002dmail.html#Parsing-e_002dmail

@Ricardo:

I think you are using a git checkout of guile-email for the mumi hosted
at issues.guix.info. Could you use the latest guile-email commit on
master (specifically, c85e6917ea21631857d93f58e60d910e07317131)? That
should fix this bug. No other changes are required in mumi. I will
release guile-email 0.2.0 in another week's time.

> This reminds me of rjbs's talk, "Email Hates the Living". [0]
>
> [0] http://yapcasia.org/2011/talk/59

Indeed, email drags along a lot of backward compatibility baggage from
the past. In the future, I'll add some of the pathological examples
mentioned in this talk as test cases.

Also, if you know of any corpus of email parser test cases, please let
me know. I will use them to expand guile-email's test suite.


* Re: Character display problem in mumi and my mail client
  2019-07-28  7:33 ` Arun Isaac
@ 2019-07-28  8:36   ` Ricardo Wurmus
  0 siblings, 0 replies; 8+ messages in thread
From: Ricardo Wurmus @ 2019-07-28  8:36 UTC (permalink / raw)
  To: Arun Isaac; +Cc: guix-devel, guile-email


Arun Isaac <arunisaac@systemreboot.net> writes:

> I have fixed this bug. See
> https://git.systemreboot.net/guile-email/commit/?id=ac83c2a00c13702bc365cd0f3074239fa63d743f
> and
> https://git.systemreboot.net/guile-email/commit/?id=1f7c45fa0b71bd137e4b661e0d473c3eb9c98f48
[…]
> @Ricardo:
>
> I think you are using a git checkout of guile-email for the mumi hosted
> at issues.guix.info. Could you use the latest guile-email commit on
> master (specifically, c85e6917ea21631857d93f58e60d910e07317131)? That
> should fix this bug. No other changes are required in mumi. I will
> release guile-email 0.2.0 in another week's time.

Thank you.  I’ve updated the variant of guile-email that’s used by mumi.

-- 
Ricardo
