unofficial mirror of guix-devel@gnu.org 
* Mumi now uses raw emails and Mu
@ 2019-07-21 20:18 Ricardo Wurmus
  2019-07-22  0:41 ` Jack Hill
  0 siblings, 1 reply; 9+ messages in thread
From: Ricardo Wurmus @ 2019-07-21 20:18 UTC
  To: guix-devel

Hey there,

I just wanted to let y’all know that Mumi at issues.guix.gnu.org now
stores raw emails from Debbugs instead of fetching them through the
Debbugs SOAP service.  This means that messages are no longer truncated,
which is what happened to some multipart messages in the past.

While I don’t think Mu is directly suitable for Mumi, I added Mu as a
dependency for now, and I’m indexing all emails at regular intervals.
This can be used to speed up the message search.  If you want to
implement this, I’d be happy to receive patches!

The code for Mumi can be found here:

    https://git.elephly.net/software/mumi.git

Going forward I think it would be better to use Xapian directly.  Then
we could store more relevant information in the database, such as the
Debbugs bug number to which a certain message relates.  I would like to
be able to almost instantaneously answer these questions:

* which of our issues have seemingly been forgotten?
* which have been ignored?
* which issues involve a certain person?
* what issues match this text query?

Since we’re interested in issues (not individual messages) these
questions cannot quickly be answered with Mu.  Computing the issue
number from a message requires parsing it, and I’d much rather do this at
index time.
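
For illustration, here is a minimal sketch of how the first question
could be answered once every indexed message carries its bug number.
The record layout, the sample data, and the definition of "forgotten"
(no message since a cutoff time) are assumptions made for this example,
not Mumi's actual code.

```scheme
;; Hypothetical records: (bug-number . unix-timestamp of one message).
(define messages
  '((31023 . 1500000000)
    (31023 . 1560000000)
    (36207 . 1563000000)
    (29000 . 1400000000)))

;; Build a table mapping each bug number to its latest message time.
(define (last-activity messages)
  (let ((table (make-hash-table)))
    (for-each
     (lambda (msg)
       (let ((bug (car msg))
             (time (cdr msg)))
         (hashv-set! table bug
                     (max time (hashv-ref table bug 0)))))
     messages)
    table))

;; Return the sorted bug numbers whose latest message predates CUTOFF.
(define (forgotten-issues messages cutoff)
  (sort
   (hash-fold (lambda (bug time result)
                (if (< time cutoff) (cons bug result) result))
              '()
              (last-activity messages))
   <))

(forgotten-issues messages 1550000000)   ; => (29000)
```

The grouping step is what a per-message index cannot do cheaply, which
is why storing the bug number at index time matters.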

I guess we could even just fork Mu and make it store additional
information in its database.

If you’d like to take part in this effort to make Mumi more useful to
all of us, please reply to this message.

--
Ricardo

* Re: Mumi now uses raw emails and Mu
  2019-07-21 20:18 Mumi now uses raw emails and Mu Ricardo Wurmus
@ 2019-07-22  0:41 ` Jack Hill
  2019-07-22  9:53   ` Ricardo Wurmus
  2019-07-22 17:54   ` Arun Isaac
  0 siblings, 2 replies; 9+ messages in thread
From: Jack Hill @ 2019-07-22  0:41 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

Ricardo,

On Sun, 21 Jul 2019, Ricardo Wurmus wrote:

> Hey there,
>
> I just wanted to let y’all know that Mumi at issues.guix.gnu.org now
> stores raw emails from Debbugs instead of fetching them through the
> Debbugs SOAP service.  This means that messages are no longer truncated,
> which is what happened to some multipart messages in the past.
>
> While I don’t think Mu is directly suitable for Mumi, I added Mu as a
> dependency for now, and I’m indexing all emails at regular intervals.
> This can be used to speed up the message search.  If you want to
> implement this, I’d be happy to receive patches!
>
> The code for Mumi can be found here:
>
>    https://git.elephly.net/software/mumi.git
>
> Going forward I think it would be better to use Xapian directly.  Then
> we could store more relevant information in the database, such as the
> Debbugs bug number to which a certain message relates.  I would like to
> be able to almost instantaneously answer these questions:
>
> * which of our issues have seemingly been forgotten?
> * which have been ignored?
> * which issues involve a certain person?
> * what issues match this text query?
>
> Since we’re interested in issues (not individual messages) these
> questions cannot quickly be answered with Mu.  Computing the issue
> number from a message requires parsing it, and I’d much rather do this at
> index time.
>
> I guess we could even just fork Mu and make it store additional
> information in its database.

Very cool!

> If you’d like to take part in this effort to make Mumi more useful to
> all of us, please reply to this message.

I'd like to try to take part. I'm still very much a novice 
programmer/Guiler, so I might require a fair amount of hand-holding, but if 
others are working on it too, that should be fine.

With the currently deployed mumi, it looks like my mails in
https://issues.guix.gnu.org/issue/36207 are still mangling the '’' after 
Ludo, but in a different way than before. Perhaps this will be a good 
place for me to start trying to debug and learning the mail processing 
code paths in mumi.

All the best,
Jack

* Re: Mumi now uses raw emails and Mu
  2019-07-22  0:41 ` Jack Hill
@ 2019-07-22  9:53   ` Ricardo Wurmus
  2019-07-23 11:01     ` Ricardo Wurmus
  2019-07-22 17:54   ` Arun Isaac
  1 sibling, 1 reply; 9+ messages in thread
From: Ricardo Wurmus @ 2019-07-22  9:53 UTC
  To: Jack Hill; +Cc: guix-devel


Hey Jack,

>> If you’d like to take part in this effort to make Mumi more useful to
>> all of us, please reply to this message.
>
> I'd like to try to take part. I'm still very much a novice
> programmer/Guiler, so I might require a fair amount of hand-holding,
> but if others are working on it too, that should be fine.

Very nice!  I’m working on it too and Arun is the author of Guile Email,
which does all the heavy lifting of parsing emails, so we’re three
already :)

Forking Mu requires a rough understanding of how it currently works.
Obviously, it must parse emails and extract certain headers.  We only
need it to extract a few more headers such as “X-Debbugs-Envelope-To”,
which contains the Debbugs bug number.  (I don’t know if that header is
set for *all* emails that end up in the bug tracker, but that’s
something we have to figure out.)

Mu is written in C and the header extraction probably happens in the C
code as well.

> With the currently deployed mumi, it looks like my mails in
> https://issues.guix.gnu.org/issue/36207 are still mangling the '’'
> after Ludo, but in a different way than before. Perhaps this will be a
> good place for me to start trying to debug and learning the mail
> processing code paths in mumi.

Yes, I noticed that too.  Previously that was because of how mumi
interpreted the response from the SOAP service.  We no longer use the
SOAP service and instead ask Debbugs for the raw emails, which are then
passed to Guile Email’s “parse-email”.  So the problem now is in fact in
Guile Email.

--
Ricardo

* Re: Mumi now uses raw emails and Mu
  2019-07-22  0:41 ` Jack Hill
  2019-07-22  9:53   ` Ricardo Wurmus
@ 2019-07-22 17:54   ` Arun Isaac
  1 sibling, 0 replies; 9+ messages in thread
From: Arun Isaac @ 2019-07-22 17:54 UTC
  To: Jack Hill, Ricardo Wurmus; +Cc: guix-devel, guile-email


> With the currently deployed mumi, it looks like my mails in
> https://issues.guix.gnu.org/issue/36207 are still mangling the '’' after 
> Ludo, but in a different way than before.

Like Ricardo said, this is indeed a bug in guile-email, the same bug you
reported earlier, in fact. I haven't fixed it yet, but I'm working on
it. The fix is a little more involved than I expected. Currently,
guile-email takes the entire email as a string, parses it and returns
the results. This constrains all characters in the email to have the
same encoding (the charset, in email terminology). But an email can
contain text in several different encodings. Therefore it must be
treated as a sequence of bytes, not a string of characters.
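
To illustrate the point, here is a small sketch (not guile-email's
actual code) that keeps two hypothetical message parts as bytevectors
and decodes each with its own charset, using Guile's (ice-9 iconv)
module. The sample parts and the helper name are made up for this
example.

```scheme
(use-modules (ice-9 iconv))

;; Two hypothetical parts of one message, each a (charset . bytes) pair.
(define parts
  (list (cons "UTF-8"      (string->bytevector "Ludo\u2019" "UTF-8"))
        (cons "ISO-8859-1" (string->bytevector "caf\xe9" "ISO-8859-1"))))

;; Decode one part using the charset declared for that part.
(define (decode-part part)
  (bytevector->string (cdr part) (car part)))

;; Each part comes out intact when decoded with its own charset.
(map decode-part parts)

;; Decoding the UTF-8 bytes with the wrong charset mangles the
;; apostrophe: its three bytes turn into three separate characters.
(string-length (decode-part (car parts)))                         ; => 5
(string-length (bytevector->string (cdar parts) "ISO-8859-1"))    ; => 7
```

Forcing one encoding onto the whole message is the second case above,
applied to every part whose charset differs from the chosen one.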

> place for me to start trying to debug and learning the mail processing 
> code paths in mumi.

Parsing the email is done by guile-email. I only have a little
documentation at this point (you can find it at
https://guile-email.systemreboot.net/manual). So, you'll mostly have to
read the source code. There is a mailing list at
guile-email@systemreboot.net. Feel free to post any guile-email bugs or
questions to that list.

https://lists.systemreboot.net/listinfo/guile-email

* Re: Mumi now uses raw emails and Mu
  2019-07-22  9:53   ` Ricardo Wurmus
@ 2019-07-23 11:01     ` Ricardo Wurmus
  2019-07-23 13:49       ` Ricardo Wurmus
  2019-07-24 17:07       ` Arun Isaac
  0 siblings, 2 replies; 9+ messages in thread
From: Ricardo Wurmus @ 2019-07-23 11:01 UTC
  To: Jack Hill; +Cc: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Forking Mu requires a rough understanding of how it currently works.
> Obviously, it must parse emails and extract certain headers.  We only
> need it to extract a few more headers such as “X-Debbugs-Envelope-To”,
> which contains the Debbugs bug number.  (I don’t know if that header is
> set for *all* emails that end up in the bug tracker, but that’s
> something we have to figure out.)
>
> Mu is written in C and the header extraction probably happens in the C
> code as well.

I forked Mu and let it store the “X-Debbugs-Envelope-To” header in the
database:

    https://git.elephly.net/?p=software/mumimu.git;a=shortlog;h=refs/heads/mumimu

The “bugid” query can be used to quickly retrieve messages belonging to
a certain Debbugs bug, so something like this will print the message
subject for all messages relating to bug 31023.

    (use-modules (mumimu))
    (mu:initialize (%config 'mail-dir))
    (for-each
     (lambda (msg)
       (pk (mu:subject msg)))
     (mu:message-list "bugid:31023"))

This seems to actually work.  Still don’t know if all emails actually
have that header; if they don’t we’ll need to implement an alternative
way to get the bug number in “get_bugid” (lib/mu-msg-file.c).
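
As a rough Guile illustration of the header lookup (the real code lives
in the C function mentioned above, and this is not a transcription of
it), the sketch below pulls a number out of an
“X-Debbugs-Envelope-To” header.  It assumes the header value starts
with the bare bug number; the sample header block and the procedure
name are hypothetical.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical header block of a raw message.
(define headers
  "From: someone@example.org\nX-Debbugs-Envelope-To: 31023\nSubject: test\n")

;; Match the header at the start of any line, ignoring case.
(define header-rx
  (make-regexp "^X-Debbugs-Envelope-To:[ \t]*([0-9]+)"
               regexp/icase regexp/newline))

;; Return the bug number as an integer, or #f if the header is absent.
(define (bug-id-from-headers headers)
  (let ((m (regexp-exec header-rx headers)))
    (and m (string->number (match:substring m 1)))))

(bug-id-from-headers headers)                  ; => 31023
(bug-id-from-headers "Subject: no header\n")   ; => #f
```

The #f case is where an alternative way of finding the bug number
would have to take over.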

--
Ricardo

* Re: Mumi now uses raw emails and Mu
  2019-07-23 11:01     ` Ricardo Wurmus
@ 2019-07-23 13:49       ` Ricardo Wurmus
  2019-07-24 17:07       ` Arun Isaac
  1 sibling, 0 replies; 9+ messages in thread
From: Ricardo Wurmus @ 2019-07-23 13:49 UTC
  To: Jack Hill; +Cc: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> Forking Mu requires a rough understanding of how it currently works.
>> Obviously, it must parse emails and extract certain headers.  We only
>> need it to extract a few more headers such as “X-Debbugs-Envelope-To”,
>> which contains the Debbugs bug number.  (I don’t know if that header is
>> set for *all* emails that end up in the bug tracker, but that’s
>> something we have to figure out.)
>>
>> Mu is written in C and the header extraction probably happens in the C
>> code as well.
>
> I forked Mu and let it store the “X-Debbugs-Envelope-To” header in the
> database:[…]
> This seems to actually work.  Still don’t know if all emails actually
> have that header

They don’t.  Most obviously the first email that causes the bug to be
registered doesn’t include the header.  In mumi we can assume that the
file name of the email includes the bug number, though, so that’s an
easy albeit ugly workaround.

--
Ricardo

* Re: Mumi now uses raw emails and Mu
  2019-07-23 11:01     ` Ricardo Wurmus
  2019-07-23 13:49       ` Ricardo Wurmus
@ 2019-07-24 17:07       ` Arun Isaac
  2019-07-24 17:45         ` Ricardo Wurmus
  1 sibling, 1 reply; 9+ messages in thread
From: Arun Isaac @ 2019-07-24 17:07 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

> I forked Mu and let it store the “X-Debbugs-Envelope-To” header in the
> database:
>
>     https://git.elephly.net/?p=software/mumimu.git;a=shortlog;h=refs/heads/mumimu
>
> This seems to actually work.  Still don’t know if all emails actually
> have that header; if they don’t we’ll need to implement an alternative
> way to get the bug number in “get_bugid” (lib/mu-msg-file.c).

Can't we find the bug number of an email by matching the To and Cc
addresses against xxxxx@debbugs.gnu.org where xxxxx is a bug number?

* Re: Mumi now uses raw emails and Mu
  2019-07-24 17:07       ` Arun Isaac
@ 2019-07-24 17:45         ` Ricardo Wurmus
  2019-07-24 18:46           ` Arun Isaac
  0 siblings, 1 reply; 9+ messages in thread
From: Ricardo Wurmus @ 2019-07-24 17:45 UTC
  To: Arun Isaac; +Cc: guix-devel


Arun Isaac <arunisaac@systemreboot.net> writes:

>> I forked Mu and let it store the “X-Debbugs-Envelope-To” header in the
>> database:
>>
>>     https://git.elephly.net/?p=software/mumimu.git;a=shortlog;h=refs/heads/mumimu
>>
>> This seems to actually work.  Still don’t know if all emails actually
>> have that header; if they don’t we’ll need to implement an alternative
>> way to get the bug number in “get_bugid” (lib/mu-msg-file.c).
>
> Can't we find the bug number of an email by matching the To and Cc
> addresses against xxxxx@debbugs.gnu.org where xxxxx is a bug number?

No, because the initial email would be sent to bug-guix@gnu.org or
guix-patches@gnu.org, not xxxxx@debbugs.gnu.org.

There are also some control messages which are not sent to
xxxxx@debbugs.gnu.org but are received by the Debbugs service through
other means.
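
A quick sketch of why address matching falls short (the procedure name
and sample addresses are chosen for this example only): the pattern
finds a bug number in a reply address, but yields nothing for the two
submission addresses.

```scheme
(use-modules (ice-9 regex))

;; Matches addresses of the form <digits>@debbugs.gnu.org only.
(define address-rx
  (make-regexp "^([0-9]+)@debbugs\\.gnu\\.org$"))

;; Return the bug number encoded in ADDRESS, or #f if there is none.
(define (bug-id-from-address address)
  (let ((m (regexp-exec address-rx address)))
    (and m (string->number (match:substring m 1)))))

(map bug-id-from-address
     '("31023@debbugs.gnu.org"
       "bug-guix@gnu.org"
       "guix-patches@gnu.org"))
;; => (31023 #f #f)
```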

I’m now getting the bug number by convention: mumi gets them from
Debbugs, stores them in a predictable fashion (<bugnum> dash <msgnum>),
and mumimu gets the bug number from that file name.
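
In Guile terms the convention amounts to something like the sketch
below.  The actual lookup happens in mumimu's C code; the procedure
name and the example path here are hypothetical, and the file name is
assumed to be exactly <bugnum>-<msgnum>.

```scheme
;; Return the bug number encoded in FILE's base name, or #f if the
;; name does not start with digits followed by a dash.
(define (bug-id-from-file-name file)
  (let* ((name (basename file))
         (dash (string-index name #\-)))
    (and dash
         (let ((prefix (substring name 0 dash)))
           (and (string-every char-set:digit prefix)
                (string->number prefix))))))

(bug-id-from-file-name "/var/mumi/mail/31023-4")   ; => 31023
(bug-id-from-file-name "/var/mumi/mail/README")    ; => #f
```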

Since mumimu isn’t a general purpose tool we can get away with that if
we all just hold our noses and ignore the stench.

--
Ricardo

* Re: Mumi now uses raw emails and Mu
  2019-07-24 17:45         ` Ricardo Wurmus
@ 2019-07-24 18:46           ` Arun Isaac
  0 siblings, 0 replies; 9+ messages in thread
From: Arun Isaac @ 2019-07-24 18:46 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

>> Can't we find the bug number of an email by matching the To and Cc
>> addresses against xxxxx@debbugs.gnu.org where xxxxx is a bug number?
>
> No, because the initial email would be sent to bug-guix@gnu.org or
> guix-patches@gnu.org, not xxxxx@debbugs.gnu.org.

Yeah, right, I forgot about the first email.

> There are also some control messages which are not sent to
> xxxxx@debbugs.gnu.org but are received by the Debbugs service through
> other means.
>
> I’m now getting the bug number by convention: mumi gets them from
> Debbugs, stores them in a predictable fashion (<bugnum> dash <msgnum>),
> and mumimu gets the bug number from that file name.

Ok, that makes sense.