From: Eric Wong <e@yhbt.net>
To: Kyle Meyer <kyle@kyleam.com>
Cc: meta@public-inbox.org
Subject: [PATCH] inboxwritable: fix From_ line unescaping
Date: Sat, 4 Apr 2020 06:20:03 +0000
Message-ID: <20200404062003.GA23899@dcvr>
In-Reply-To: <87lfnb3kz8.fsf@kyleam.com>
Kyle Meyer <kyle@kyleam.com> wrote:
> I'm feeding mbox files created with Konstantin Ryabitsev's
> list-archive-maker.py script [^1] to import_vger_from_mbox. Looking
> through the result, I noticed some ">From" lines. Here's an example:
>
> https://yhetil.org/orgmode/871rpt9zc4.fsf@kyleam.com/
>
> If I'm following the code correctly, that leads to an import_mbox call,
> which in turn calls mb_add:
>
> sub mb_add ($$$$) {
> 	my ($im, $variant, $filter, $msg) = @_;
> 	$$msg =~ s/(\r?\n)+\z/$1/s;
> 	my $mime = PublicInbox::MIME->new($msg);
> 	if ($variant eq 'mboxrd') {
> 		$$msg =~ s/^>(>*From )/$1/sm;
> 	} elsif ($variant eq 'mboxo') {
> 		$$msg =~ s/^>From /From /sm;
> 	}
> 	[...]
Yup, and that's buggy at first sight. My fault :x
> So, it appears the ">From" _should_ be getting reversed. To eliminate
> any stupid things I may have done when creating the archive, I looked
> for a message on meta that has an in-body line starting with "From" and
> found
>
> https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
>
> So I downloaded the public-inbox generated mbox and fed it to
> import_vger_from_mbox:
>
> curl -s https://public-inbox.org/meta/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/t.mbox.gz \
> | zcat | scripts/import_vger_from_mbox testing emacs-orgmode@gnu.org ~/inboxes/testing
>
> That too leaves a ">From" in the body:
>
> https://yhetil.org/testing/20200121222924.ioz5ve2sg65zcuoy@chatter.i7.local/
Thanks for the reproducible test case. A fix is below
(only tested with your case; nothing in t/*.t yet).
> Any idea what's going wrong here?
Two bugs, actually, but only one affected your case.
> [^1]: https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/plain/list-archive-maker.py
Can you confirm the following fixes things for you?
Thanks again for the excellent bug report and apologies for
my careless bug :x
----8<----
From: Eric Wong <e@yhbt.net>
Date: Sat, 04 Apr 2020 06:17:29 +0000
Subject: [PATCH] inboxwritable: fix From_ line unescaping
We can't rely on Email::MIME noticing the change to our
scalar ref after calling `PublicInbox::MIME->new'.
This is because Email::MIME::body_set (unlike
Email::Simple::body_set) will copy the contents of the body into
`->{body_raw}' as a new scalar.
Furthermore, we need to unescape every escaped From line in the
body, not just the first one, using the `g' modifier to `s//'.
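To illustrate the second point outside of Perl, here is a rough Python
sketch of the two unescaping rules. The function name and the
`first_only' switch are made up for this example and are not part of
public-inbox; `first_only=True' only mimics what `s///' does without
the `g' modifier.

```python
import re

# mboxrd: any line matching ^>+From has exactly one leading ">" removed.
MBOXRD_ESCAPED = re.compile(r"^>(>*From )", re.MULTILINE)
# mboxo: only lines starting with exactly ">From " are unescaped.
MBOXO_ESCAPED = re.compile(r"^>From ", re.MULTILINE)


def unescape_from_lines(raw: str, variant: str, first_only: bool = False) -> str:
    """Hypothetical helper: undo From_ escaping for one raw message."""
    # count=0 means "replace every match"; count=1 stops after the first,
    # which is the behaviour of the old substitution without `g'.
    count = 1 if first_only else 0
    if variant == "mboxrd":
        return MBOXRD_ESCAPED.sub(r"\1", raw, count=count)
    if variant == "mboxo":
        return MBOXO_ESCAPED.sub("From ", raw, count=count)
    return raw


raw = "Subject: demo\n\n>From the start\ntext\n>>From a quote\n>From the end\n"
fixed = unescape_from_lines(raw, "mboxrd")
buggy = unescape_from_lines(raw, "mboxrd", first_only=True)
```

With `first_only=True' only the first ">From " line is restored and the
later ones keep their extra ">", which matches the symptom in the report.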
Reported-by: Kyle Meyer <kyle@kyleam.com>
---
lib/PublicInbox/InboxWritable.pm | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index ce979ea2..f2ba21fc 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -157,12 +157,12 @@ my $from_strict = qr/^From \S+ +\S+ \S+ +\S+ [^:]+:[^:]+:[^:]+ [^:]+/;
 sub mb_add ($$$$) {
 	my ($im, $variant, $filter, $msg) = @_;
 	$$msg =~ s/(\r?\n)+\z/$1/s;
-	my $mime = PublicInbox::MIME->new($msg);
 	if ($variant eq 'mboxrd') {
-		$$msg =~ s/^>(>*From )/$1/sm;
+		$$msg =~ s/^>(>*From )/$1/gms;
 	} elsif ($variant eq 'mboxo') {
-		$$msg =~ s/^>From /From /sm;
+		$$msg =~ s/^>From /From /gms;
 	}
+	my $mime = PublicInbox::MIME->new($msg);
 	if ($filter) {
 		my $ret = $filter->scrub($mime) or return;
 		return if $ret == REJECT();
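
Why moving the constructor call matters can be shown with a toy model,
again in Python rather than Perl. `ParsedMessage' is a hypothetical
stand-in for any parser that snapshots its input at construction time;
it is not how Email::MIME is implemented, only an assumption-laden
sketch of the ordering problem.

```python
import re


class ParsedMessage:
    """Hypothetical parser that copies the raw buffer when constructed."""

    def __init__(self, raw: bytearray):
        # Snapshot of the buffer: later edits to `raw` are not reflected here.
        self.body_raw = bytes(raw)


def unescape_in_place(raw: bytearray) -> None:
    """Strip one leading ">" from every mboxrd-escaped From line, in place."""
    raw[:] = re.sub(rb"^>(>*From )", rb"\1", bytes(raw), flags=re.MULTILINE)


def parse_then_unescape(text: bytes) -> bytes:
    # Old order: the parser copies the buffer before it is unescaped.
    raw = bytearray(text)
    msg = ParsedMessage(raw)
    unescape_in_place(raw)
    return msg.body_raw


def unescape_then_parse(text: bytes) -> bytes:
    # New order: unescape first, so the parser copies the fixed buffer.
    raw = bytearray(text)
    unescape_in_place(raw)
    return ParsedMessage(raw).body_raw
```

In this model `parse_then_unescape' returns the body with ">From " still
in place, while `unescape_then_parse' returns the restored "From " line.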