From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 3/5] scripts/slrnspool2maildir: use MHreader and LeiToMail
Date: Wed, 31 Jan 2024 10:20:18 +0000 [thread overview]
Message-ID: <20240131102021.1257902-4-e@80x24.org> (raw)
In-Reply-To: <20240131102021.1257902-1-e@80x24.org>
This contains gmane-specific header munging to unmunge the
things gmane dones to headers. While we're at it, document the
generic `lei convert' invocation for users who don't need the
gmane-specific header munging.
---
scripts/slrnspool2maildir | 90 ++++++++++++++++++++-------------------
1 file changed, 47 insertions(+), 43 deletions(-)
diff --git a/scripts/slrnspool2maildir b/scripts/slrnspool2maildir
index 8e2ba08a..ba0729ec 100755
--- a/scripts/slrnspool2maildir
+++ b/scripts/slrnspool2maildir
@@ -1,51 +1,55 @@
#!/usr/bin/perl -w
-# Copyright (C) 2013-2021 all contributors <meta@public-inbox.org>
+# Copyright (C) all contributors <meta@public-inbox.org>
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-#
-# One-off script to convert an slrnpull news spool to Maildir
=begin usage
+One-off script to convert an slrnpull spool from gmane to Maildir
+Note: this contains Gmane-specific header munging to workaround
+the munging done by Gmane.
+
./slrnspool2maildir SLRNPULL_ROOT/news/foo/bar /path/to/maildir/
-=cut
-use strict;
-use warnings;
-use Email::Filter;
-use Email::LocalDelivery;
-use File::Glob qw(bsd_glob GLOB_NOSORT);
-sub usage { "Usage:\n".join('',grep(/\t/, `head -n 12 $0`)) }
-my $spool = shift @ARGV or die usage();
-my $dir = shift @ARGV or die usage();
--d $dir or die "$dir is not a directory\n";
-$dir .= '/' unless $dir =~ m!/\z!;
-foreach my $sub (qw(cur new tmp)) {
- my $nd = "$dir/$sub";
- -d $nd and next;
- mkdir $nd or die "mkdir $nd failed: $!\n";
-}
-foreach my $n (grep(/\d+\z/, bsd_glob("$spool/*", GLOB_NOSORT))) {
- if (open my $fh, '<', $n) {
- my $f = Email::Filter->new(data => do { local $/; <$fh> });
- my $s = $f->simple;
+A generic replacement w/o Gmane-specific munging could treat
+the slrnpull spool as an MH folder with lei:
- # gmane rewrites Received headers, which increases spamminess
- # Some older archives set Original-To
- foreach my $x (qw(Received To)) {
- my @h = $s->header("Original-$x");
- if (@h) {
- $s->header_set($x, @h);
- $s->header_set("Original-$x");
- }
+ lei convert mh:SLRNPULL_ROOT/news/foo/bar -o /path/to/maildir
+ # (and `lei daemon-kill' if you don't want the daemon to linger)
+=cut
+use v5.12;
+use autodie;
+# warning: unstable internal APIs:
+use PublicInbox::Eml;
+use PublicInbox::LeiToMail;
+use PublicInbox::MHreader;
+use PublicInbox::IO qw(read_all);
+use File::Path qw(make_path);
+use File::Spec ();
+sub usage {
+ open my $fh, '<', __FILE__;
+ ("Usage:\n", grep { /^=begin usage/../^=cut/ and !/^=/m } <$fh>);
+}
+my $spool = shift @ARGV or die usage();
+my $dst = shift @ARGV or die usage();
+$dst .= '/' unless $dst =~ m!/\z!;
+File::Path::make_path(map { $dst.$_ } qw(tmp new cur));
+$dst = File::Spec->rel2abs($dst).'/';
+opendir my $cwdfh, '.';
+my $mhr = PublicInbox::MHreader->new($spool, $cwdfh);
+my $smsg;
+$mhr->mh_each_eml(sub {
+ my ($d, $n, $kw, $eml) = @_;
+ # gmane rewrites Received headers, which increases spamminess
+ # Some older archives set Original-To
+ for my $x (qw(Received To)) {
+ my @h = $eml->header_raw("Original-$x");
+ if (@h) {
+ $eml->header_set($x, @h);
+ $eml->header_set("Original-$x");
}
-
- # triggers for the SA HEADER_SPAM rule
- foreach my $drop (qw(Approved)) { $s->header_set($drop) }
-
- # appears to be an old gmane bug:
- $s->header_set('connect()');
-
- $f->exit(0);
- $f->accept($dir);
- } else {
- warn "Failed to open $n: $!\n";
}
-}
+ # `Approved' triggers the SA HEADER_SPAM rule
+ # `connect()' appears to be an old gmane bug:
+ $eml->header_set($_) for ('Approved', 'connect()');
+ my $buf = $eml->as_string;
+ $smsg->{blob} = $n;
+ PublicInbox::LeiToMail::_buf2maildir($dst, \$buf, $smsg, 'new/');
+});
next prev parent reply other threads:[~2024-01-31 10:20 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 10:20 [PATCH 0/5] more MH-related updates Eric Wong
2024-01-31 10:20 ` [PATCH 1/5] lei convert: explicitly allow --sort for inputs Eric Wong
2024-01-31 10:20 ` [PATCH 2/5] import: drop redundant `use' statement Eric Wong
2024-01-31 10:20 ` Eric Wong [this message]
2024-01-31 10:20 ` [PATCH 4/5] scripts/import_*: update usage to include lei tips Eric Wong
2024-01-31 10:20 ` [PATCH 5/5] lei: sort MH inputs sequentially by default Eric Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://public-inbox.org/README
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240131102021.1257902-4-e@80x24.org \
--to=e@80x24.org \
--cc=meta@public-inbox.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).