unofficial mirror of meta@public-inbox.org
* Indicating the mirror's origin
@ 2023-06-14 18:42 Konstantin Ryabitsev
  2023-06-14 20:18 ` Uwe Kleine-König
  2023-06-14 23:50 ` Eric Wong
  0 siblings, 2 replies; 8+ messages in thread
From: Konstantin Ryabitsev @ 2023-06-14 18:42 UTC
  To: meta; +Cc: a.fatoum, u.kleine-koenig

Good day:

We've had a few requests to mirror public-inbox archives that originate on
other systems so they can also be searchable and viewable via lore.kernel.org.
I've been dragging my feet on these requests, because they are a potential
liability in terms of GDPR compliance.

If we are merely mirroring the archive from some other location, then there
should be a clear indication of the origin of the data and contact information
of the maintainer of the remote archive where someone could send requests for
any data removal. It's best if this is visible both via the web view and in
raw messages retrieved via our service, e.g. via an "X-Archive-Origin:" header
or something similar.

Any thoughts on this issue?

CC'ing the folks who have been dutifully asking me to mirror their lists on
lore, and who I'm sure are sick and tired of me not getting any movement on
this issue.

-K

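To make the proposal concrete, here is a minimal sketch (not part of the
original message) of how a reader of a mirrored archive could look up the
origin in a raw message. It assumes the mirror adds a header named
X-Archive-Origin, which is only a suggestion in this thread; the
lore.example.org URL and the archive_origins helper are made up for
illustration.

```python
from email import message_from_bytes


def archive_origins(raw_message: bytes, header: str = "X-Archive-Origin") -> list:
    """Return every value of the (hypothetical) origin header, in order."""
    msg = message_from_bytes(raw_message)
    # get_all() returns failobj when the header is absent
    return msg.get_all(header, failobj=[])


# Hypothetical raw message as a mirror might serve it
raw = (
    b"From: sender@example.org\r\n"
    b"Subject: test\r\n"
    b"X-Archive-Origin: https://lore.example.org/list/\r\n"
    b"\r\n"
    b"body\r\n"
)
for origin in archive_origins(raw):
    print("send removal requests to the maintainer of", origin)
```

A message without the header simply yields an empty list, so the same
check works for archives that originate on the serving system.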
* Re: Indicating the mirror's origin
  2023-06-14 18:42 Indicating the mirror's origin Konstantin Ryabitsev
@ 2023-06-14 20:18 ` Uwe Kleine-König
  2023-06-14 20:56   ` Konstantin Ryabitsev
  2023-06-14 23:50 ` Eric Wong
  1 sibling, 1 reply; 8+ messages in thread
From: Uwe Kleine-König @ 2023-06-14 20:18 UTC
  To: Konstantin Ryabitsev; +Cc: meta, a.fatoum

Hello Konstantin,

On Wed, Jun 14, 2023 at 02:42:15PM -0400, Konstantin Ryabitsev wrote:
> We've had a few requests to mirror public-inbox archives that originate on
> other systems so they can also be searchable and viewable via lore.kernel.org.
> I've been dragging my feet on these requests, because they are a potential
> liability in terms of GDPR compliance.

What is the relevant GDPR liability here? I assume someone who sent a
mail to (say) barebox@lists.infradead.org can request that you remove
your copy of that mail?!
 
> If we are merely mirroring the archive from some other location, then there
> should be a clear indication of the origin of the data and contact information
> of the maintainer of the remote archive where someone could send requests for
> any data removal. It's best if this is visible both via the web view and in
> raw messages retrieved via our service, e.g. via an "X-Archive-Origin:" header
> or something similar.

If my assumption above is the relevant issue, I wonder how an
X-Archive-Origin header helps you. If you are sued/requested to remove
that person's private data (and the claim is only directed to you (i.e.
for lore.kernel.org) but not to Pengutronix (i.e. for lore.barebox.org)
because they are in different jurisdictions) do you expect to be able
to weasel out of the request by pointing to us? IANAL, but that doesn't
match my understanding of how things work. I would expect that you would
have to care for your copy and Pengutronix for theirs.

So I think what is needed here is that you can drop a mail from your
copy of the archive even if the origin doesn't necessarily drop it.

(I wonder what should be the effect on lore.kernel.org if
lore.barebox.org removes a mail. Should it disappear from the former,
too?)

> CC'ing the folks who have been dutifully asking me to mirror their lists on
> lore, and who I'm sure are sick and tired of me not getting any movement on
> this issue.

Thanks for bringing your issue forward, I hope that helps to get us
nearer to having a barebox archive (and a few more) on lore.kernel.org.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

* Re: Indicating the mirror's origin
  2023-06-14 20:18 ` Uwe Kleine-König
@ 2023-06-14 20:56   ` Konstantin Ryabitsev
  0 siblings, 0 replies; 8+ messages in thread
From: Konstantin Ryabitsev @ 2023-06-14 20:56 UTC
  To: Uwe Kleine-König; +Cc: meta, a.fatoum

On Wed, Jun 14, 2023 at 10:18:57PM +0200, Uwe Kleine-König wrote:
> Hello Konstantin,
> 
> On Wed, Jun 14, 2023 at 02:42:15PM -0400, Konstantin Ryabitsev wrote:
> > We've had a few requests to mirror public-inbox archives that originate on
> > other systems so they can also be searchable and viewable via lore.kernel.org.
> > I've been dragging my feet on these requests, because they are a potential
> > liability in terms of GDPR compliance.
> 
> What is the relevant GDPR liability here? I assume someone who sent a
> mail to (say) barebox@lists.infradead.org can request that you remove
> your copy of that mail?!

It feels stupid to subscribe an archiver agent to the list when a public-inbox
repository is conveniently available for cloning. However, if we just clone
the repository over and integrate it into lore, we need to indicate that we're
just copying bits over from some other location -- I cannot delete things from
the archive on my own any more.

> (I wonder what should be the effect on lore.kernel.org if
> lore.barebox.org removes a mail. Should it disappear from the former,
> too?)

Yes, if we are mirroring the underlying archive git repositories, any deletes
you make on your end will propagate to us.

-K

* Re: Indicating the mirror's origin
  2023-06-14 18:42 Indicating the mirror's origin Konstantin Ryabitsev
  2023-06-14 20:18 ` Uwe Kleine-König
@ 2023-06-14 23:50 ` Eric Wong
  2023-06-15 14:47   ` Konstantin Ryabitsev
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-06-14 23:50 UTC
  To: Konstantin Ryabitsev; +Cc: meta, a.fatoum, u.kleine-koenig

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Good day:
> 
> We've had a few requests to mirror public-inbox archives that originate on
> other systems so they can also be searchable and viewable via lore.kernel.org.
> I've been dragging my feet on these requests, because they are a potential
> liability in terms of GDPR compliance.

I just tried using `git replace' for the first time:

	git replace --edit $BLOB_OID

And all the `git cat-file --batch' invocations appear to work as
if the original blob contents never existed.  Of course,
reindexing could be necessary, as would changing the git config
to ensure `git fetch' doesn't destroy elements in the
refs/replace/ namespace.

git clones/fetches still include both the original and the replacement
blob, though (favoring the replacement), so perhaps `git replace'
isn't a fit...

Then the worst case would be to temporarily remove the mirror, or
to fork it (via -edit/-purge + subscribe) until upstream cleans
it up.

lei's v2 inbox output could be used as a subscription mechanism.

> If we are merely mirroring the archive from some other location, then there
> should be a clear indication of the origin of the data and contact information
> of the maintainer of the remote archive where someone could send requests for
> any data removal. It's best if this is visible both via the web view and in
> raw messages retrieved via our service, e.g. via an "X-Archive-Origin:" header
> or something similar.

I sometimes use the $INBOX_DIR/description file for that and it
affects WWW and NNTP, but not IMAP/POP3.  I'm not sure if I want
to reintroduce header injection in case there's some conflict
with DKIM or other signature mechanisms[1]

> Any thoughts on this issue?

IANAL, obviously...

[1] https://public-inbox.org/meta/20201210214329.do66z6gzvepxc5w3@chatter.i7.local/

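A short illustration (not from the original message) of why a stored
message cannot simply be edited in place and a mapping such as
refs/replace/ is needed instead: in a SHA-1 repository, a blob's object
ID is the SHA-1 of a small header plus the content, so redacted content
always gets a different ID. The helper name git_blob_oid and the sample
messages are made up for this sketch.

```python
import hashlib


def git_blob_oid(content: bytes) -> str:
    """Compute the object ID git assigns to a blob in a SHA-1 repository."""
    # git hashes "blob <size>\0" followed by the raw content
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


original = b"Subject: hello\n\nsome private detail\n"
redacted = b"Subject: hello\n\n[removed on request]\n"

# Different content, different ID: commits and trees that reference the
# original ID keep pointing at it unless history is rewritten or the
# lookup is redirected, which is what refs/replace/ does.
print(git_blob_oid(original))
print(git_blob_oid(redacted))
```

The result should match what `git hash-object --stdin` prints for the
same bytes; repositories using SHA-256 object names hash differently.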
* Re: Indicating the mirror's origin
  2023-06-14 23:50 ` Eric Wong
@ 2023-06-15 14:47   ` Konstantin Ryabitsev
  2023-06-19  6:09     ` Uwe Kleine-König
  2023-06-20  2:37     ` [RFC] support publicinbox.$FOO.appendHeader in read-only endpoints Eric Wong
  0 siblings, 2 replies; 8+ messages in thread
From: Konstantin Ryabitsev @ 2023-06-15 14:47 UTC
  To: Eric Wong; +Cc: meta, a.fatoum, u.kleine-koenig

On Wed, Jun 14, 2023 at 11:50:15PM +0000, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Good day:
> > 
> > We've had a few requests to mirror public-inbox archives that originate on
> > other systems so they can also be searchable and viewable via lore.kernel.org.
> > I've been dragging my feet on these requests, because they are a potential
> > liability in terms of GDPR compliance.
> 
> I just tried using `git replace' for the first time:

I think I didn't quite convey my idea -- let me try to step back a bit.

What I have is lore.kernel.org, which is actually 3 different frontends all
pulling git repositories from some other source of origin. Currently, I have
two:

- lkml.kernel.org, which subscribes to external lists via regular SMTP
- subspace.kernel.org, which is our own mlmmj server and where public-inbox
  repositories are created via public-inbox-watch

Since we control both lkml and subspace, we are the origin of the data, so if
anyone requests archive removal, we can easily comply.

Now, I want to be able to add other external public-inbox repositories to be
mirrored on lore.kernel.org, but with some clear indication that we're not the
origin of that data, we're merely mirroring it. Any GDPR removal requests need
to be sent to $ORIGIN and we'll just propagate any changes.

> 	git replace --edit $BLOB_OID

I don't want to go down that route, because while we can do such surgery on a
node, it would need to be rerun again if we bring up a new mirror node, and
it's almost guaranteed to be forgotten.

> I sometimes use the $INBOX_DIR/description file for that and it
> affects WWW and NNTP, but not IMAP/POP3.  I'm not sure if I want
> to reintroduce header injection in case there's some conflict
> with DKIM or other signature mechanisms[1]

I don't think we need to worry about it if we pick a header that's almost
certain to not be included in the default DKIM signature set.
X-Originally-Archived-At: or some other header is guaranteed to never be
signed.

-K

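The reasoning above can be checked per message: a DKIM-Signature header
lists the signed header fields in its h= tag (RFC 6376), and a field
that is not named there does not enter the header hash. A minimal
sketch, not part of the original message; the function name and the
sample signature are invented for illustration, and it only inspects
h= rather than verifying anything.

```python
import re


def dkim_signed_headers(dkim_signature_value: str) -> set:
    """Return the lower-cased field names listed in the h= tag."""
    # Folding whitespace is allowed inside the tag list, so drop it all.
    compact = re.sub(r"\s+", "", dkim_signature_value)
    for tag in compact.split(";"):
        name, sep, value = tag.partition("=")
        if sep and name == "h":
            return {field.lower() for field in value.split(":") if field}
    return set()


# Invented signature value; bh= and b= are placeholders, not real hashes
sample = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.org; s=sel;\r\n"
    "\th=From:To:Subject:Date:Message-ID;\r\n"
    "\tbh=AAAA=; b=BBBB=="
)
signed = dkim_signed_headers(sample)
print("x-originally-archived-at" in signed)
```

One caveat: some signers list a field name in h= even when the field is
absent, precisely so that adding it later breaks the signature. A
per-message check like this one would catch that case, whereas a
blanket assumption about the header name would not.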
* Re: Indicating the mirror's origin
  2023-06-15 14:47   ` Konstantin Ryabitsev
@ 2023-06-19  6:09     ` Uwe Kleine-König
  2023-06-20  2:37     ` [RFC] support publicinbox.$FOO.appendHeader in read-only endpoints Eric Wong
  1 sibling, 0 replies; 8+ messages in thread
From: Uwe Kleine-König @ 2023-06-19  6:09 UTC
  To: Konstantin Ryabitsev; +Cc: Eric Wong, meta, a.fatoum

Hello Konstantin,

On Thu, Jun 15, 2023 at 10:47:46AM -0400, Konstantin Ryabitsev wrote:
> Now, I want to be able to add other external public-inbox repositories to be
> mirrored on lore.kernel.org, but with some clear indication that we're not the
> origin of that data, we're merely mirroring it. Any GDPR removal requests need
> to be sent to $ORIGIN and we'll just propagate any changes.

I think this is the part that won't work. In my understanding, if
someone requests that their data is removed from *your* system, you
won't be able to say "But the data is only a copy from that system over
there, direct your request at them." Yes, if the source of the data
removes the offending content, you can just sync and so remove the
content in your copy, too. But if the operators of the source don't
(because they don't cooperate, or because the relevant people who can
remove the content are on vacation or because they are in a different
jurisdiction or just because the plaintiff doesn't care about their data
at the source), you would need to have a way to remove the content.
Either by removing the content locally in your copy in a way that you
don't get it back in the next sync, or by shutting down that particular
archive completely. Without a visible pointer to the origin of the
copy the situation is identical, so I don't see an advantage when
implementing it, at least not for this purpose.

But maybe I'm missing something or in your jurisdiction things are
different than in (my inexpert understanding of) ours. (Note, I don't
wanna say that if you get such a request to delete some content, we
won't cooperate, but if I were in your position, I wouldn't want to rely
on that kind of cooperation.)

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

* [RFC] support publicinbox.$FOO.appendHeader in read-only endpoints
  2023-06-15 14:47   ` Konstantin Ryabitsev
  2023-06-19  6:09     ` Uwe Kleine-König
@ 2023-06-20  2:37     ` Eric Wong
  2023-07-05 14:27       ` Ahmad Fatoum
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Wong @ 2023-06-20  2:37 UTC
  To: Konstantin Ryabitsev; +Cc: meta, a.fatoum, u.kleine-koenig

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Jun 14, 2023 at 11:50:15PM +0000, Eric Wong wrote:
> > affects WWW and NNTP, but not IMAP/POP3.  I'm not sure if I want
> > to reintroduce header injection in case there's some conflict
> > with DKIM or other signature mechanisms[1]
> 
> I don't think we need to worry about it if we pick a header that's almost
> certain to not be included in the default DKIM signature set.
> X-Originally-Archived-At: or some other header is guaranteed to never be
> signed.

*shrug*  I'm not sure how useful this is, actually.

-----------8<---------
Subject: [PATCH] support publicinbox.$FOO.appendHeader in read-only endpoints

This may be used to inject arbitrary headers in the raw message
from any read-only endpoints.  This may be useful for mirrors to
indicate they're mirroring another source and can't edit/remove
messages easily.

This is set per-inbox and supports multiple key:value pairs:

	[publicinbox "foo"]
		appendheader = KEY1:VALUE1
		appendheader = KEY2:VALUE2

No variable expansion is currently supported as it's unclear if
it's necessary (e.g. for Message-ID in URLs).

Link: https://public-inbox.org/meta/20230615-focal-erosion-poop-df8246@meerkat/
---
 lib/PublicInbox/Config.pm |  2 +-
 lib/PublicInbox/Eml.pm    | 20 +++++++++++++++
 lib/PublicInbox/IMAP.pm   | 17 ++++++++++---
 lib/PublicInbox/Inbox.pm  |  8 ++++++
 lib/PublicInbox/Mbox.pm   |  8 +++---
 lib/PublicInbox/NNTP.pm   |  2 ++
 lib/PublicInbox/POP3.pm   |  2 +-
 t/netd.t                  | 53 ++++++++++++++++++++++++++++++++-------
 8 files changed, 94 insertions(+), 18 deletions(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 2f1b4122..c3ec1793 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -457,7 +457,7 @@ sub _fill_ibx {
 	# more things to encourage decentralization
 	for my $k (qw(address altid nntpmirror imapmirror
 			coderepo hide listid url
-			infourl watchheader
+			infourl watchheader appendheader
 			nntpserver imapserver pop3server)) {
 		my $v = $self->{"$pfx.$k"} // next;
 		$ibx->{$k} = _array($v);
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 8b999e1a..3f7e27e0 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -392,6 +392,26 @@ sub header_str_set {
 	header_set($self, $name, @vals);
 }
 
+# only used for publicinbox.$NAME.appendHeader
+sub header_append {
+	my ($self, $pfx, @vals) = @_;
+	$pfx .= ': ';
+	my $len = 78 - length($pfx);
+	@vals = map {;
+		utf8::encode(my $v = $_); # to bytes, support SMTPUTF8
+		# folding differs from Email::Simple::Header,
+		# we favor tabs for visibility (and space savings :P)
+		if (length($_) >= $len && (/\n[^ \t]/s || !/\n/s)) {
+			local $Text::Wrap::columns = $len;
+			local $Text::Wrap::huge = 'overflow';
+			$pfx . wrap('', "\t", $v) . $self->{crlf};
+		} else {
+			$pfx . $v . $self->{crlf};
+		}
+	} @vals;
+	${$self->{hdr}} .= join('', @vals);
+}
+
 sub mhdr_decode ($) {
 	eval { $MIME_Header->decode($_[0], Encode::FB_DEFAULT) } // $_[0];
 }
diff --git a/lib/PublicInbox/IMAP.pm b/lib/PublicInbox/IMAP.pm
index 00f99ef7..dd55a2e6 100644
--- a/lib/PublicInbox/IMAP.pm
+++ b/lib/PublicInbox/IMAP.pm
@@ -642,7 +642,7 @@ sub emit_rfc822_header {
 	$self->msg_more(${$eml->{hdr}});
 }
 
-# n.b. this is sorted to be after any emit_eml_new ops
+# n.b. this is sorted to be after any op_eml_new ops
 sub emit_rfc822_text {
 	my ($self, $k, undef, $bref) = @_;
 	$self->msg_more(" $k {".length($$bref)."}\r\n");
@@ -668,9 +668,20 @@ sub to_crlf_full {
 	${$_[0]} =~ s/\A[\r\n]*From [^\r\n]*\r\n//s;
 }
 
-sub op_crlf_bref { to_crlf_full($_[3]) }
+# used by PublicInbox::POP3, too
+sub op_crlf_bref {
+	if ($_[0]->{ibx}->{appendheader}) {
+		my $eml = PublicInbox::Eml->new($_[3]);
+		$_[0]->{ibx}->append_headers($eml);
+		${$_[3]} = $eml->as_string; # replace bref
+	}
+	to_crlf_full($_[3]);
+}
 
-sub op_crlf_hdr { to_crlf_full($_[4]->{hdr}) }
+sub op_crlf_hdr { # $_[4] = eml
+	$_[0]->{ibx}->append_headers($_[4]) if $_[0]->{ibx}->{appendheader};
+	to_crlf_full($_[4]->{hdr});
+}
 
 sub op_crlf_bdy { ${$_[4]->{bdy}} =~ s/(?<!\r)\n/\r\n/sg if $_[4]->{bdy} }
 
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 9afbb478..d5108e4a 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -410,4 +410,12 @@ sub mailboxid { # rfc 8474, 8620, 8621
 
 sub thing_type { 'public inbox' }
 
+sub append_headers {
+	my ($self, $eml) = @_;
+	# TODO: do we need MSGID expansions?
+	for (@{$self->{appendheader}}) {
+		$eml->header_append(split(/\s*:\s*/, $_, 2));
+	}
+}
+
 1;
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index bf61bb0e..0b47cb8c 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -89,15 +89,15 @@ sub emit_raw {
 
 sub msg_hdr ($$) {
 	my ($ctx, $eml) = @_;
-	my $header_obj = $eml->header_obj;
 
 	# drop potentially confusing headers, ssoma already should've dropped
 	# Lines and Content-Length
 	foreach my $d (qw(Lines Bytes Content-Length Status)) {
-		$header_obj->header_set($d);
+		$eml->header_set($d);
 	}
-	my $crlf = $header_obj->crlf;
-	my $buf = $header_obj->as_string;
+	$ctx->{ibx}->append_headers($eml) if $ctx->{ibx}->{appendheader};
+	my $crlf = $eml->{crlf};
+	my $buf = ${$eml->{hdr}};
 	# fixup old bug from import (pre-a0c07cba0e5d8b6a)
 	$buf =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 	"From mboxrd\@z Thu Jan  1 00:00:00 1970" . $crlf . $buf . $crlf;
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 316b7775..91e6357f 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -462,6 +462,8 @@ sub set_nntp_headers ($$) {
 	# *something* here is required for leafnode, try to follow
 	# RFC 5536 3.1.5...
 	$hdr->header_set('Path', $server_name . '!not-for-mail');
+
+	$ibx->append_headers($hdr) if $ibx->{appendheader};
 }
 
 sub art_lookup ($$$) {
diff --git a/lib/PublicInbox/POP3.pm b/lib/PublicInbox/POP3.pm
index d32793e4..863cb201 100644
--- a/lib/PublicInbox/POP3.pm
+++ b/lib/PublicInbox/POP3.pm
@@ -229,7 +229,7 @@ sub retr_cb { # called by git->cat_async via ibx_async_cat
 		$self->close;
 		die "BUG: $hex != $oid";
 	}
-	PublicInbox::IMAP::to_crlf_full($bref);
+	PublicInbox::IMAP::op_crlf_bref($self, undef, undef, $bref);
 	if (defined $top_nr) {
 		my ($hdr, $bdy) = split(/\r\n\r\n/, $$bref, 2);
 		$bref = \$hdr;
diff --git a/t/netd.t b/t/netd.t
index abdde124..47a0182f 100644
--- a/t/netd.t
+++ b/t/netd.t
@@ -6,7 +6,8 @@ use Socket qw(IPPROTO_TCP SOL_SOCKET);
 use PublicInbox::TestCommon;
 # IO::Poll and Net::NNTP are part of the standard library, but
 # distros may split them off...
-require_mods(qw(-imapd IO::Socket::SSL Mail::IMAPClient IO::Poll Net::NNTP));
+require_mods(qw(-imapd IO::Socket::SSL Mail::IMAPClient IO::Poll Net::NNTP
+	Net::POP3));
 my $imap_client = 'Mail::IMAPClient';
 $imap_client->can('starttls') or
 	plan skip_all => 'Mail::IMAPClient does not support TLS';
@@ -21,6 +22,7 @@ unless (-r $key && -r $cert) {
 use_ok 'PublicInbox::TLS';
 use_ok 'IO::Socket::SSL';
 require_git('2.6');
+require_mods(qw(File::FcntlLock)) if $^O !~ /\A(?:linux|freebsd)\z/;
 
 my ($tmpdir, $for_destroy) = tmpdir();
 my $err = "$tmpdir/stderr.log";
@@ -35,7 +37,8 @@ for (1..3) {
 	pipe(my ($r, $w)) or xbail "pipe: $!";
 	push @pad_pipes, $r, $w;
 };
-my %srv = map { $_ => tcp_server() } qw(imap nntp imaps nntps);
+my %srv = map { $_ => tcp_server() } qw(imap nntp imaps nntps pop3 http);
+my ($hdr_key, $hdr_val) = qw(x-archive-source https://example.com/);
 my $ibx = create_inbox 'netd', version => 2,
 			-primary_address => $addr, indexlevel => 'basic', sub {
 	my ($im, $ibx) = @_;
@@ -43,11 +46,14 @@ my $ibx = create_inbox 'netd', version => 2,
 	$pi_config = "$ibx->{inboxdir}/pi_config";
 	open my $fh, '>', $pi_config or BAIL_OUT "open: $!";
 	print $fh <<EOF or BAIL_OUT "print: $!";
+[publicinbox]
+	pop3state = $tmpdir/p3state
 [publicinbox "netd"]
 	inboxdir = $ibx->{inboxdir}
 	address = $addr
 	indexlevel = basic
 	newsgroup = $group
+	appendHeader = $hdr_key:$hdr_val
 EOF
 	close $fh or BAIL_OUT "close: $!\n";
 };
@@ -70,16 +76,45 @@ my %o = (
 	SSL_verify_mode => SSL_VERIFY_PEER(),
 	SSL_ca_file => 'certs/test-ca.pem',
 );
+
+my $ok_inject = sub {
+	my ($blob, $msg) = @_;
+	my $eml = PublicInbox::Eml->new($blob);
+	is_deeply([$eml->header($hdr_key)], [ $hdr_val ], "$msg header added");
+};
+
+{
+	my ($host, $port) = tcp_host_port($srv{imap});
+	my %mic_opt = (Server => $host, Port => $port, Uid => 1);
+	$mic_opt{Authmechanism} = 'ANONYMOUS';
+	$mic_opt{Authcallback} = sub { '' };
+	my $mic = $imap_client->new(%mic_opt);
+	ok($mic && $mic->examine("$group.0"), 'IMAP connected');
+	my $ret = $mic->fetch_hash(1, 'RFC822');
+	$ok_inject->($ret->{1}->{RFC822}, 'IMAP RFC822 (full)');
+	$ret = $mic->fetch_hash(1, 'RFC822.HEADER');
+	$ok_inject->($ret->{1}->{'RFC822.HEADER'}, 'IMAP RFC822.HEADER');
+}
+{
+	my $nntp = Net::NNTP->new(my $host_port = tcp_host_port($srv{nntp}));
+	ok($nntp && $nntp->group($group), 'NNTP group');
+	$ok_inject->(join('', @{$nntp->article(1)}), 'NNTP ->article');
+	$ok_inject->(join('', @{$nntp->head(1)}), 'NNTP ->head');
+}
 {
-	my $c = tcp_connect($srv{imap});
-	my $msg = <$c>;
-	like($msg, qr/IMAP4rev1/, 'connected to IMAP');
+	my ($host, $port) = tcp_host_port($srv{pop3});
+	my $pop3 = Net::POP3->new($host, Port => $port);
+	my $locked_mb = ('e'x32)."\@$group";
+	ok($pop3 && $pop3->apop("$locked_mb.0", 'anonymous'), 'APOP connected');
+	$ok_inject->(join('', @{$pop3->get(1)}), 'POP3 ->get');
 }
 {
-	my $c = tcp_connect($srv{nntp});
-	my $msg = <$c>;
-	like($msg, qr/^201 .*? ready - post via email/, 'connected to NNTP');
+	my $c = tcp_connect($srv{http});
+	ok($c and print $c <<EOM, 'HTTP connected');
+GET /netd/20180720072141.GA15957\@example/raw HTTP/1.0\r\n\r
+EOM
+	my $s = do { local $/; <$c> };
+	$ok_inject->((split(/\r\n\r\n/, $s, 2))[1], 'HTTP $MSGID/raw');
 }
 
-# TODO: more tests
 done_testing;

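For readers who don't speak Perl, the core of the patch can be
paraphrased in a few lines of Python. This is a sketch of the same
idea, not the patch itself: it mirrors the split on the first colon
done in append_headers and the plain (unfolded) branch of
header_append, and it leaves out the Text::Wrap folding of long
values. The function names are chosen for this example.

```python
import re


def parse_append_header(setting: str) -> tuple:
    """Split 'KEY:VALUE' on the first colon only, trimming spaces around it."""
    # maxsplit=1 matters: values such as URLs contain further colons.
    key, value = re.split(r"\s*:\s*", setting, maxsplit=1)
    return key, value


def append_headers(header_block: str, settings: list, eol: str = "\n") -> str:
    """Append one 'Key: value' line per setting to an existing header block."""
    for setting in settings:
        key, value = parse_append_header(setting)
        header_block += f"{key}: {value}{eol}"
    return header_block


# Same key:value pair the test in t/netd.t configures
hdr = "From: sender@example.org\nSubject: test\n"
print(append_headers(hdr, ["x-archive-source:https://example.com/"]), end="")
```

In this sketch a setting without any colon raises ValueError; how the
Perl code should treat such a value is not discussed in the thread.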
* Re: [RFC] support publicinbox.$FOO.appendHeader in read-only endpoints
  2023-06-20  2:37     ` [RFC] support publicinbox.$FOO.appendHeader in read-only endpoints Eric Wong
@ 2023-07-05 14:27       ` Ahmad Fatoum
  0 siblings, 0 replies; 8+ messages in thread
From: Ahmad Fatoum @ 2023-07-05 14:27 UTC
  To: Eric Wong, Konstantin Ryabitsev; +Cc: meta, u.kleine-koenig

Hello Eric,
Hello Konstantin,

On 20.06.23 04:37, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
>> On Wed, Jun 14, 2023 at 11:50:15PM +0000, Eric Wong wrote:
>>> affects WWW and NNTP, but not IMAP/POP3.  I'm not sure if I want
>>> to reintroduce header injection in case there's some conflict
>>> with DKIM or other signature mechanisms[1]
>>
>> I don't think we need to worry about it if we pick a header that's almost
>> certain to not be included in the default DKIM signature set.
>> X-Originally-Archived-At: or some other header is guaranteed to never be
>> signed.
> 
> *shrug*  I'm not sure how useful this is, actually.

Thanks for the patch, Eric!

Konstantin, would this work for you? Is there something I could
help with?

Thanks,
Ahmad


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

