From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 97BB41F461; Sat, 18 May 2019 07:15:18 +0000 (UTC) Date: Sat, 18 May 2019 07:15:18 +0000 From: Eric Wong To: "Eric W. Biederman" Cc: meta@public-inbox.org Subject: Re: [RFC][PATCH] Config.pm: Add support for mailing list information Message-ID: <20190518071518.fqcwveehh726wxdu@dcvr> References: <877eap26rx.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <877eap26rx.fsf@xmission.com> List-Id: "Eric W. Biederman" wrote: > > The world has turned since I first started following mailing lists and > to my surprise every mailling list that I am subscribed to properly > sets the "List-ID:" mailing list header. So instead of doing > something clever and flexible I am adding support for looking up > public inbox mailing lists by their mailing list name. Agreed, and good to know List-Id is getting standardized on. > That makes the work needed for each email trivial and easy to understand. > - Parse the "List-ID:" header. > - Lookup in the configuration which mailbox is connected to that > "List-ID:" > - Deliver the mail to that mailbox. This is a great idea. One great side-effect would be avoiding the the need to expose the mirror delivery address for server admins who use -mda for mirroring lists: https://public-inbox.org/meta/20180612170546.GA5945@chatter/ > To that end this change enhances PublicInbox to have an additional > mailbox configuration parameter "list" that holds the mailling list > name. I think using "listId" in configs would be preferable, as I think it's clear that we're matching the RFC2919 header and not any other thing like "X-Mailing-List". We'll use camelCase for user docs, and only alllowercase for parsing (consistent with git config naming). I actually hate that naming convention of git config keys, so "list_id" for internal variables. Spelling: s/mailling/mailing/g (you seem to spell it both ways throughout the email) Also, as a Perl user, the word "list" alone is also a bit ambiguous because it tends to get used interchangeably with "array"; and it could be confused in my brain as a throwaway variable. > A method is added to the PublicInbox config object called > lookup_list that given a mailing list name will return > the PublicInbox in the configarion that is configured to > handle that mailing list. > > Signed-off-by: "Eric W. Biederman" > --- > > Some context. I have my mailing list email coming in via imap, and I > have a script that looks at List-ID and delivers them to the appropriate > public-inbox. I am hoping to get my script at least into the PublicInbox > scripts directory. > > I have looked and see that you a watchheader directive that effectively > does the same thing that I am doing. Even so, I think a mailing list > having one or more mailling list names is more fundamental. A mailing > name is a well supported concept and it differs from an address > in that a mailing list name does not have an "@". Yes, for -watch, watchheader is a more flexible variant of this. listId could be used as a shortcut. > So as part of getting my small changes to your PublicInbox code > to support my email fetching script. Do you mind the addition > of a mailing list name configuration parameter? > If you don't mind can we apply the patch below? Don't mind at all, but I would like some comments in this email addressed, along with some tests for -mda, at least. > Do I need to tweak > public-inbox-watch to be able to use the list parameter? I think translating the listId directive into a watchheader regexp will be sufficient. > I can't > really test it as my setup is quite different but I am willing to > work things through so that the code is harmonized and people > can benefit from each others improvements. Looks like watchheader is also missing tests :x > lib/PublicInbox/Config.pm | 33 ++++++++++++++++++++++++++++++++- > 1 file changed, 32 insertions(+), 1 deletion(-) > > diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm > index 09f9179b085a..70bd88f51556 100644 > --- a/lib/PublicInbox/Config.pm > +++ b/lib/PublicInbox/Config.pm > @@ -25,6 +25,7 @@ sub new { > > # caches > $self->{-by_addr} ||= {}; > + $self->{-by_list} ||= {}; -by_list_id > $self->{-by_name} ||= {}; > $self->{-by_newsgroup} ||= {}; > $self->{-no_obfuscate} ||= {}; > @@ -84,6 +85,32 @@ sub lookup { > _fill($self, $pfx); > } > > +sub lookup_list { lookup_list_id > + my ($self, $list) = @_; > + my $inbox = $self->{-by_list}->{$list}; > + return $inbox if $inbox; "my $ibx" I've been starting to standardize on $ibx as the identifier for PublicInbox::Inbox references (and maybe leave $inbox for human-readable names). > + foreach my $k (keys %$self) { > + $k =~ /\A(publicinbox\.[\w-]+)\.list\z/ or next; \.listid\z/ > + my $v = $self->{$k}; > + if (ref($v) eq "ARRAY") { > + foreach my $alias (@$v) { > + ($alias eq $list) or next; > + $pfx = $1; > + last; > + } > + } else { > + ($v eq $list) or next; > + $pfx = $1; > + last; > + } It looks these "eq" comparisons should be case-insensitive: https://tools.ietf.org/html/rfc2919#section-6 > + } > + defined $pfx or return; > + _fill($self, $pfx); > +} > + > sub lookup_name ($$) { > my ($self, $name) = @_; > $self->{-by_name}->{$name} || _fill($self, "publicinbox.$name"); > @@ -389,7 +416,7 @@ sub _fill { > } > # TODO: more arrays, we should support multi-value for > # more things to encourage decentralization > - foreach my $k (qw(address altid nntpmirror coderepo hide)) { > + foreach my $k (qw(address altid nntpmirror coderepo hide list)) { > if (defined(my $v = $self->{"$pfx.$k"})) { > $ibx->{$k} = _array($v); > } > @@ -412,6 +439,10 @@ sub _fill { > $self->{-by_addr}->{$lc_addr} = $ibx; > $self->{-no_obfuscate}->{$lc_addr} = 1; > } > + foreach (@{$ibx->{list}}) { > + my $list = $_; > + $self->{-by_list}->{$list} = $ibx; > + } No sense in "my $list = $_"; either "foreach my $list_id" for larger loops, or use $_ for tiny loops. Anyways, great idea, needs some minor tweaks. Thanks!