unofficial mirror of meta@public-inbox.org
* Using plus (+) in list name
@ 2022-08-21 20:02 Mark Wielaard
  2022-08-21 20:53 ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Wielaard @ 2022-08-21 20:02 UTC (permalink / raw)
  To: meta; +Cc: overseers

Hi,

We are setting up a public-inbox instance for cygwin/gcc/sourceware
lists at https://inbox.sourceware.org/ and it seems to work pretty
nicely. Thanks. Except for lists which have a + in their name like
libstdc++.

I assume this needs some escaping somewhere, but I cannot figure out
where. The .public-inbox/config snippet looks like:

[publicinbox "libstdc++"]
        address = libstdc++@gcc.gnu.org
        url = https://inbox.sourceware.org/libstdc++
        inboxdir = /home/inbox/lists/libstdc++
        indexlevel = full
        newsgroup = inbox.gcc.libstdc++
        listid = libstdc++.gcc.gnu.org

This seems to work fine for nntp and imap, but not https.

It does work when replacing the ++ with pp in the list name and
url. But that looks somewhat odd imho. And the name with ++ can be
used with e.g. mailman:
https://gcc.gnu.org/mailman/listinfo/libstdc++

Is there some way to configure public-inbox-http to be able to use ++
in list names and urls?

We are using the EPEL public-inbox package public-inbox-1.7.0-2.el8.noarch

Thanks,

Mark

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using plus (+) in list name
  2022-08-21 20:02 Using plus (+) in list name Mark Wielaard
@ 2022-08-21 20:53 ` Eric Wong
  2022-08-21 21:43   ` Mark Wielaard
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2022-08-21 20:53 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: meta, overseers

Mark Wielaard <mark@klomp.org> wrote:
> Hi,
> 
> We are setting up a public-inbox instance for cygwin/gcc/sourceware
> lists at https://inbox.sourceware.org/ and it seems to work pretty
> nicely. Thanks. Except for lists which have a + in their name like
> libstdc++.
> 
> I assume this needs some escaping somewhere, but I cannot figure out
> where. The .public-inbox/config snippet looks like:

I seem to remember '+' is OK as-is in the path component of HTTP URLs,
but stands for ' ' (SP) in query strings.

At least it's OK for a git-config section name:

> [publicinbox "libstdc++"]
>         address = libstdc++@gcc.gnu.org
>         url = https://inbox.sourceware.org/libstdc++
>         inboxdir = /home/inbox/lists/libstdc++
>         indexlevel = full
>         newsgroup = inbox.gcc.libstdc++
>         listid = libstdc++.gcc.gnu.org
> 
> This seems to work fine for nntp and imap, but not https.

Interesting that NNTP and IMAP work (I wasn't expecting it :x).

I can't remember off the top of my head, but is '+' allowed by
the relevant NNTP and List-Id RFCs?

Anyways, good to see public-inbox getting more adoption :>

> It does work when replacing the ++ with pp in the list name and
> url. But that looks somewhat odd imho. And the name with ++ can be
> used with e.g. mailman:
> https://gcc.gnu.org/mailman/listinfo/libstdc++
> 
> Is there some way to configure public-inbox-http to be able to use ++
> in list names and urls?
> 
> We are using the EPEL public-inbox package public-inbox-1.7.0-2.el8.noarch

Totally untested, but perhaps changing $INBOX_RE in
PublicInbox/WWW.pm will work:

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index b9b68382..77f463d3 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -23,7 +23,7 @@ use PublicInbox::WwwStatic qw(r path_info_raw);
 use PublicInbox::Eml;
 
 # TODO: consider a routing tree now that we have more endpoints:
-our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
+our $INBOX_RE = qr!\A/([\w\-][\w\.\-\+]*)!;
 our $MID_RE = qr!([^/]+)!;
 our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
 our $ATTACH_RE = qr!([0-9][0-9\.]*)-($PublicInbox::Hval::FN)!;
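A quick way to see why the unpatched pattern breaks: the sketch below applies both versions of $INBOX_RE, copied from the diff above, to a few PATH_INFO strings. It is not part of public-inbox. `inbox_name` is a hypothetical helper and the sample paths are made up.

```perl
use strict;
use warnings;

# Both patterns are copied verbatim from the diff above.
my $old_re = qr!\A/([\w\-][\w\.\-]*)!;
my $new_re = qr!\A/([\w\-][\w\.\-\+]*)!;

# Hypothetical helper: return the inbox name captured by $re, or undef.
sub inbox_name {
    my ($re, $path) = @_;
    my ($name) = $path =~ $re;    # list context yields the capture
    return $name;
}

for my $path ('/libstdc++/', '/gcc-patches/') {
    printf "%-15s old=%-12s new=%s\n", $path,
        inbox_name($old_re, $path) // '(none)',
        inbox_name($new_re, $path) // '(none)';
}
```

The old pattern does not fail outright on "/libstdc++/": it captures only "libstdc" and stops at the first '+', so anything routed on that capture sees the wrong name. Names without '+' are captured identically by both patterns.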


* Re: Using plus (+) in list name
  2022-08-21 20:53 ` Eric Wong
@ 2022-08-21 21:43   ` Mark Wielaard
  2022-08-21 22:11     ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Wielaard @ 2022-08-21 21:43 UTC (permalink / raw)
  To: Overseers mailing list; +Cc: Eric Wong, meta

Hi Eric,

On Sun, Aug 21, 2022 at 08:53:38PM +0000, Eric Wong via Overseers wrote:
> Mark Wielaard <mark@klomp.org> wrote:
> > We are setting up a public-inbox instance for cygwin/gcc/sourceware
> > lists at https://inbox.sourceware.org/ and it seems to work pretty
> > nicely. Thanks. Except for lists which have a + in their name like
> > libstdc++.
> > 
> > I assume this needs some escaping somewhere, but I cannot figure out
> > where. The .public-inbox/config snippet looks like:
> 
> I seem to remember '+' is OK as-is in the path component of HTTP URLs,
> but stands for ' ' (SP) in query strings.

Yes, '+' has no special meaning in the path component, but it does
encode a space in the query string. So it doesn't have to be escaped
in the path component and can be used as-is (although percent-encoding
is recommended, nobody seems to do it).
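To illustrate the distinction, here is a minimal sketch of two hypothetical decoders (not public-inbox or Plack functions): one for a path segment, where '+' is an ordinary character, and one for a query value, assuming the common application/x-www-form-urlencoded convention where '+' means a space. Note the order in the query decoder: '+' becomes a space before percent-decoding, so an encoded %2B still comes out as a literal '+'.

```perl
use strict;
use warnings;

# Decode %XX escapes into the corresponding bytes.
sub pct_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Path segment: '+' has no special meaning, only %XX is decoded.
sub decode_path_segment {
    my ($s) = @_;
    return pct_decode($s);
}

# Form-encoded query value: '+' means space, then %XX is decoded.
sub decode_query_value {
    my ($s) = @_;
    $s =~ tr/+/ /;
    return pct_decode($s);
}

print '[', decode_path_segment('libstdc++'), "]\n";      # [libstdc++]
print '[', decode_query_value('libstdc++'), "]\n";       # [libstdc  ]
print '[', decode_query_value('libstdc%2B%2B'), "]\n";   # [libstdc++]
```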

> > This seems to work fine for nntp and imap, but not https.
> 
> Interesting that NNTP and IMAP work (I wasn't expecting it :x).
> 
> I can't remember off the top of my head, but is '+' allowed by
> the relevant NNTP and List-Id RFCs?

I don't know. I just observed that I can see the group name
inbox.gcc.libstdc++ in my nntp and imap readers when pointing at
inbox.sourceware.org.

> > We are using the EPEL public-inbox package public-inbox-1.7.0-2.el8.noarch
> 
> Totally untested, but perhaps changing $INBOX_RE in
> PublicInbox/WWW.pm will work:
> 
> diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
> index b9b68382..77f463d3 100644
> --- a/lib/PublicInbox/WWW.pm
> +++ b/lib/PublicInbox/WWW.pm
> @@ -23,7 +23,7 @@ use PublicInbox::WwwStatic qw(r path_info_raw);
>  use PublicInbox::Eml;
>  
>  # TODO: consider a routing tree now that we have more endpoints:
> -our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
> +our $INBOX_RE = qr!\A/([\w\-][\w\.\-\+]*)!;
>  our $MID_RE = qr!([^/]+)!;
>  our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
>  our $ATTACH_RE = qr!([0-9][0-9\.]*)-($PublicInbox::Hval::FN)!;

That works! https://inbox.sourceware.org/libstdc++ looks fully
functional now.

Now to figure out how to properly include that patch before the other
sourceware overseers figure out I patched the packaged code in place.

Thanks,

Mark


* Re: Using plus (+) in list name
  2022-08-21 21:43   ` Mark Wielaard
@ 2022-08-21 22:11     ` Eric Wong
  2022-08-21 22:21       ` [PATCH] www: support `+' in inbox names Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2022-08-21 22:11 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: overseers, meta

Mark Wielaard <mark@klomp.org> wrote:
> On Sun, Aug 21, 2022 at 08:53:38PM +0000, Eric Wong via Overseers wrote:
> > Mark Wielaard <mark@klomp.org> wrote:
> > Interesting that NNTP and IMAP work (I wasn't expecting it :x).
> > 
> > I can't remember off the top of my head, but is '+' allowed by
> > the relevant NNTP and List-Id RFCs?
> 
> I don't know. I just observed that I can see the group name
> inbox.gcc.libstdc++ in my nntp and imap readers when pointing at
> inbox.sourceware.org.

I suppose it's OK as long as real-world clients are happy.

> > +++ b/lib/PublicInbox/WWW.pm
> > @@ -23,7 +23,7 @@ use PublicInbox::WwwStatic qw(r path_info_raw);
> >  use PublicInbox::Eml;
> >  
> >  # TODO: consider a routing tree now that we have more endpoints:
> > -our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
> > +our $INBOX_RE = qr!\A/([\w\-][\w\.\-\+]*)!;
> >  our $MID_RE = qr!([^/]+)!;
> >  our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
> >  our $ATTACH_RE = qr!([0-9][0-9\.]*)-($PublicInbox::Hval::FN)!;
> 
> That works! https://inbox.sourceware.org/libstdc++ looks fully
> functional now.

Good to know.  I'll make a patch ASAP and maybe some tests
down the line...

> Now to figure out how to properly include that patch before the other
> sourceware overseers figure out I patched the packaged code in place.

If you're using a .psgi config file to customize the middleware
layer, you should be able to access `our' vars through it:

diff --git a/examples/public-inbox.psgi b/examples/public-inbox.psgi
index e017b2fb..36cd8b57 100644
--- a/examples/public-inbox.psgi
+++ b/examples/public-inbox.psgi
@@ -14,6 +14,7 @@ use strict;
 use warnings;
 use PublicInbox::WWW;
 use Plack::Builder;
+$PublicInbox::WWW::INBOX_RE = qr!\A/([\w\-][\w\.\-\+]*)!;
 my $www = PublicInbox::WWW->new;
 $www->preload;
 
There are a bunch of `our' vars which happen to be accessible across
namespaces, some with the intent of being tweaked by end users,
some for internal use only.

I can't remember what reasoning I had for making $INBOX_RE
globally accessible, but I'm fine with it being used in this way.
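The same mechanism in miniature: a self-contained sketch using a made-up package `My::Router` as a stand-in for PublicInbox::WWW, showing that assigning to a package (`our') variable by its fully qualified name from another package changes what that package's subs see. This assumes the module reads the variable at call time; a pattern that had already been interpolated into another regex when the module was loaded would not pick up the override.

```perl
use strict;
use warnings;

# Stand-in for a module such as PublicInbox::WWW (hypothetical package).
package My::Router;
our $NAME_RE = qr!\A/([\w\-][\w\.\-]*)!;

sub new { my ($class) = @_; return bless {}, $class }

# Reads $NAME_RE each time it is called, so later overrides take effect.
sub inbox_name {
    my ($self, $path) = @_;
    my ($name) = $path =~ $NAME_RE;
    return $name;
}

package main;

# Equivalent of the .psgi tweak: override the package variable by full name.
$My::Router::NAME_RE = qr!\A/([\w\-][\w\.\-\+]*)!;

my $router = My::Router->new;
my $name = $router->inbox_name('/libstdc++/');
print "$name\n";    # libstdc++
```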


* [PATCH] www: support `+' in inbox names
  2022-08-21 22:11     ` Eric Wong
@ 2022-08-21 22:21       ` Eric Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2022-08-21 22:21 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: overseers, meta

`+' already seems to work for IMAP mailboxes and NNTP newsgroup
names, and git-config doesn't complain, either.  So allow it in
the path components of WWW URLs so projects like `libstdc++' can
use it.

Reported-by: Mark Wielaard <mark@klomp.org>
Tested-by: Mark Wielaard <mark@klomp.org>
Link: https://public-inbox.org/meta/YwKnFCvganW7ErXU@wildebeest.org/
---
 There should be better test coverage for this at some point,
 but I suppose this is good enough for now as I'm hacking on
 a more interesting part of this project :>

 lib/PublicInbox/WWW.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 755d7558..a33709e9 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -23,7 +23,7 @@ use PublicInbox::WwwStatic qw(r path_info_raw);
 use PublicInbox::Eml;
 
 # TODO: consider a routing tree now that we have more endpoints:
-our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
+our $INBOX_RE = qr!\A/([\w\-][\w\.\-\+]*)!;
 our $MID_RE = qr!([^/]+)!;
 our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
 our $ATTACH_RE = qr!([0-9][0-9\.]*)-($PublicInbox::Hval::FN)!;


end of thread

Thread overview: 5+ messages
2022-08-21 20:02 Using plus (+) in list name Mark Wielaard
2022-08-21 20:53 ` Eric Wong
2022-08-21 21:43   ` Mark Wielaard
2022-08-21 22:11     ` Eric Wong
2022-08-21 22:21       ` [PATCH] www: support `+' in inbox names Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).