unofficial mirror of meta@public-inbox.org
* bug: httpd: incorrect Unicode output of $INBOX_DIR/description
@ 2020-05-28 15:12 Julien Moutinho
  2020-05-28 18:37 ` [PATCH] treat $INBOX_DIR/description and gitweb.owner as UTF-8 Eric Wong
From: Julien Moutinho @ 2020-05-28 15:12 UTC (permalink / raw)
  To: meta

Description
-----------
public-inbox-httpd does not output $INBOX_DIR/description
using the expected Unicode code points.

Reproducing
-----------
$ cat /var/lib/public-inbox/inboxes/equipage/description
Équipage

$ file $(readlink -e description)
/nix/store/a7m2gqmj417dlqzjq1arizm7gxxrdqqm-description: UTF-8 Unicode text, with no line terminators

Is rendered by public-inbox-httpd as:
$ curl -s http://example.org/lists/archives/ | grep quipage'$'
Équipage

My setup: public-inbox-1.5.0, or public-inbox-1.2.0, on NixOS.
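
A minimal sketch of how output like the above can arise, assuming the description is read from disk as raw octets and later encoded as UTF-8 without first being decoded into characters. The helper name `encode_undecoded` is hypothetical and is not part of public-inbox; it only illustrates the double-encoding effect.

```perl
use strict;
use warnings;

# Hypothetical helper (not from public-inbox): simulate an output layer
# that UTF-8-encodes a string whose octets were never decoded.
sub encode_undecoded {
	my ($octets) = @_;
	my $out = $octets;
	utf8::encode($out); # each octet is treated as one Latin-1 character
	return $out;
}

# UTF-8 octets of U+00C9 (E with acute) followed by "quipage"
my $raw = "\xc3\x89quipage";
my $twice = encode_undecoded($raw);

# Show the resulting octets in hex
print join(' ', map { sprintf '%02x', ord } split //, $twice), "\n";
# prints: c3 83 c2 89 71 75 69 70 61 67 65
```

The leading `c3 83` is the UTF-8 form of U+00C3 ("Ã"), and `c2 89` is an invisible control character, which would explain why a single accented letter turns into "Ã" plus something unprintable.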

Expecting
---------
$ curl -s http://example.org/lists/archives/ | grep quipage'$'
Êquipage

Or:
$ curl -s http://example.org/lists/archives/ | grep quipage'$'
Équipage

Debugging
---------
This may be due to using: ascii_html($ibx->description);

Thanks a lot for developing public-inbox,
Julien.

* [PATCH] treat $INBOX_DIR/description and gitweb.owner as UTF-8
  2020-05-28 15:12 bug: httpd: incorrect Unicode output of $INBOX_DIR/description Julien Moutinho
@ 2020-05-28 18:37 ` Eric Wong
  2020-05-28 19:07   ` Julien Moutinho
From: Eric Wong @ 2020-05-28 18:37 UTC (permalink / raw)
  To: Julien Moutinho; +Cc: meta

Julien Moutinho <julm+public-inbox@sourcephile.fr> wrote:
> public-inbox-httpd does not output $INBOX_DIR/description
> using the expected Unicode code points.

<snip> thanks for the bug report.  Below is a patch + tests
which should fix the bug.

> Debugging
> ---------
> This may be due to using: ascii_html($ibx->description);

Nope, ascii_html() is to ensure the HTML source is readable
to people on small/old systems with ASCII-only fonts.
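
To illustrate why an ASCII-only escaper is not the culprit by itself, here is a rough sketch. `ascii_only_html` is a hypothetical stand-in written for this example, not the real ascii_html() from public-inbox, whose internals may differ. It assumes every character outside ASCII is emitted as a decimal numeric character reference.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ASCII-only HTML escaper.
sub ascii_only_html {
	my ($s) = @_;
	$s =~ s/&/&amp;/g; # must run first so later entities are not re-escaped
	$s =~ s/</&lt;/g;
	$s =~ s/>/&gt;/g;
	$s =~ s/"/&quot;/g;
	# every remaining non-ASCII character becomes a numeric reference
	$s =~ s/([^\x00-\x7f])/sprintf('&#%u;', ord($1))/ge;
	return $s;
}

my $decoded   = "\x{c9}quipage";   # one character, U+00C9
my $undecoded = "\xc3\x89quipage"; # two raw octets, never decoded

print ascii_only_html($decoded), "\n";   # prints: &#201;quipage
print ascii_only_html($undecoded), "\n"; # prints: &#195;&#137;quipage
```

Given decoded characters, such an escaper emits a single correct reference; given undecoded octets, it faithfully escapes each octet, so the fix belongs where the description is read.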

> Thanks a lot for developing public-inbox,

You're welcome and thanks again for the bug report :>

----------8<--------
Subject: [PATCH] treat $INBOX_DIR/description and gitweb.owner as UTF-8

gitweb does the same with $GIT_DIR/description and gitweb.owner.

Allowing UTF-8 description should not cause problems when used
in responses to the NNTP "LIST NEWSGROUPS" request, either,
since RFC 3977 section 7.6.6 recommends the description be UTF-8
(but does not require it).

Link: https://public-inbox.org/meta/20200528151216.l7vmnmrs4ojw372g@sourcephile.fr/
---
 lib/PublicInbox/Inbox.pm      | 1 +
 lib/PublicInbox/WwwListing.pm | 2 ++
 t/inbox.t                     | 7 ++++---
 t/www_listing.t               | 5 +++--
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 002b980f405..c295b2677e4 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -223,6 +223,7 @@ sub description {
 		my $desc = try_cat("$self->{inboxdir}/description");
 		local $/ = "\n";
 		chomp $desc;
+		utf8::decode($desc);
 		$desc =~ s/\s+/ /smg;
 		$desc eq '' ? undef : $desc;
 	}) // '($INBOX_DIR/description missing)';
diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm
index 38a37ddaae9..a416d24f0e4 100644
--- a/lib/PublicInbox/WwwListing.pm
+++ b/lib/PublicInbox/WwwListing.pm
@@ -159,6 +159,8 @@ sub manifest_add ($$;$$) {
 	chomp(my $desc = try_cat("$git_dir/description"));
 	$owner = undef if $owner eq '';
 	$desc = 'Unnamed repository' if $desc eq '';
+	utf8::decode($desc);
+	utf8::decode($owner);
 
 	# templates/hooks--update.sample and git-multimail in git.git
 	# only match "Unnamed repository", not the full contents of
diff --git a/t/inbox.t b/t/inbox.t
index b59d5dba8c0..08f1724f092 100644
--- a/t/inbox.t
+++ b/t/inbox.t
@@ -22,13 +22,14 @@ is($x->description, '($INBOX_DIR/description missing)', 'default description');
 	print $fh "https://example.com/inbox\n" or die;
 	close $fh or die;
 	open $fh, '>', "$x->{inboxdir}/description" or die;
-	print $fh "blah\n" or die;
+	print $fh "\xc4\x80blah\n" or die;
 	close $fh or die;
 }
 is_deeply($x->cloneurl, ['https://example.com/inbox'], 'cloneurls update');
-is($x->description, 'blah', 'description updated');
+ok(utf8::valid($x->description), 'description is utf8::valid');
+is($x->description, "\x{100}blah", 'description updated');
 is(unlink(glob("$x->{inboxdir}/*")), 2, 'unlinked cloneurl & description');
 is_deeply($x->cloneurl, ['https://example.com/inbox'], 'cloneurls memoized');
-is($x->description, 'blah', 'description memoized');
+is($x->description, "\x{100}blah", 'description memoized');
 
 done_testing();
diff --git a/t/www_listing.t b/t/www_listing.t
index 31d76356d88..0aededd43eb 100644
--- a/t/www_listing.t
+++ b/t/www_listing.t
@@ -46,7 +46,7 @@ sub tiny_test {
 	unlike($tmp, qr/"modified":\s*"/, 'modified is an integer');
 	my $manifest = $json->decode($tmp);
 	ok(my $clone = $manifest->{'/alt'}, '/alt in manifest');
-	is($clone->{owner}, 'lorelei', 'owner set');
+	is($clone->{owner}, "lorelei \x{100}", 'owner set');
 	is($clone->{reference}, '/bare', 'reference detected');
 	is($clone->{description}, "we're all clones", 'description read');
 	ok(my $bare = $manifest->{'/bare'}, '/bare in manifest');
@@ -88,7 +88,8 @@ SKIP: {
 	open $fh, '>', "$alt/description" or die;
 	print $fh "we're all clones\n" or die;
 	close $fh or die;
-	is(xsys('git', "--git-dir=$alt", qw(config gitweb.owner lorelei)), 0,
+	is(xsys('git', "--git-dir=$alt", qw(config gitweb.owner),
+		"lorelei \xc4\x80"), 0,
 		'set gitweb user');
 	ok(unlink("$bare->{git_dir}/description"), 'removed bare/description');
 	open $fh, '>', $cfgfile or die;
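
The behaviour the updated tests rely on can be sketched in isolation. `decode_desc` is a hypothetical helper written for this example; it mirrors the decode-then-normalize order used in the patch but is not the actual PublicInbox::Inbox code. utf8::decode works in place and returns false on malformed input, leaving the string unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the pattern in the patch.
sub decode_desc {
	my ($desc) = @_;
	utf8::decode($desc); # octets -> characters; no-op if not valid UTF-8
	$desc =~ s/\s+/ /g;  # collapse whitespace runs
	return $desc;
}

# Same octets the new t/inbox.t test writes to the description file
my $desc = decode_desc("\xc4\x80blah");

print length($desc), "\n";                          # prints: 5
print $desc eq "\x{100}blah" ? "ok\n" : "not ok\n"; # prints: ok
```

The length drops from 6 octets to 5 characters because `\xc4\x80` is the two-octet UTF-8 encoding of the single code point U+0100.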

* Re: [PATCH] treat $INBOX_DIR/description and gitweb.owner as UTF-8
  2020-05-28 18:37 ` [PATCH] treat $INBOX_DIR/description and gitweb.owner as UTF-8 Eric Wong
@ 2020-05-28 19:07   ` Julien Moutinho
From: Julien Moutinho @ 2020-05-28 19:07 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Thu, 28 May 2020 18:37 +0000, Eric Wong wrote:
> Julien Moutinho <julm+public-inbox@sourcephile.fr> wrote:
> > public-inbox-httpd does not output $INBOX_DIR/description
> > using the expected Unicode code points.
> 
> <snip> thanks for the bug report.  Below is a patch + tests
> which should fix the bug.
Wow, that was fast! And it works well, thanks!

> > Debugging
> > ---------
> > This may be due to using: ascii_html($ibx->description);
> 
> Nope, ascii_html() is to ensure the HTML source is readable
> to people on small/old systems with ASCII-only fonts.
Oh, right. Impressive level of compatibility ^_^

Cheers,
Julien.
