From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 5/5] viewvcs: support streaming large blobs
Date: Thu, 31 Jan 2019 04:27:24 +0000
Message-ID: <20190131042724.2675-6-e@80x24.org>
In-Reply-To: <20190131042724.2675-1-e@80x24.org>
Forking off git-cat-file to stream large blobs is reasonably
efficient, and at least no worse than using git-http-backend
to serve clones, so let our limiter framework deal with it.

git itself isn't great for large files, and AFAIK there are no
stable, widely-available mechanisms in git for reading smaller
chunks of giant blobs.

Tested with some giant GPU headers in the Linux kernel.
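As a rough illustration of the streaming idea described above, outside of public-inbox's Qspawn/limiter framework, the sketch below forks `git cat-file blob` and copies its output in fixed-size chunks, so memory use is bounded by the chunk size rather than the blob size. The `stream_blob` helper and its 64 KiB default chunk size are hypothetical; the patch itself hands the `git cat-file` command to PublicInbox::Qspawn and its psgi_return method instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): copy blob $oid from the
# repository at $git_dir to filehandle $out, $chunk bytes at a time, so that
# memory use is bounded by $chunk rather than by the blob size.
sub stream_blob {
	my ($git_dir, $oid, $out, $chunk) = @_;
	$chunk //= 65536; # assumed chunk size; any positive value works
	# list-form pipe open runs git directly, without a shell
	open(my $rd, '-|', 'git', "--git-dir=$git_dir", 'cat-file', 'blob', $oid)
		or die "cannot fork git cat-file: $!";
	binmode $rd;
	binmode $out;
	my $total = 0;
	while (1) {
		my $n = sysread($rd, my $buf, $chunk);
		die "read from git cat-file: $!" unless defined $n;
		last if $n == 0; # EOF: git has written the whole blob
		print $out $buf or die "write: $!";
		$total += $n;
	}
	# close() on a pipe waits for git and fails if it exited non-zero
	close($rd) or die "git cat-file exited with status $?";
	return $total;
}
```

git still reads the whole object on its side; only the consumer avoids buffering it, which matches the point above that git offers no way to read partial chunks of a blob.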
---
lib/PublicInbox/ViewVCS.pm | 37 +++++++++++++++++++++++++++++++++----
1 file changed, 33 insertions(+), 4 deletions(-)
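The psgi_return callback in stream_large_blob() in the patch below keeps buffering git's output until it can pick a Content-Type: a NUL byte means application/octet-stream, while reaching $BIN_DETECT (8000) bytes, or the whole blob, without a NUL means text/plain. A minimal standalone sketch of that decision, assuming a hypothetical `guess_content_type` function that takes the buffered bytes and the blob size (it uses plain length() on a byte string where the patch uses bytes::length):

```perl
use strict;
use warnings;

my $BIN_DETECT = 8000; # same threshold as the patch (which mirrors git)

# Hypothetical standalone version of the check made by the psgi_return
# callback in stream_large_blob(): given the bytes buffered so far and the
# blob's total size, return a Content-Type, or undef to keep buffering.
sub guess_content_type {
	my ($buf, $size) = @_;
	# a NUL byte anywhere in the buffer marks the blob as binary
	return 'application/octet-stream' if index($buf, "\0") >= 0;
	my $n = length($buf); # $buf is assumed to hold raw bytes
	# no NUL in the first $BIN_DETECT bytes, or in the whole blob: text
	return 'text/plain; charset=UTF-8' if $n >= $BIN_DETECT || $n == $size;
	return; # undecided: wait for more output from git cat-file
}
```

Like git's own check, this is only a heuristic: a NUL that first appears after the buffer has already reached $BIN_DETECT bytes goes unnoticed, so such a blob is served as text/plain.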
diff --git a/lib/PublicInbox/ViewVCS.pm b/lib/PublicInbox/ViewVCS.pm
index 85edf22..63731e9 100644
--- a/lib/PublicInbox/ViewVCS.pm
+++ b/lib/PublicInbox/ViewVCS.pm
@@ -34,6 +34,7 @@ END { $hl = undef };
my %QP_MAP = ( A => 'oid_a', B => 'oid_b', a => 'path_a', b => 'path_b' );
my $max_size = 1024 * 1024; # TODO: configurable
my $enc_utf8 = find_encoding('UTF-8');
+my $BIN_DETECT = 8000; # same as git
sub html_page ($$$) {
my ($ctx, $code, $strref) = @_;
@@ -43,7 +44,33 @@ sub html_page ($$$) {
my ($nr, undef) = @_;
$nr == 1 ? $$strref : undef;
});
- $wcb->($res);
+ $wcb ? $wcb->($res) : $res;
+}
+
+sub stream_large_blob ($$$$) {
+ my ($ctx, $res, $logref, $fn) = @_;
+ my ($git, $oid, $type, $size, $di) = @$res;
+ my $cmd = ['git', "--git-dir=$git->{git_dir}", 'cat-file', $type, $oid];
+ my $qsp = PublicInbox::Qspawn->new($cmd);
+ my @cl = ('Content-Length', $size);
+ my $env = $ctx->{env};
+ $env->{'qspawn.response'} = delete $ctx->{-wcb};
+ $qsp->psgi_return($env, undef, sub {
+ my ($r, $bref) = @_;
+ if (!defined $r) { # error
+ html_page($ctx, 500, $logref);
+ } elsif (index($$bref, "\0") >= 0) {
+ my $ct = 'application/octet-stream';
+ [200, ['Content-Type', $ct, @cl ] ];
+ } else {
+ my $n = bytes::length($$bref);
+ if ($n >= $BIN_DETECT || $n == $size) {
+ my $ct = 'text/plain; charset=UTF-8';
+ return [200, ['Content-Type', $ct, @cl] ];
+ }
+ undef; # bref keeps growing
+ }
+ });
}
sub solve_result {
@@ -65,9 +92,13 @@ sub solve_result {
$ref eq 'ARRAY' or return html_page($ctx, 500, \$log);
my ($git, $oid, $type, $size, $di) = @$res;
+ my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
+ my $raw_link = "(<a\nhref=$path>raw</a>)";
if ($size > $max_size) {
+ return stream_large_blob($ctx, $res, \$log, $fn) if defined $fn;
# TODO: stream the raw file if it's gigantic, at least
- $log = '<pre><b>Too big to show</b></pre>' . $log;
+ $log = "<pre><b>Too big to show, download available</b>\n" .
+ "$oid $type $size bytes $raw_link</pre>" . $log;
return html_page($ctx, 500, \$log);
}
@@ -86,8 +117,6 @@ sub solve_result {
return delete($ctx->{-wcb})->([200, $h, [ $$blob ]]);
}
- my $path = to_filename($di->{path_b} || $hints->{path_b} || 'blob');
- my $raw_link = "(<a\nhref=$path>raw</a>)";
if ($binary) {
$log = "<pre>$oid $type $size bytes (binary)" .
" $raw_link</pre>" . $log;
--
EW