From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.1 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 2233B208E9; Mon, 30 Jul 2018 05:46:47 +0000 (UTC) Date: Mon, 30 Jul 2018 05:46:46 +0000 From: Eric Wong To: "Eric W. Biederman" Cc: meta@public-inbox.org Subject: Re: [RFC][PATCH] ProcessPipe.pm: Use read not sysread Message-ID: <20180730054646.gaylszho35yclqjo@dcvr> References: <87y3dtl3f6.fsf@xmission.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <87y3dtl3f6.fsf@xmission.com> List-Id: "Eric W. Biederman" wrote: > > While playing with git fast export I discovered that mixing <> and > read would give inconsistent results. I tracked the issue down to > using sysread in ProcessPipe instead of plain read. > > If it is desirable to use readline I can't see how using sysread > can work as readline to be efficient needs to use buffered I/O. Agreed. > Signed-off-by: "Eric W. Biederman" > --- > > Am I missing something or was this a fundamental bug from the beginning? I recall it being intentional; but it's subtle and I should've at least documented it :x Right now, none of the ProcessPipe users mix readline and read calls; and I've found sysread slightly faster with lots of data. The cases where I parse fast-export output uses the wantarray return from popen which bypasses ProcessPipe. But I'm not opposed to your patch, though. The bandwidth-hungry GitHTTPBackend/Qspawn case already uses the untied wantarray values and calls sysread directly. And off the top of my head, that's the only expensive data transfer in the PSGI interface. But yes, very subtle and I think I'll take your patch to simplify things since I don't think there's noticeable performance regressions. Probably something like the following is not worth it; but right now, uncommenting the "die" won't trigger failures in the current code base: diff --git a/lib/PublicInbox/ProcessPipe.pm b/lib/PublicInbox/ProcessPipe.pm index 7bb6dde..37e4859 100644 --- a/lib/PublicInbox/ProcessPipe.pm +++ b/lib/PublicInbox/ProcessPipe.pm @@ -11,9 +11,20 @@ sub TIEHANDLE { bless { pid => $pid, fh => $fh }, $class; } -sub READ { sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0) } +sub READ { + if ($_[0]->{did_readline}) { + # die "Using buffered read\n"; + read($_[0]->{fh}, $_[1], $_[2], $_[3] || 0); + } + else { + sysread($_[0]->{fh}, $_[1], $_[2], $_[3] || 0); + } +} -sub READLINE { readline($_[0]->{fh}) } +sub READLINE { + $_[0]->{did_readline} = 1; + readline($_[0]->{fh}); +} sub CLOSE { my $fh = delete($_[0]->{fh});