From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Subject: bug#35350: Some compile output still leaks through with --verbosity=1 Date: Tue, 23 Apr 2019 12:12:34 +0200 Message-ID: <87imv5jai5.fsf@gnu.org> References: <87mukkfd2j.fsf@netris.org> <87r29v2jz2.fsf@gnu.org> <87ftq9silk.fsf@netris.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Return-path: Received: from eggs.gnu.org ([209.51.188.92]:43396) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hIsQF-0001fl-FZ for bug-guix@gnu.org; Tue, 23 Apr 2019 06:13:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hIsQB-0000W4-Kt for bug-guix@gnu.org; Tue, 23 Apr 2019 06:13:05 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:38926) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1hIsQB-0000VL-EK for bug-guix@gnu.org; Tue, 23 Apr 2019 06:13:03 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1hIsQA-0000LY-4M for bug-guix@gnu.org; Tue, 23 Apr 2019 06:13:03 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <87ftq9silk.fsf@netris.org> (Mark H. Weaver's message of "Mon, 22 Apr 2019 19:52:28 -0400") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Mark H Weaver Cc: 35350@debbugs.gnu.org --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi Mark, Mark H Weaver skribis: > Ludovic Court=C3=A8s writes: > >> Mark H Weaver skribis: >> >>> Sometimes when compiling a package with --verbosity=3D1, some parts of = the >>> compile output leak through. For example, see the transcript below. >> >> Weird. > > FWIW, a few observations, possibly relevant: > > (1) Each chunk of leaked output begins with 1 or 2 Unicode Replacement > characters (U+FFFD). In the transcript I provided, the first leak > began with 1 replacement char, and all later leaks began with 2. > > (2) The replacement characters are immediately followed by > "@ build-log 30033 4096\n", and that string is also sprinkled > throughout the leaked output, with approximately ~4060-4070 > characters of leaked output between each occurrence of > "@ build-log 30033 4096\n". Indeed. I managed to reproduce it while building modem-manager. I strace=E2=80=99d =E2=80=98guix build=E2=80=99 with an additional =E2=80=98p= k=E2=80=99=C2=B9 to see what happens, and here=E2=80=99s what leads to the wrong =E2=80=9Cwrite(2, "=EF=BF=BD@ build-= log=E2=80=A6")=E2=80=9D call: --8<---------------cut here---------------start------------->8--- read(13, "gmlo\0\0\0\0", 8) =3D 8 read(13, "\27\20\0\0\0\0\0\0", 8) =3D 8 read(13, "@ build-log 22090 4096\n [=E2=80=A6] warning: \342\200", 4119) = =3D 4119 read(13, "\0", 1) =3D 1 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> #f 0)\n", 25) =3D 25 write(1, "\n", 1) =3D 1 write(1, ";;; (write 985 <> 22090 4096)\n", 30) =3D 30 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> 22090 3111)\n", 31) =3D 31 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> 22090 2103)\n", 31) =3D 31 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> 22090 1095)\n", 31) =3D 31 write(1, "\n", 1) =3D 1 write(1, ";;; (write 88 <> 22090 87)\n", 27) =3D 27 write(2, "\r\33[K\\ 'build' phase", 19) =3D 19 [=E2=80=A6] write(2, "\r\33[K\\ 'build' phase", 19) =3D 19 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1 <> #f 0)\n", 22) =3D 22 read(13, "gmlo\0\0\0\0", 8) =3D 8 read(13, "\27\20\0\0\0\0\0\0", 8) =3D 8 read(13, "@ build-log 22090 4096\n\230g_simple_async_result_take_error\342\= 200\231 is deprecated[=E2=80=A6]", 4119) =3D 4119 read(13, "\0", 1) =3D 1 write(1, "\n", 1) =3D 1 write(1, ";;; (write 1008 <> #f 0)\n", 25) =3D 25 write(2, "\357\277\275@ build-log 22090 4096\n", 26) =3D 26 --8<---------------cut here---------------end--------------->8--- The third read(2) call here ends on a partial UTF-8 sequence for LEFT SINGLE QUOTATION MARK (we get the first two bytes of a three byte sequence.) What happens is that =E2=80=98process-stderr=E2=80=99 in (guix store) gets = that byte string from the daemon, passes it through =E2=80=98read-maybe-utf8-string= =E2=80=99, which replaces the last two bytes with REPLACEMENT CHARACTER, which is itself a 3-byte sequence. Thus, we have this extra byte that=E2=80=99s being inserted. That confuses= the whole machinery since the build log was announced as being 4096-byte long, and it=E2=80=99s now 4097-byte long. Internally, =E2=80=98build-event-output-port=E2=80=99 keeps the last byte o= f the REPLACEMENT CHARACTER sequence in the =E2=80=98%fragments=E2=80=99 buffer. Consequently, the =E2=80=9C@ build-log=E2=80=9D string that comes next does= n=E2=80=99t start on a newline, and thus it is considered build output. Since the first byte does not constitute a valid UTF-8 sequence, another REPLACEMENT CHARACTER is inserted there when it gets printed. So =E2=80=98build-event-output-port=E2=80=99 is working as expected. The p= roblem is the first layer of UTF-8 decoding that happens in =E2=80=98process-stderr=E2=80= =99, in the =E2=80=98%stderr-next=E2=80=99 case. We would need to disable it, but only= if the build output port is =E2=80=98build-event-output-port=E2=80=99 (i.e., it=E2=80=99= s capable of interpreting =E2=80=9Cmultiplexed build output=E2=80=9D correctly.) Thanks, Ludo=E2=80=99. =C2=B9 pk: --=-=-= Content-Type: text/x-patch Content-Disposition: inline diff --git a/guix/status.scm b/guix/status.scm index cbea4151f2..4dcbcb0c1f 100644 --- a/guix/status.scm +++ b/guix/status.scm @@ -717,6 +717,7 @@ The second return value is a thunk to retrieve the current state." (pointer->bytevector ptr count))) (define (write! bv offset count) + (pk 'write count '<> %build-output-pid %build-output-left) (if %build-output-pid (let ((keep (min count %build-output-left))) (set! %build-output --=-=-=--