From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: bug#35350: Some compile output still leaks through with --verbosity=1
From: Sarah Morgensen
To: Mark H Weaver
Cc: 35350@debbugs.gnu.org
Date: Sun, 19 Sep 2021 22:44:55 -0700
Message-ID: <864kafhkbs.fsf@mgsn.dev>
In-Reply-To: <87tveauk2u.fsf@netris.org> (Mark H. Weaver's message of "Sat, 04 May 2019 14:53:50 -0400 (2 years, 19 weeks, 5 days ago)")
References: <87mukkfd2j.fsf@netris.org> <87r29v2jz2.fsf@gnu.org> <87ftq9silk.fsf@netris.org> <87imv5jai5.fsf@gnu.org> <87k1fgh9c0.fsf@netris.org> <874l6jh0bx.fsf@gnu.org> <87imuvme7g.fsf@netris.org> <87r29e5zsw.fsf@gnu.org> <87tveauk2u.fsf@netris.org>
List-Id: Bug reports for GNU Guix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Hello,

I encountered this issue today. This looks like a pretty complete
solution ready to go. Did this ever make it into Guile/Guix?

(Ironically I was also reaching for a "make-custom-textual-output-port"
the other day!)

--
Sarah

Mark H Weaver writes:

> Hi Ludovic,
>
> Ludovic Courtès writes:
>
>> Mark H Weaver skribis:
>>
>>> Ludovic Courtès writes:
>>
>> [...]
>>
>>>> So there are two things. To fix the issue you reported (build output
>>>> that goes through), I think we must simply turn off UTF-8 decoding from
>>>> ‘process-stderr’ and leave that entirely to ‘build-event-output-port’.
>>>
>>> Can we assume that UTF-8 is the appropriate encoding for
>>> (current-build-output-port)? My interpretation of the Guix manual entry
>>> for 'current-build-output-port' suggests that the answer should be "no".
>>
>> What goes to ‘current-build-output-port’ comes from builds processes.
>> It’s usually UTF-8 but it can be anything, including binary garbage,
>> which should be gracefully handled.
>>
>> That’s why ‘process-stderr’ currently uses ‘read-maybe-utf8-string’.
>
> I agree that we should (permissively) interpret the build process output
> as UTF-8, regardless of locale settings. However, the encoding of
> 'current-build-output-port' is orthogonal, and I see no reason to assume
> that it's UTF-8.
>
> As 'process-stderr' is currently implemented, it makes no assumptions
> about the encoding of 'current-build-output-port'. That's because it
> uses only textual I/O on it. The end result is that the UTF-8 build
> output is effectively converted into the port encoding of
> 'current-build-output-port', whatever it might be. I think that's how
> it should be, no?
>
>>> Also, in your previous message you wrote:
>>>
>>>   The problem is the first layer of UTF-8 decoding that happens in
>>>   ‘process-stderr’, in the ‘%stderr-next’ case. We would need to
>>>   disable it, but only if the build output port is
>>>   ‘build-event-output-port’ (i.e., it’s capable of interpreting
>>>   “multiplexed build output” correctly.)
>>>
>>> It sounds like you're suggesting that 'process-stderr' should look to
>>> see if (current-build-output-port) is a 'build-event-output-port', and
>>> in that case it should use binary I/O primitives to write raw binary
>>> data to it, otherwise it should use text I/O primitives and write
>>> characters to it. Do I understand correctly?
>>
>> Yes. (Actually, rather than guessing if (current-build-output-port) is
>> a ‘build-event-output-port’, there could be a fluid to ask for the use
>> of raw binary primitives.)
>>
>>> IMO, it would be cleaner to treat 'build-event-output-port' uniformly,
>>> and specifically as a textual port of unknown encoding.
>>
>> (You mean ‘current-build-output-port’, right?)
>
> Yes, indeed.
>
>> I think you’re right. I’m not yet entirely sure what the implications
>> are.
>> There’s a couple of tests in tests/store.scm for UTF-8
>> interpretation that describe behavior that I think we should preserve.
>
> I certainly agree that we should preserve those tests. I would go
> further and add two more tests that bind 'current-build-output-port' to
> a port with a non-UTF-8 encoding (e.g. UTF-16) and verify that the λ
> gets converted correctly. The test build process would output the λ as
> UTF-8, but it should be written to 'current-build-output-port' as
> e.g. UTF-16.
>
> What do you think?
>
>>> I would suggest changing 'build-event-output-port' to create an R6RS
>>> custom *textual* output port, so that it wouldn't have to worry about
>>> encodings at all, and it would only be given whole characters.
>>> Internally, it would be doing exactly what you suggest above, but those
>>> details would be encapsulated within the custom textual port.
>>>
>>> However, I don't think we can use Guile's current implementation of R6RS
>>> custom textual output ports, which are currently built on Guile's legacy
>>> soft ports, which I suspect have a similar bug with multibyte characters
>>> sometimes being split (see 'soft_port_write' in vports.c).
>>>
>>> Having said all of this, my suggestions would ultimately entail having
>>> two separate places along the stderr pipeline where 'utf8->string!'
>>> would be used, and maybe that's too much until we have a more optimized
>>> C implementation of it.
>>
>> Yeah it looks like we don’t yet have custom textual output ports that we
>> could rely on, do we?
>>
>> I support your work to add that in Guile proper!
>
> For now, I can offer a new implementation of custom textual output ports
> built upon custom binary ports and the 'utf8->string!' that I previously
> sent. See attached.
>
> Thanks,
> Mark
>
> GNU Guile 2.2.4
> Copyright (C) 1995-2017 Free Software Foundation, Inc.
>
> Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
> This program is free software, and you are welcome to redistribute it
> under certain conditions; type `,show c' for details.
>
> Enter `,help' for help.
> scheme@(guile-user)> (load "utf8-decoder.scm")
> scheme@(guile-user)> (load "guile-new-custom-textual-ports.scm")
> scheme@(guile-user)> (define (my-write! str start count)
>                        (pk 'my-write! (substring str start (+ start count)))
>                        count)
> scheme@(guile-user)> (define port (make-custom-textual-output-port "test1" my-write! #f #f #f))
> scheme@(guile-user)> (display "Hello λ world!" port)
> scheme@(guile-user)> (force-output port)
>
> ;;; (my-write! "Hello λ world!")
> scheme@(guile-user)> (string->utf8 "λ")
> $2 = #vu8(206 187)
> scheme@(guile-user)> (string->utf8 "Hello λ world!")
> $3 = #vu8(72 101 108 108 111 32 206 187 32 119 111 114 108 100 33)
> scheme@(guile-user)> (put-bytevector port #vu8(72 101 108 108 111 32 206))
> scheme@(guile-user)> (force-output port)
>
> ;;; (my-write! "Hello ")
> scheme@(guile-user)> (put-bytevector port #vu8(187 32 119 111 114 108 100 33))
> scheme@(guile-user)> (force-output port)
>
> ;;; (my-write! "λ world!")
> scheme@(guile-user)>
>
> ;;; Copyright © 2019 Mark H Weaver
> ;;;
> ;;; This program is free software: you can redistribute it and/or modify
> ;;; it under the terms of the GNU General Public License as published by
> ;;; the Free Software Foundation, either version 3 of the License, or
> ;;; (at your option) any later version.
> ;;;
> ;;; This program is distributed in the hope that it will be useful,
> ;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> ;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> ;;; GNU General Public License for more details.
> ;;;
> ;;; You should have received a copy of the GNU General Public License
> ;;; along with this program. If not, see .
>
> (use-modules (rnrs io ports))
>
> (define (make-custom-textual-output-port id
>                                          write!
>                                          get-position
>                                          set-position!
>                                          close)
>   (let (;; Allocate a per-port string buffer which will be used as a
>         ;; temporary buffer for decoding, to avoid heap allocation
>         ;; during normal operation.
>         (buffer (make-string 4096))
>         ;; 'state' is the UTF-8 decoder state, which represents a
>         ;; proper prefix of a well-formed UTF-8 byte sequence. These
>         ;; are bytes that 'binary-write!' has accepted and reported as
>         ;; having been written, although we are not able to decode
>         ;; them into a character to pass to (textual) 'write!' until
>         ;; more bytes arrive.
>         (state 0))
>     (define (binary-write! bv start count)
>       (call-with-values (lambda ()
>                           ;; XXX FIXME: Consider performing this
>                           ;; decoding strictly.
>                           (utf8->string! state bv start (+ start count)
>                                          buffer 0 (string-length buffer)))
>         (lambda (new-state bv-pos char-count)
>           (let* (;; Avoid calling write! with (char-count = 0) unless
>                  ;; (count = 0) was passed to us, because calling
>                  ;; 'write!' with count=0 has a special meaning: it
>                  ;; means to pass an EOF object to the byte/character
>                  ;; sink.
>                  (chars-accepted (if (and (zero? char-count)
>                                           (not (zero? count)))
>                                      0
>                                      (write! buffer 0 char-count)))
>                  ;; Compute 'bytes-accepted' in such a way that the
>                  ;; bytes from STATE are not included, because they
>                  ;; were passed to us in previous calls, and are not
>                  ;; part of the bytevector range that we are now being
>                  ;; asked to write. However, it's important to note
>                  ;; that if 'write!' did not accept the bytes from
>                  ;; STATE, 'bytes-accepted' will be negative. We must
>                  ;; handle that case specially below.
>                  (bytes-accepted (- count (string-utf8-length
>                                            (substring buffer
>                                                       chars-accepted
>                                                       char-count)))))
>             ;; If 'bytes-accepted' is negative, that means the bytes
>             ;; from STATE were not written. This can only happen if
>             ;; 'chars-accepted' is 0, because 'write!' can only accept
>             ;; whole code points, and the bytes from STATE are part of
>             ;; at most a single code point. In this case, we must
>             ;; leave STATE unchanged and return 0.
>             (if (negative? bytes-accepted)
>                 0
>                 (begin
>                   (set! state new-state)
>                   bytes-accepted))))))
>     (define (binary-close)
>       (set! buffer #f)
>       (when close (close)))
>     (define port
>       (make-custom-binary-output-port id
>                                       binary-write!
>                                       get-position
>                                       set-position!
>                                       binary-close))
>     ;; Always use UTF-8 as the encoding for custom textual ports, as
>     ;; an internal implementation detail, to ensure that all Unicode
>     ;; characters will pass through regardless of the current locale.
>     (set-port-encoding! port "UTF-8")
>     port))
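
The split-character case from the REPL transcript (λ arriving as byte 206
in one write and byte 187 in the next) can be tried without the two
attached files. What follows is only a minimal sketch of the "hold back
an incomplete prefix" idea, not Mark's implementation: it does not use
'utf8->string!' or a custom port, it allocates on every call, and it
assumes the input is well-formed UTF-8. The names utf8-sequence-length,
incomplete-tail-length, bytevector-concat, subbytevector and
make-chunk-decoder are hypothetical helpers invented for this example.

```scheme
(use-modules (rnrs bytevectors))

;; Length of the UTF-8 sequence announced by a lead byte, or #f for a
;; continuation byte (or a byte that cannot start a sequence).
(define (utf8-sequence-length lead)
  (cond ((< lead #x80) 1)
        ((= (logand lead #xE0) #xC0) 2)
        ((= (logand lead #xF0) #xE0) 3)
        ((= (logand lead #xF8) #xF0) 4)
        (else #f)))

;; How many bytes at the end of BV form a proper prefix of a multibyte
;; sequence; 0 when BV ends on a character boundary.
(define (incomplete-tail-length bv)
  (let ((len (bytevector-length bv)))
    (let loop ((back 1))
      (if (or (> back len) (> back 3))
          0
          (let ((need (utf8-sequence-length
                       (bytevector-u8-ref bv (- len back)))))
            (cond ((not need) (loop (+ back 1))) ; continuation: look further back
                  ((> need back) back)           ; lead byte still missing bytes
                  (else 0)))))))

(define (bytevector-concat a b)
  (let* ((la (bytevector-length a))
         (lb (bytevector-length b))
         (out (make-bytevector (+ la lb))))
    (bytevector-copy! a 0 out 0 la)
    (bytevector-copy! b 0 out la lb)
    out))

(define (subbytevector bv start end)
  (let ((out (make-bytevector (- end start))))
    (bytevector-copy! bv start out 0 (- end start))
    out))

;; Return a procedure that decodes successive chunks, carrying any
;; incomplete trailing bytes over to the next call (the role played by
;; 'state' in the quoted code).
(define (make-chunk-decoder)
  (let ((pending (make-bytevector 0)))
    (lambda (chunk)
      (let* ((all (bytevector-concat pending chunk))
             (len (bytevector-length all))
             (cut (- len (incomplete-tail-length all))))
        (set! pending (subbytevector all cut len))
        (utf8->string (subbytevector all 0 cut))))))

;; Same two byte chunks as in the transcript.
(define decode (make-chunk-decoder))
(decode #vu8(72 101 108 108 111 32 206))     ; => "Hello "
(decode #vu8(187 32 119 111 114 108 100 33)) ; => "λ world!"
```

The first call returns only "Hello " and keeps byte 206 pending; the
second call completes the λ. That matches the two 'my-write!' lines in
the transcript, where the textual sink never sees a partial character.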