bug#35350: Some compile output still leaks through with --verbosity=1

all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Mark H Weaver <mhw@netris.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 35350@debbugs.gnu.org
Subject: bug#35350: Some compile output still leaks through with --verbosity=1
Date: Sat, 04 May 2019 14:53:50 -0400	[thread overview]
Message-ID: <87tveauk2u.fsf@netris.org> (raw)
In-Reply-To: <87r29e5zsw.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Sat, 04 May 2019 11:33:51 +0200")

[-- Attachment #1: Type: text/plain, Size: 5958 bytes --]

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> So there are two things.  To fix the issue you reported (build output
>>> that goes through), I think we must simply turn off UTF-8 decoding from
>>> ‘process-stderr’ and leave that entirely to ‘build-event-output-port’.
>>
>> Can we assume that UTF-8 is the appropriate encoding for
>> (current-build-output-port)?  My interpretation of the Guix manual entry
>> for 'current-build-output-port' suggests that the answer should be "no".
>
> What goes to ‘current-build-output-port’ comes from builds processes.
> It’s usually UTF-8 but it can be anything, including binary garbage,
> which should be gracefully handled.
>
> That’s why ‘process-stderr’ currently uses ‘read-maybe-utf8-string’.

I agree that we should (permissively) interpret the build process output
as UTF-8, regardless of locale settings.  However, the encoding of
'current-build-output-port' is orthogonal, and I see no reason to assume
that it's UTF-8.

As 'process-stderr' is currently implemented, it makes no assumptions
about the encoding of 'current-build-output-port'.  That's because it
uses only textual I/O on it.  The end result is that the UTF-8 build
output is effectively converted into the port encoding of
'current-build-output-port', whatever it might be.  I think that's how
it should be, no?

>> Also, in your previous message you wrote:
>>
>>   The problem is the first layer of UTF-8 decoding that happens in
>>   ‘process-stderr’, in the ‘%stderr-next’ case.  We would need to
>>   disable it, but only if the build output port is
>>   ‘build-event-output-port’ (i.e., it’s capable of interpreting
>>   “multiplexed build output” correctly.)
>>
>> It sounds like you're suggesting that 'process-stderr' should look to
>> see if (current-build-output-port) is a 'build-event-output-port', and
>> in that case it should use binary I/O primitives to write raw binary
>> data to it, otherwise it should use text I/O primitives and write
>> characters to it.  Do I understand correctly?
>
> Yes.  (Actually, rather than guessing if (current-build-output-port) is
> a ‘build-event-output-port’, there could be a fluid to ask for the use
> of raw binary primitives.)
>
>> IMO, it would be cleaner to treat 'build-event-output-port' uniformly,
>> and specifically as a textual port of unknown encoding.
>
> (You mean ‘current-build-output-port’, right?)

Yes, indeed.

> I think you’re right.  I’m not yet entirely sure what the implications
> are.  There’s a couple of tests in tests/store.scm for UTF-8
> interpretation that describe behavior that I think we should preserve.

I certainly agree that we should preserve those tests.  I would go
further and add two more tests that bind 'current-build-output-port' to
a port with a non-UTF-8 encoding (e.g. UTF-16) and verify that the λ
gets converted correctly.  The test build process would output the λ as
UTF-8, but it should be written to 'current-build-output-port' as
e.g. UTF-16.

What do you think?

>> I would suggest changing 'build-event-output-port' to create an R6RS
>> custom *textual* output port, so that it wouldn't have to worry about
>> encodings at all, and it would only be given whole characters.
>> Internally, it would be doing exactly what you suggest above, but those
>> details would be encapsulated within the custom textual port.
>>
>> However, I don't think we can use Guile's current implementation of R6RS
>> custom textual output ports, which are currently built on Guile's legacy
>> soft ports, which I suspect have a similar bug with multibyte characters
>> sometimes being split (see 'soft_port_write' in vports.c).
>>
>> Having said all of this, my suggestions would ultimately entail having
>> two separate places along the stderr pipeline where 'utf8->string!'
>> would be used, and maybe that's too much until we have a more optimized
>> C implementation of it.
>
> Yeah it looks like we don’t yet have custom textual output ports that we
> could rely on, do we?
>
> I support your work to add that in Guile proper!

For now, I can offer a new implementation of custom textual output ports
built upon custom binary ports and the 'utf8->string!' that I previously
sent.  See attached.

     Thanks,
       Mark


--8<---------------cut here---------------start------------->8---
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (load "utf8-decoder.scm")
scheme@(guile-user)> (load "guile-new-custom-textual-ports.scm")
scheme@(guile-user)> (define (my-write! str start count)
                       (pk 'my-write! (substring str start (+ start count)))
                       count)
scheme@(guile-user)> (define port (make-custom-textual-output-port "test1" my-write! #f #f #f))
scheme@(guile-user)> (display "Hello λ world!" port)
scheme@(guile-user)> (force-output port)

;;; (my-write! "Hello λ world!")
scheme@(guile-user)> (string->utf8 "λ")
$2 = #vu8(206 187)
scheme@(guile-user)> (string->utf8 "Hello λ world!")
$3 = #vu8(72 101 108 108 111 32 206 187 32 119 111 114 108 100 33)
scheme@(guile-user)> (put-bytevector port #vu8(72 101 108 108 111 32 206))
scheme@(guile-user)> (force-output port)

;;; (my-write! "Hello ")
scheme@(guile-user)> (put-bytevector port #vu8(187 32 119 111 114 108 100 33))
scheme@(guile-user)> (force-output port)

;;; (my-write! "λ world!")
scheme@(guile-user)>
--8<---------------cut here---------------end--------------->8---


[-- Attachment #2: New implementation of custom textual output ports for Guile --]
[-- Type: text/plain, Size: 4559 bytes --]

;;; Copyright © 2019 Mark H Weaver <mhw@netris.org>
;;;
;;; This program is free software: you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation, either version 3 of the License, or
;;; (at your option) any later version.
;;;
;;; This program is distributed in the hope that it will be useful,
;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;;; GNU General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with this program.  If not, see <http://www.gnu.org/licenses/>.

(use-modules (rnrs io ports))

(define (make-custom-textual-output-port id
                                         write!
                                         get-position
                                         set-position!
                                         close)
  (let (;; Allocate a per-port string buffer which will be used as a
        ;; temporary buffer for decoding, to avoid heap allocation
        ;; during normal operation.
        (buffer (make-string 4096))
        ;; 'state' is the UTF-8 decoder state, which represents a
        ;; proper prefix of a well-formed UTF-8 byte sequence.  These
        ;; are bytes that 'binary-write!' has accepted and reported as
        ;; having been written, although we are not able to decode
        ;; them into a character to pass to (textual) 'write!' until
        ;; more bytes arrive.
        (state 0))
    (define (binary-write! bv start count)
      (call-with-values (lambda ()
                          ;; XXX FIXME: Consider performing this
                          ;; decoding strictly.
                          (utf8->string! state bv start (+ start count)
                                         buffer 0 (string-length buffer)))
        (lambda (new-state bv-pos char-count)
          (let* (;; Avoid calling write! with (char-count = 0) unless
                 ;; (count = 0) was passed to us, because calling
                 ;; 'write!' with count=0 has a special meaning: it
                 ;; means to pass an EOF object to the byte/character
                 ;; sink.
                 (chars-accepted (if (and (zero? char-count)
                                          (not (zero? count)))
                                     0
                                     (write! buffer 0 char-count)))
                 ;; Compute 'bytes-accepted' in such a way that the
                 ;; bytes from STATE are not included, because they
                 ;; were passed to us in previous calls, and are not
                 ;; part of the bytevector range that we are now being
                 ;; asked to write.  However, it's important to note
                 ;; that if 'write!' did not accept the bytes from
                 ;; STATE, 'bytes-accepted' will be negative.  We must
                 ;; handle that case specially below.
                 (bytes-accepted (- count (string-utf8-length
                                           (substring buffer
                                                      chars-accepted
                                                      char-count)))))
            ;; If 'bytes-accepted' is negative, that means the bytes
            ;; from STATE were not written.  This can only happen if
            ;; 'chars-accepted' is 0, because 'write!' can only accept
            ;; whole code points, and the bytes from STATE are part of
            ;; at most a single code point.  In this case, we must
            ;; leave STATE unchanged and return 0.
            (if (negative? bytes-accepted)
                0
                (begin
                  (set! state new-state)
                  bytes-accepted))))))
    (define (binary-close)
      (set! buffer #f)
      (when close (close)))
    (define port
      (make-custom-binary-output-port id
                                      binary-write!
                                      get-position
                                      set-position!
                                      binary-close))
    ;; Always use UTF-8 as the encoding for custom textual ports, as
    ;; an internal implementation detail, to ensure that all Unicode
    ;; characters will pass through regardless of the current locale.
    (set-port-encoding! port "UTF-8")
    port))

next prev parent reply	other threads:[~2019-05-04 18:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-20 23:53 bug#35350: Some compile output still leaks through with --verbosity=1 Mark H Weaver
2019-04-21 20:15 ` Ludovic Courtès
2019-04-22 23:52   ` Mark H Weaver
2019-04-23  8:45     ` Mark H Weaver
2019-04-23 10:12     ` Ludovic Courtès
2019-04-26 19:09       ` Mark H Weaver
2019-04-27  0:45         ` Mark H Weaver
2019-04-27  7:56           ` Mark H Weaver
2019-04-27 16:36         ` Ludovic Courtès
2019-04-30 20:26           ` Mark H Weaver
2019-05-04  9:33             ` Ludovic Courtès
2019-05-04 18:53               ` Mark H Weaver [this message]
2021-09-20  5:44                 ` Sarah Morgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tveauk2u.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=35350@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.