all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 35350@debbugs.gnu.org
Subject: bug#35350: Some compile output still leaks through with --verbosity=1
Date: Tue, 30 Apr 2019 16:26:32 -0400	[thread overview]
Message-ID: <87imuvme7g.fsf@netris.org> (raw)
In-Reply-To: <874l6jh0bx.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Sat, 27 Apr 2019 18:36:34 +0200")

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>> The third read(2) call here ends on a partial UTF-8 sequence for LEFT
>>> SINGLE QUOTATION MARK (we get the first two bytes of a three byte
>>> sequence.)
>>>
>>> What happens is that ‘process-stderr’ in (guix store) gets that byte
>>> string from the daemon, passes it through ‘read-maybe-utf8-string’,
>>> which replaces the last two bytes with REPLACEMENT CHARACTER, which is
>>> itself a 3-byte sequence.
>>
>> It seems to me that what's needed here is to save the UTF-8 decoder
>> state between calls to 'process-stderr'.
>
> So there are two things.  To fix the issue you reported (build output
> that goes through), I think we must simply turn off UTF-8 decoding from
> ‘process-stderr’ and leave that entirely to ‘build-event-output-port’.

Can we assume that UTF-8 is the appropriate encoding for
(current-build-output-port)?  My interpretation of the Guix manual entry
for 'current-build-output-port' suggests that the answer should be "no".

Also, in your previous message you wrote:

  The problem is the first layer of UTF-8 decoding that happens in
  ‘process-stderr’, in the ‘%stderr-next’ case.  We would need to
  disable it, but only if the build output port is
  ‘build-event-output-port’ (i.e., it’s capable of interpreting
  “multiplexed build output” correctly.)

It sounds like you're suggesting that 'process-stderr' should look to
see if (current-build-output-port) is a 'build-event-output-port', and
in that case it should use binary I/O primitives to write raw binary
data to it, otherwise it should use text I/O primitives and write
characters to it.  Do I understand correctly?

IMO, it would be cleaner to treat 'build-event-output-port' uniformly,
and specifically as a textual port of unknown encoding.

What do you think?

> However, ‘build-event-output-port’ would still fail to properly decode
> split UTF-8 sequences, and for that we’d need to preserve decoder
> state as you describe.

I would suggest changing 'build-event-output-port' to create an R6RS
custom *textual* output port, so that it wouldn't have to worry about
encodings at all, and it would only be given whole characters.
Internally, it would be doing exactly what you suggest above, but those
details would be encapsulated within the custom textual port.

However, I don't think we can use Guile's current implementation of R6RS
custom textual output ports, which are currently built on Guile's legacy
soft ports, which I suspect have a similar bug with multibyte characters
sometimes being split (see 'soft_port_write' in vports.c).

Having said all of this, my suggestions would ultimately entail having
two separate places along the stderr pipeline where 'utf8->string!'
would be used, and maybe that's too much until we have a more optimized
C implementation of it.

Thoughts?

      Mark

  reply	other threads:[~2019-04-30 20:37 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-20 23:53 bug#35350: Some compile output still leaks through with --verbosity=1 Mark H Weaver
2019-04-21 20:15 ` Ludovic Courtès
2019-04-22 23:52   ` Mark H Weaver
2019-04-23  8:45     ` Mark H Weaver
2019-04-23 10:12     ` Ludovic Courtès
2019-04-26 19:09       ` Mark H Weaver
2019-04-27  0:45         ` Mark H Weaver
2019-04-27  7:56           ` Mark H Weaver
2019-04-27 16:36         ` Ludovic Courtès
2019-04-30 20:26           ` Mark H Weaver [this message]
2019-05-04  9:33             ` Ludovic Courtès
2019-05-04 18:53               ` Mark H Weaver
2021-09-20  5:44                 ` Sarah Morgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87imuvme7g.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=35350@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.