From: "Ludovic Courtès" <ludo@gnu.org>
To: Mark H Weaver <mhw@netris.org>
Cc: 35350@debbugs.gnu.org
Subject: bug#35350: Some compile output still leaks through with --verbosity=1
Date: Sat, 27 Apr 2019 18:36:34 +0200
Message-ID: <874l6jh0bx.fsf@gnu.org>
In-Reply-To: <87k1fgh9c0.fsf@netris.org> (Mark H. Weaver's message of "Fri, 26 Apr 2019 15:09:24 -0400")

Hi Mark,

Mark H Weaver <mhw@netris.org> writes:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> The third read(2) call here ends on a partial UTF-8 sequence for LEFT
>> SINGLE QUOTATION MARK (we get the first two bytes of a three-byte
>> sequence).
>>
>> What happens is that ‘process-stderr’ in (guix store) gets that byte
>> string from the daemon, passes it through ‘read-maybe-utf8-string’,
>> which replaces the last two bytes with REPLACEMENT CHARACTER, which is
>> itself a 3-byte sequence.
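A minimal Python sketch of the failure described above (not the Guix code itself): each chunk is decoded on its own with U+FFFD substitution, which is roughly what decoding inside 'process-stderr' amounts to. The sample text, the split point, and the helper name decode_each_chunk are hypothetical; only the encoding of U+2018 (E2 80 98) is fixed.

```python
# Hypothetical reproduction: decode every read(2) chunk independently.
text = "compiling \u2018foo\u2019\n"
data = text.encode("utf-8")

# Split so the first chunk ends two bytes into the three-byte sequence
# E2 80 98 for U+2018 LEFT SINGLE QUOTATION MARK.
cut = data.index(b"\xe2\x80\x98") + 2
chunks = [data[:cut], data[cut:]]


def decode_each_chunk(chunks):
    """Decode each chunk on its own, replacing ill-formed bytes with U+FFFD."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


naive = decode_each_chunk(chunks)
print(repr(naive))
# Each U+FFFD re-encodes to three bytes (EF BF BD), so the decoded text
# no longer has the same byte length as what was read.
print(len(data), len(naive.encode("utf-8")))
```

The quotation mark is lost: both halves of the split sequence are ill-formed when seen in isolation, so each chunk contributes a replacement character instead.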
>
> It seems to me that what's needed here is to save the UTF-8 decoder
> state between calls to 'process-stderr'.

So there are two things. To fix the issue you reported (build output
that leaks through), I think we must simply turn off UTF-8 decoding in
‘process-stderr’ and leave that entirely to ‘build-event-output-port’.
However, ‘build-event-output-port’ would still fail to properly decode
split UTF-8 sequences, and for that we’d need to preserve decoder state
as you describe.
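To illustrate what preserving decoder state means, here is a rough Python sketch in which the consumer owns a single incremental decoder for its whole lifetime, so a sequence split across two writes is reassembled. DecodingSink is a hypothetical stand-in for an output port like ‘build-event-output-port’, not its actual implementation; codecs.getincrementaldecoder is the standard-library API.

```python
import codecs


class DecodingSink:
    """Hypothetical output sink that receives raw bytes and decodes them
    itself, keeping UTF-8 decoder state from one write to the next."""

    def __init__(self):
        # errors="replace" still substitutes U+FFFD for genuinely bad input.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self._parts = []

    def write(self, chunk):
        # A partial sequence at the end of CHUNK stays buffered in the
        # decoder and is completed by the bytes of the next write.
        self._parts.append(self._decoder.decode(chunk))

    def close(self):
        # final=True turns any dangling partial sequence into U+FFFD.
        self._parts.append(self._decoder.decode(b"", final=True))
        return "".join(self._parts)


text = "compiling \u2018foo\u2019\n"
data = text.encode("utf-8")
cut = data.index(b"\xe2\x80\x98") + 2  # split inside U+2018

sink = DecodingSink()
sink.write(data[:cut])
sink.write(data[cut:])
result = sink.close()
print(repr(result))
```

The design point is that the decoder lives as long as the stream, not as long as a single read, which is what lets chunk boundaries fall anywhere.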
> Coincidentally, I also needed something like this a week ago, when I
> tried implementing R6RS custom textual input/output ports on top of
> R6RS custom binary input/output ports.
>
> To meet these needs, I've implemented a fairly efficient, purely
> functional UTF-8 decoder in Scheme that accepts a decoder state and an
> arbitrary range from a bytevector, and returns a new decoder state.
> There's a macro that allows arbitrary actions to be performed when a
> code point (or maximal subpart in the case of errors) is found.
>
> This macro is then used to implement a decoder (utf8->string!) that
> writes into an arbitrary range of an existing string. Of course, it's
> not purely functional, but it avoids heap allocation when compiled with
> Guile. On my Thinkpad X200, it can process around 10 megabytes per
> second.
>
> The state is represented as an exact integer between 0 and #xF48FBF
> inclusive, whose value is simply the bytes that have been seen so far in
> the current code sequence, in big-endian order, or 0 for the start state.
> For example, #xF48FBF represents the state where the bytes (F4 8F BF)
> have been read. The state is always either 0 or a proper prefix of a
> valid UTF-8 byte sequence.
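For readers who want to see the shape of such a decoder, here is a rough Python sketch of the interface Mark describes: a state goes in together with a range of a byte buffer, a callback fires for each code point (or U+FFFD per maximal subpart of ill-formed input), and a new state comes out, encoded as the big-endian integer of the bytes seen so far. This is not Mark's Scheme code; the names utf8_fold, state_bytes, sequence_length, and decode_chunks are hypothetical, and the byte ranges follow Table 3-7 of the Unicode Standard.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER

# Second-byte ranges that differ from the default 80..BF (Unicode Table 3-7).
SECOND_BYTE_RANGE = {0xE0: (0xA0, 0xBF), 0xED: (0x80, 0x9F),
                     0xF0: (0x90, 0xBF), 0xF4: (0x80, 0x8F)}


def sequence_length(lead):
    """Length of the sequence LEAD starts, or 0 if LEAD cannot start one."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def state_bytes(state):
    """Unpack the big-endian prefix stored in STATE (0 means no bytes).
    Prefix bytes are never 0x00, so none can be lost here."""
    prefix = []
    while state:
        prefix.insert(0, state & 0xFF)
        state >>= 8
    return prefix


def utf8_fold(state, buf, start, end, emit):
    """Decode buf[start:end] starting from STATE and return the new state.
    EMIT receives each code point, or U+FFFD for each maximal subpart of
    an ill-formed sequence. Neither STATE nor BUF is modified."""
    i = start
    while i < end:
        byte = buf[i]
        if state == 0:
            n = sequence_length(byte)
            if n == 1:
                emit(byte)              # ASCII
            elif n == 0:
                emit(REPLACEMENT)       # byte can never start a sequence
            else:
                state = byte            # first byte of a multi-byte sequence
            i += 1
            continue
        prefix = state_bytes(state)
        lead, position = prefix[0], len(prefix)
        if position == 1:
            lo, hi = SECOND_BYTE_RANGE.get(lead, (0x80, 0xBF))
        else:
            lo, hi = 0x80, 0xBF
        if not lo <= byte <= hi:
            # The prefix cannot be completed: replace it, then reprocess
            # BYTE from the start state (i is deliberately not advanced).
            emit(REPLACEMENT)
            state = 0
            continue
        total = sequence_length(lead)
        if position + 1 < total:
            state = (state << 8) | byte  # still a proper prefix
        else:
            cp = lead & (0xFF >> (total + 1))
            for cont in prefix[1:] + [byte]:
                cp = (cp << 6) | (cont & 0x3F)
            emit(cp)
            state = 0
        i += 1
    return state


def decode_chunks(chunks):
    """Decode a list of byte chunks, threading the state between them."""
    out = []
    state = 0
    for chunk in chunks:
        state = utf8_fold(state, chunk, 0, len(chunk),
                          lambda cp: out.append(chr(cp)))
    if state:  # input ended inside a sequence
        out.append(chr(REPLACEMENT))
    return "".join(out)


data = "compiling \u2018foo\u2019\n".encode("utf-8")
cut = data.index(b"\xe2\x80\x98") + 2
print(decode_chunks([data[:cut], data[cut:]]))
```

Since a completed sequence resets the state to 0, the largest state ever stored is a three-byte prefix, and the largest valid one, F4 8F BF, gives exactly the #xF48FBF upper bound quoted above.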
Awesome! I think that’s something we should definitely add to Guile
proper. We can use it in Guix before or after it’s included in Guile.
Thank you!
Ludo’.