* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
[not found] ` <E1YRUzi-0004cx-0N@vcs.savannah.gnu.org>
@ 2015-02-28 4:30 ` Mark H Weaver
2015-02-28 9:51 ` Andreas Enge
2015-02-28 14:50 ` Ludovic Courtès
0 siblings, 2 replies; 5+ messages in thread
From: Mark H Weaver @ 2015-02-28 4:30 UTC
To: Ludovic Courtès; +Cc: guix-devel
Ludovic Courtès <ludo@gnu.org> writes:
> commit ca1e3ad2faa59d5b32289f84e0937fa476e21a1a
> Author: Ludovic Courtès <ludo@gnu.org>
> Date: Sat Feb 28 01:01:51 2015 +0100
>
> utils: Change 'patch-shebangs' to use binary input.
>
> * guix/build/utils.scm (get-char*): New procedure.
> (patch-shebang): Use it instead of 'read-char'.
> (fold-port-matches): Remove local 'get-char' and use 'get-char*'
> instead.
> ---
> guix/build/utils.scm | 22 +++++++++++-----------
> 1 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
> index a3f8911..c98c4ca 100644
> --- a/guix/build/utils.scm
> +++ b/guix/build/utils.scm
> @@ -618,6 +618,14 @@ transferred and the continuation of the transfer as a thunk."
> (stat:atimensec stat)
> (stat:mtimensec stat)))
>
> +(define (get-char* p)
> +  ;; We call it `get-char', but that's really a binary version
> +  ;; thereof.  (The real `get-char' cannot be used here because our
> +  ;; bootstrap Guile is hacked to always use UTF-8.)
> +  (match (get-u8 p)
> +    ((? integer? x) (integer->char x))
> +    (x x)))
> +
This is equivalent to reading with the ISO-8859-1 encoding. The problem
is that the procedures that use 'get-char*' will then typically use
UTF-8 to write these characters back, so all non-ASCII characters will
get corrupted by these filters.
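The corruption described above can be reproduced with a minimal sketch. It assumes a Guile that provides (ice-9 binary-ports) and (rnrs bytevectors). The names 'input', 'read-all-chars', 'text' and 'output' are hypothetical and only serve the illustration; 'get-char*' is copied from the quoted patch. The input is the two-byte UTF-8 encoding of "é".

```scheme
(use-modules (ice-9 match)
             (ice-9 binary-ports)
             (rnrs bytevectors))

;; Copied from the patch quoted above.
(define (get-char* p)
  (match (get-u8 p)
    ((? integer? x) (integer->char x))
    (x x)))

;; Hypothetical input: "é" encoded as UTF-8, i.e. the bytes C3 A9.
(define input (u8-list->bytevector '(#xc3 #xa9)))

;; Read the whole port one byte at a time, turning each byte into
;; the character with the same code point (what 'get-char*' does).
(define (read-all-chars port)
  (let loop ((chars '()))
    (match (get-char* port)
      ((? eof-object?) (list->string (reverse chars)))
      (c (loop (cons c chars))))))

(define text (read-all-chars (open-bytevector-input-port input)))

;; Encoding the string as UTF-8 stands in for writing it to a
;; UTF-8 port: each of the two characters becomes two bytes.
(define output (string->utf8 text))

(display (bytevector->u8-list output))   ; prints (195 131 194 169)
(newline)
```

The two input bytes come back as four (C3 83 C2 A9), which is the corruption of non-ASCII characters that a filter built this way would cause.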
For now, I would suggest just using ISO-8859-1 for all of these build
utilities that filter or substitute existing files, and then use the
textual I/O procedures. A better solution going forward would be to
implement and use a permissive UTF-8 encoding in Guile.
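As a rough illustration of the ISO-8859-1 suggestion, the sketch below copies bytes through textual ports that both use that encoding. It assumes a Guile in which the ISO-8859-1 encoding is actually available. 'copy-as-latin1' and 'input' are hypothetical names, and this is not the code of any Guix procedure.

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

;; Hypothetical input: the UTF-8 encoding of "é" (bytes C3 A9).
(define input (u8-list->bytevector '(#xc3 #xa9)))

;; Copy BV character by character through an input port and an
;; output port that are both set to ISO-8859-1, and return the
;; bytes that were written.
(define (copy-as-latin1 bv)
  (let ((in (open-bytevector-input-port bv)))
    (set-port-encoding! in "ISO-8859-1")
    (call-with-values open-bytevector-output-port
      (lambda (out get-bytes)
        (set-port-encoding! out "ISO-8859-1")
        (let loop ((c (read-char in)))
          (unless (eof-object? c)
            (write-char c out)
            (loop (read-char in))))
        (force-output out)
        (get-bytes)))))

(display (equal? input (copy-as-latin1 input)))   ; prints #t
(newline)
```

Because ISO-8859-1 maps every byte value to exactly one character and back, the copy is byte-for-byte identical whatever the real encoding of the file is.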
What do you think?
Mark
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Andreas Enge @ 2015-02-28 9:51 UTC
To: Mark H Weaver; +Cc: guix-devel
By the way, the latest modifications broke core-updates. The log at
http://hydra.gnu.org/build/262811/nixlog/2/tail-reload
ends with:
In ice-9/regex.scm:
189: 3 [list-matches # ...]
176: 2 [fold-matches # ...]
In unknown file:
?: 1 [regexp-exec # ...]
In ice-9/boot-9.scm:
106: 0 [#<procedure a15cf60 at ice-9/boot-9.scm:97:6 (thrown-k . args)> encoding-error ...]
ice-9/boot-9.scm:106:20: In procedure #<procedure a15cf60 at ice-9/boot-9.scm:97:6 (thrown-k . args)>:
ice-9/boot-9.scm:106:20: Throw to key `encoding-error' with args `("scm_to_stringn" "cannot convert narrow string to output locale" 84 #f #f)'.
This looks related...
Andreas
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Andreas Enge @ 2015-02-28 11:07 UTC
To: Mark H Weaver; +Cc: guix-devel
For the record, I tried to work on top of
commit f1886b51bd86bd80a47c5b4aafc16039126315e8
gnu: cmake: Update to 3.1.3.
of core-updates. There I get a test failure
============================================================================
Testsuite summary for gettext-tools 0.19.4
============================================================================
# TOTAL: 397
# PASS: 357
# SKIP: 38
# XFAIL: 0
# FAIL: 2
# XPASS: 0
# ERROR: 0
Just in case we need to trace back problems.
Andreas
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Andreas Enge @ 2015-02-28 11:11 UTC
To: Mark H Weaver; +Cc: guix-devel
In fact, this one has been fixed in e8c9f0498f9f3ead4ea345d49f1c5e630ff158f8.
So please disregard my message.
Andreas
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Ludovic Courtès @ 2015-02-28 14:50 UTC
To: Mark H Weaver; +Cc: guix-devel
Mark H Weaver <mhw@netris.org> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> commit ca1e3ad2faa59d5b32289f84e0937fa476e21a1a
>> Author: Ludovic Courtès <ludo@gnu.org>
>> Date: Sat Feb 28 01:01:51 2015 +0100
>>
>> utils: Change 'patch-shebangs' to use binary input.
>>
>> * guix/build/utils.scm (get-char*): New procedure.
>> (patch-shebang): Use it instead of 'read-char'.
>> (fold-port-matches): Remove local 'get-char' and use 'get-char*'
>> instead.
>> ---
>> guix/build/utils.scm | 22 +++++++++++-----------
>> 1 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
>> index a3f8911..c98c4ca 100644
>> --- a/guix/build/utils.scm
>> +++ b/guix/build/utils.scm
>> @@ -618,6 +618,14 @@ transferred and the continuation of the transfer as a thunk."
>> (stat:atimensec stat)
>> (stat:mtimensec stat)))
>>
>> +(define (get-char* p)
>> +  ;; We call it `get-char', but that's really a binary version
>> +  ;; thereof.  (The real `get-char' cannot be used here because our
>> +  ;; bootstrap Guile is hacked to always use UTF-8.)
>> +  (match (get-u8 p)
>> +    ((? integer? x) (integer->char x))
>> +    (x x)))
>> +
>
> This is equivalent to reading with the ISO-8859-1 encoding. The problem
> is that the procedures that use 'get-char*' will then typically use
> UTF-8 to write these characters back, so all non-ASCII characters will
> get corrupted by these filters.
>
> For now, I would suggest just using ISO-8859-1 for all of these build
> utilities that filter or substitute existing files, and then use the
> textual I/O procedures.
The difficulty is that ISO-8859-1 is not available during bootstrap, due
to guile-default-utf8.patch.
Commit dd0a8ef asks for ISO-8859-1 in the patch-* procedures, as you
suggest, but in reality during bootstrap what happens is not exactly
that.
If the bootstrap glibc had statically-linked gconv modules, we could get
rid of guile-default-utf8.patch.
> A better solution going forward would be to implement and use a
> permissive UTF-8 encoding in Guile.
Probably, although it’s not completely clear to me how that would work.
I suppose the idea would be to change to ISO-8859-1 when an invalid byte
sequence is encountered?
Ludo’.