* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Mark H Weaver @ 2015-02-28  4:30 UTC
To: Ludovic Courtès; +Cc: guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> commit ca1e3ad2faa59d5b32289f84e0937fa476e21a1a
> Author: Ludovic Courtès <ludo@gnu.org>
> Date:   Sat Feb 28 01:01:51 2015 +0100
>
>     utils: Change 'patch-shebangs' to use binary input.
>
>     * guix/build/utils.scm (get-char*): New procedure.
>       (patch-shebang): Use it instead of 'read-char'.
>       (fold-port-matches): Remove local 'get-char' and use 'get-char*'
>       instead.
> ---
>  guix/build/utils.scm |   22 +++++++++++-----------
>  1 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
> index a3f8911..c98c4ca 100644
> --- a/guix/build/utils.scm
> +++ b/guix/build/utils.scm
> @@ -618,6 +618,14 @@ transferred and the continuation of the transfer as a thunk."
>            (stat:atimensec stat)
>            (stat:mtimensec stat)))
>
> +(define (get-char* p)
> +  ;; We call it `get-char', but that's really a binary version
> +  ;; thereof.  (The real `get-char' cannot be used here because our
> +  ;; bootstrap Guile is hacked to always use UTF-8.)
> +  (match (get-u8 p)
> +    ((? integer? x) (integer->char x))
> +    (x x)))
> +

This is equivalent to reading with the ISO-8859-1 encoding.  The problem
is that the procedures that use 'get-char*' will then typically use
UTF-8 to write these characters back, so all non-ASCII characters will
get corrupted by these filters.

For now, I would suggest just using ISO-8859-1 for all of these build
utilities that filter or substitute existing files, and then use the
textual I/O procedures.

A better solution going forward would be to implement and use a
permissive UTF-8 encoding in Guile.

What do you think?

      Mark
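The corruption Mark describes can be reproduced in a few lines. The following is a minimal sketch, assuming a Guile that provides (ice-9 binary-ports) and (rnrs bytevectors). 'get-char*' is copied from the quoted patch; 'read-all' and the sample bytes are hypothetical and only serve the illustration. It reads the two UTF-8 bytes of "é" one byte per character and then encodes the resulting string as UTF-8 again:

```scheme
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors))

;; Copied from the patch under discussion: one byte in, one character out.
(define (get-char* p)
  (match (get-u8 p)
    ((? integer? x) (integer->char x))
    (x x)))

;; Hypothetical helper: read every byte of PORT through 'get-char*'
;; and return the characters as a string.
(define (read-all port)
  (let loop ((acc '()))
    (let ((c (get-char* port)))
      (if (eof-object? c)
          (list->string (reverse acc))
          (loop (cons c acc))))))

;; "é" encoded as UTF-8 is the two bytes C3 A9.
(define input (u8-list->bytevector '(#xC3 #xA9)))

;; Each byte becomes its own character: U+00C3 followed by U+00A9.
(define decoded (read-all (open-bytevector-input-port input)))

;; Writing those two characters back as UTF-8 takes four bytes.
(define written (string->utf8 decoded))

(display (bytevector->u8-list written))   ; prints (195 131 194 169)
(newline)
```

The input was (195 169); the output is (195 131 194 169), so the file no longer contains "é". Pure ASCII bytes pass through unchanged, which is why only non-ASCII content is affected.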
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Andreas Enge @ 2015-02-28  9:51 UTC
To: Mark H Weaver; +Cc: guix-devel

By the way, the latest modifications broke core-updates:
   http://hydra.gnu.org/build/262811/nixlog/2/tail-reload
ends with

In ice-9/regex.scm:
 189: 3 [list-matches # ...]
 176: 2 [fold-matches # ...]
In unknown file:
   ?: 1 [regexp-exec # ...]
In ice-9/boot-9.scm:
 106: 0 [#<procedure a15cf60 at ice-9/boot-9.scm:97:6 (thrown-k . args)> encoding-error ...]

ice-9/boot-9.scm:106:20: In procedure #<procedure a15cf60 at ice-9/boot-9.scm:97:6 (thrown-k . args)>:
ice-9/boot-9.scm:106:20: Throw to key `encoding-error' with args `("scm_to_stringn" "cannot convert narrow string to output locale" 84 #f #f)'.

This looks related...

Andreas
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Andreas Enge @ 2015-02-28 11:07 UTC
To: Mark H Weaver; +Cc: guix-devel

For the record, I tried to work on top of
commit f1886b51bd86bd80a47c5b4aafc16039126315e8
    gnu: cmake: Update to 3.1.3.
of core-updates. There I get a test failure

============================================================================
Testsuite summary for gettext-tools 0.19.4
============================================================================
# TOTAL: 397
# PASS:  357
# SKIP:  38
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0

Just in case we need to trace back problems.

Andreas
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Andreas Enge @ 2015-02-28 11:11 UTC
To: Mark H Weaver; +Cc: guix-devel

In fact, this one has been fixed in
e8c9f0498f9f3ead4ea345d49f1c5e630ff158f8. So please disregard my message.

Andreas
* Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
From: Ludovic Courtès @ 2015-02-28 14:50 UTC
To: Mark H Weaver; +Cc: guix-devel

Mark H Weaver <mhw@netris.org> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> commit ca1e3ad2faa59d5b32289f84e0937fa476e21a1a
>> Author: Ludovic Courtès <ludo@gnu.org>
>> Date:   Sat Feb 28 01:01:51 2015 +0100
>>
>>     utils: Change 'patch-shebangs' to use binary input.
>>
>>     * guix/build/utils.scm (get-char*): New procedure.
>>       (patch-shebang): Use it instead of 'read-char'.
>>       (fold-port-matches): Remove local 'get-char' and use 'get-char*'
>>       instead.
>> ---
>>  guix/build/utils.scm |   22 +++++++++++-----------
>>  1 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
>> index a3f8911..c98c4ca 100644
>> --- a/guix/build/utils.scm
>> +++ b/guix/build/utils.scm
>> @@ -618,6 +618,14 @@ transferred and the continuation of the transfer as a thunk."
>>            (stat:atimensec stat)
>>            (stat:mtimensec stat)))
>>
>> +(define (get-char* p)
>> +  ;; We call it `get-char', but that's really a binary version
>> +  ;; thereof.  (The real `get-char' cannot be used here because our
>> +  ;; bootstrap Guile is hacked to always use UTF-8.)
>> +  (match (get-u8 p)
>> +    ((? integer? x) (integer->char x))
>> +    (x x)))
>> +
>
> This is equivalent to reading with the ISO-8859-1 encoding.  The problem
> is that the procedures that use 'get-char*' will then typically use
> UTF-8 to write these characters back, so all non-ASCII characters will
> get corrupted by these filters.
>
> For now, I would suggest just using ISO-8859-1 for all of these build
> utilities that filter or substitute existing files, and then use the
> textual I/O procedures.

The difficulty is that ISO-8859-1 is not available during bootstrap, due
to guile-default-utf8.patch.

Commit dd0a8ef asks for ISO-8859-1 in the patch-* procedures, as you
suggest, but in reality during bootstrap what happens is not exactly
that.

If the bootstrap glibc had statically-linked gconv modules, we could get
rid of guile-default-utf8.patch.

> A better solution going forward would be to implement and use a
> permissive UTF-8 encoding in Guile.

Probably, although it’s not completely clear to me how that would work.
I suppose the idea would be to change to ISO-8859-1 when an invalid byte
sequence is encountered?

Ludo’.
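One possible reading of the "permissive UTF-8" idea, as Ludovic guesses it, is a decoder that accepts well-formed UTF-8 sequences and maps any other byte to the character with the same number, as ISO-8859-1 would. The sketch below is an illustration of that reading only; it is not what Guile implements and not necessarily what Mark has in mind. The names 'continuation-count', 'decode-sequence' and 'permissive-utf8->string' are hypothetical.

```scheme
(use-modules (rnrs bytevectors))

;; Number of continuation bytes expected after lead byte B, or #f when B
;; cannot start a multi-byte UTF-8 sequence.
(define (continuation-count b)
  (cond ((<= #xC2 b #xDF) 1)
        ((<= #xE0 b #xEF) 2)
        ((<= #xF0 b #xF4) 3)
        (else #f)))

;; Decode the multi-byte sequence starting at index I of BV.  Return the
;; code point, or #f if the sequence is truncated, malformed, overlong,
;; a surrogate, or beyond U+10FFFF.
(define (decode-sequence bv i)
  (let* ((b (bytevector-u8-ref bv i))
         (n (continuation-count b)))
    (and n
         (<= (+ i n 1) (bytevector-length bv))
         (let loop ((k 1)
                    ;; Payload bits of the lead byte.
                    (cp (logand b (ash #xFF (- (+ n 2))))))
           (if (> k n)
               (and (>= cp (case n ((1) #x80) ((2) #x800) (else #x10000)))
                    (<= cp #x10FFFF)
                    (not (<= #xD800 cp #xDFFF))
                    cp)
               (let ((c (bytevector-u8-ref bv (+ i k))))
                 (and (= (logand c #xC0) #x80)
                      (loop (+ k 1)
                            (logior (ash cp 6) (logand c #x3F))))))))))

;; Decode BV as UTF-8 where possible; any byte that is not part of a
;; valid sequence is mapped to the character with the same number.
(define (permissive-utf8->string bv)
  (let ((len (bytevector-length bv)))
    (let loop ((i 0) (acc '()))
      (if (>= i len)
          (list->string (reverse acc))
          (let* ((b (bytevector-u8-ref bv i))
                 (cp (and (>= b #x80) (decode-sequence bv i))))
            (if cp
                ;; Valid multi-byte sequence: consume it whole.
                (loop (+ i 1 (continuation-count b))
                      (cons (integer->char cp) acc))
                ;; ASCII byte or invalid byte: one byte, one character.
                (loop (+ i 1) (cons (integer->char b) acc))))))))

;; "caf" + UTF-8 "é" (C3 A9) + space + a stray ISO-8859-1 "é" (E9).
(define sample
  (u8-list->bytevector '(#x63 #x61 #x66 #xC3 #xA9 #x20 #xE9)))

(define result (permissive-utf8->string sample))
```

Here 'result' is the six-character string "café é": both the UTF-8 sequence and the stray byte E9 decode to U+00E9. That also shows the limit of this approach: once decoded, the two can no longer be told apart, so writing the string back as UTF-8 would not reproduce the original bytes. A filter that must leave untouched bytes intact would need a reversible mapping for invalid bytes instead.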
end of thread, other threads:[~2015-02-28 14:50 UTC | newest]

Thread overview: 5+ messages
     [not found] <20150228001057.17733.82336@vcs.savannah.gnu.org>
     [not found] ` <E1YRUzi-0004cx-0N@vcs.savannah.gnu.org>
2015-02-28  4:30   ` 01/02: utils: Change 'patch-shebangs' to use binary input Mark H Weaver
2015-02-28  9:51     ` Andreas Enge
2015-02-28 11:07       ` Andreas Enge
2015-02-28 11:11         ` Andreas Enge
2015-02-28 14:50     ` Ludovic Courtès