From: ludo@gnu.org (Ludovic Courtès)
Subject: Re: 01/02: utils: Change 'patch-shebangs' to use binary input.
Date: Sat, 28 Feb 2015 15:50:23 +0100
Message-ID: <87vbimjdjk.fsf@gnu.org>
References: <20150228001057.17733.82336@vcs.savannah.gnu.org> <87r3tay7xv.fsf@netris.org>
In-Reply-To: <87r3tay7xv.fsf@netris.org> (Mark H. Weaver's message of "Fri, 27 Feb 2015 23:30:04 -0500")
To: Mark H Weaver
Cc: guix-devel@gnu.org

Mark H Weaver wrote:

> Ludovic Courtès writes:
>
>> commit ca1e3ad2faa59d5b32289f84e0937fa476e21a1a
>> Author: Ludovic Courtès
>> Date:   Sat Feb 28 01:01:51 2015 +0100
>>
>>     utils: Change 'patch-shebangs' to use binary input.
>>
>>     * guix/build/utils.scm (get-char*): New procedure.
>>     (patch-shebang): Use it instead of 'read-char'.
>>     (fold-port-matches): Remove local 'get-char' and use 'get-char*'
>>     instead.
>> ---
>>  guix/build/utils.scm | 22 +++++++++++-----------
>>  1 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
>> index a3f8911..c98c4ca 100644
>> --- a/guix/build/utils.scm
>> +++ b/guix/build/utils.scm
>> @@ -618,6 +618,14 @@ transferred and the continuation of the transfer as a thunk."
>>             (stat:atimensec stat)
>>             (stat:mtimensec stat)))
>>
>> +(define (get-char* p)
>> +  ;; We call it `get-char', but that's really a binary version
>> +  ;; thereof.  (The real `get-char' cannot be used here because our
>> +  ;; bootstrap Guile is hacked to always use UTF-8.)
>> +  (match (get-u8 p)
>> +    ((? integer? x) (integer->char x))
>> +    (x x)))
>> +
>
> This is equivalent to reading with the ISO-8859-1 encoding.  The problem
> is that the procedures that use 'get-char*' will then typically use
> UTF-8 to write these characters back, so all non-ASCII characters will
> get corrupted by these filters.
>
> For now, I would suggest just using ISO-8859-1 for all of these build
> utilities that filter or substitute existing files, and then use the
> textual I/O procedures.

The difficulty is that ISO-8859-1 is not available during bootstrap, due
to guile-default-utf8.patch.  Commit dd0a8ef asks for ISO-8859-1 in the
patch-* procedures, as you suggest, but in reality during bootstrap what
happens is not exactly that.

If the bootstrap glibc had statically-linked gconv modules, we could get
rid of guile-default-utf8.patch.

> A better solution going forward would be to implement and use a
> permissive UTF-8 encoding in Guile.

Probably, although it’s not completely clear to me how that would work.
I suppose the idea would be to change to ISO-8859-1 when an invalid byte
sequence is encountered?

Ludo’.
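
For illustration, the corruption Mark describes can be reproduced with a
minimal sketch along these lines.  It is not code from the commit: only
'get-char*' is taken from the quoted patch, while the helpers
'bytes->string/binary' and 'string->bytes/binary' are hypothetical names
made up for this example.  It assumes a Guile that provides
(rnrs io ports) and (rnrs bytevectors).

```scheme
(use-modules (rnrs io ports)      ; open-bytevector-input-port, get-u8
             (rnrs bytevectors)   ; string->utf8, u8-list->bytevector
             (ice-9 match))

;; Same definition as in the quoted patch: one byte becomes one character.
(define (get-char* p)
  (match (get-u8 p)
    ((? integer? x) (integer->char x))
    (x x)))

;; Hypothetical helper: read all of bytevector BV through 'get-char*'.
(define (bytes->string/binary bv)
  (let ((port (open-bytevector-input-port bv)))
    (let loop ((chars '()))
      (match (get-char* port)
        ((? eof-object?) (list->string (reverse chars)))
        (c (loop (cons c chars)))))))

;; Hypothetical helper: the byte-for-byte inverse of the above.  Only
;; valid for strings whose characters are all below 256.
(define (string->bytes/binary str)
  (u8-list->bytevector (map char->integer (string->list str))))

;; "è" encoded as UTF-8 is the two bytes C3 A8.
(define input (u8-list->bytevector '(#xC3 #xA8)))

;; Read in binary mode: two characters, U+00C3 and U+00A8.
(define text (bytes->string/binary input))

;; Writing those characters back as UTF-8 re-encodes each of them.
(define corrupted (string->utf8 text))

;; Writing them back byte-for-byte preserves the input.
(define preserved (string->bytes/binary text))

(display (bytevector->u8-list corrupted)) (newline) ; (195 131 194 168)
(display (bytevector->u8-list preserved)) (newline) ; (195 168)
```

The two input bytes come back as four when the output side uses UTF-8,
whereas a byte-for-byte writer (equivalent to ISO-8859-1 on output)
returns the original two.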